Cuil shows us how not to launch a search engine
Google challenger Cuil launched last night in a blaze of glory. And it went down in a ball of flames. Immediately after launch, the criticism started to pile on: results were incomplete, weird, or missing altogether. The various articles on Cuil's failure revealed much about its architecture. Apparently they categorize a user query into a topic and ship it out to topical servers. While this sort of 'topical partitioning' is interesting, it has zilch to do with relevance ranking... and it suffers from a failure-point issue: if a topic server goes down, queries against that topic get junk results or zero results.
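To make the failure-point issue concrete, here is a minimal sketch of topic-based query routing. It is not Cuil's actual code; `TopicShard`, `classify`, and `route` are hypothetical names, and the keyword-overlap classifier is a toy stand-in for whatever they really use.

```python
from dataclasses import dataclass


@dataclass
class TopicShard:
    """Hypothetical 'topic server' holding the documents for one topic."""
    topic: str
    docs: list
    up: bool = True

    def search(self, query: str) -> list:
        if not self.up:
            raise ConnectionError(f"shard '{self.topic}' is down")
        terms = query.lower().split()
        # Naive match: keep any document containing at least one query term.
        return [d for d in self.docs if any(t in d.lower() for t in terms)]


def classify(query: str, keywords: dict) -> str:
    """Toy classifier: pick the topic whose keyword set overlaps the query most."""
    terms = set(query.lower().split())
    return max(keywords, key=lambda topic: len(terms & keywords[topic]))


def route(query: str, shards: dict, keywords: dict) -> list:
    """Send the query to exactly one topic shard (assumed design, no fallback)."""
    topic = classify(query, keywords)
    try:
        return shards[topic].search(query)
    except ConnectionError:
        # The failure point: one dead shard means zero results for the
        # whole topic, even though every other shard is healthy.
        return []
```

In this sketch, marking a single shard as down empties the results for every query classified into that topic, while queries for other topics keep working. Note also that nothing here ranks anything: partitioning decides where to look, not which result is best.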
Questions and points of discussion:
- Is it really true that partitioning the index by topic results in a better engine than Google? No, a better engine is made by better relevance. Perhaps partitioning is simply what the PR/Marketing people focused on rather than relevance.
- How do you simulate post-launch load when you have no idea how widely the free press will be distributed?
- Free post-launch press is invaluable to your business. Squandering it might be a deathblow.
- Why not launch more quietly in the early-adopter tech press, and then go after mainstream press once you have proven the system?
- Trading on your status as ex-Googlers (and not early ones at that) seems VERY dubious. Stand on your own feet rather than someone else's.
- The absolute hottest area of information retrieval research right now is using user click-streams to improve the relevance live and on-line (learning to rank), as well as personalize results. These are differentiating features (if they result in improved relevance).
- Cuil keeps ZERO user history and assigns no session or user IDs. This will make it very difficult to follow this trend, unless they are using someone else's cookies to do the identification via an analytics partner (no evidence of this).
- The other hot area of IR research is using semantic analysis and NLP to break away from simple keyword-based inverted indices. Hakia still seems to be doing it better than Cuil... or at least will appear to as long as the topical partitions keep crashing under the load.
- Risk analysis is a fantastic tool for organizing and prioritizing your work on new products. It seems they skipped that part before deciding to launch.
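As a rough illustration of the click-stream point above, the sketch below blends a base relevance score with a smoothed click-through rate. This is a simple heuristic, not a real learning-to-rank model; the function name `rerank`, the `weight` parameter, and the smoothing priors are all assumptions made for the example.

```python
def rerank(results, clicks, impressions, weight=0.5,
           prior_clicks=1.0, prior_impressions=10.0):
    """Reorder (doc_id, base_score) pairs using logged click data.

    clicks / impressions map doc_id -> counts from a (hypothetical) click log.
    The priors smooth the click-through rate so that documents with little
    or no history are not pushed to the extremes.
    """
    def smoothed_ctr(doc_id):
        return ((clicks.get(doc_id, 0) + prior_clicks) /
                (impressions.get(doc_id, 0) + prior_impressions))

    scored = [
        (doc_id, (1 - weight) * base + weight * smoothed_ctr(doc_id))
        for doc_id, base in results
    ]
    # Highest blended score first.
    return [doc_id for doc_id, _ in sorted(scored, key=lambda p: p[1], reverse=True)]
```

Even this crude version needs a log of which results were shown and which were clicked. Personalizing it would need the same log keyed by user or session, which is exactly what an engine that keeps no history cannot build.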
I still think they are wrong to go out as a consumer engine; Enterprise is a better play. However, if their leading market differentiator is topical partitioning of back-end servers, then they aren't even considering Enterprise, since individual Enterprise customers may not be big enough to need hundreds of servers to distribute the index like that.
Hindsight is always 20/20, and I hope to hell I am never standing there red-faced as software I helped create fails on a high-volume launch.