Friday, September 07, 2007

The "social graph" and search engines

Robert Scoble recently posted about Mahalo, TechMeme and Facebook versus Google. His thesis is basically that somehow blending social networks with search engines will be the next big thing. He also comments (as have others) that searching blogs can get better results than major search engines sometimes.

Danny Sullivan chimed in response with a blistering commentary on both Scoble's "new ideas" and Mahalo (run by Jason Calacanis). Mahalo and ChaCha are both 'human powered' search engines. Basically they take popular search terms and use editor to augment and/or reorganize Google results.

First a history review. Way back Yahoo built it's people powered directory, while initially useful it could not keep up with the growth of the internet. Google comes along with a simple idea called PageRank (it essentially forms a Markov model of the web and computes the stationary distribution of the markov matrix - an 80+ year old idea applied to the web) and kills Yahoo's directory as well as purely keyword based engines like Altavista.

More History. Once upon a time in the 60s-80s expert systems were seen as the next big thing in AI. Solve all the world's problems by enabling a formal system of rules and facts to answer questions posed to the system. ES was a miserable failure at these lofty goals. Why? Growing the rulebase is hard. Humans do a terrible job at crafting rulesets that are complete and consistent (no conflicts). Even worse is when you throw multiple people at crafting rules together. You end up with trash.

Why is this relevant here? The lesson of ES seems to be lost on efforts like ChaCha and Mahalo. These systems are built on very basic rules (if query X then return A, B, C, D ...). Granted these are much simpler rules than a typical ES, and the engines don't support real reasoning using backward or forward chaining either. This may not save them.. the rules will still suffer from the huge maintenance problem in a context where the information captured is dynamic and changing. Just ask any of the dozen 80s companies that tried to build medical diagnosis expert systems. The rules suffered from inattention to medical advances as well as being contradictory (multiple doctors with different ideas making rules).

Nowdays we call this "linkrot" on the web. While successful, sites like suffered from linkrot on pages not frequently edited. How will ChaCha and Mahalo avoid this without having a massive number of editors? itself suffers from the same issues, people tag stuff and it mostly rots unorganized or maintained.

Yet More History. From about 1999 to 2003 sold software in the emerging web eCRM space in addition to having a search engine. Web eCRM (or web self-service) is essentially creating a customer service portal for corporate websites. The portal contains a collection of FAQs, articles, HowTos, Manuals etc. The essential function of the portal is to help people find what they are looking for and keep them from dialing the 1800 customer service number (which typically costs a company about $30 per call). AskJeeves sold their CRM and enterprise search unit in 2003 for less than 5 million dollars. Why? Their system required manual input of of a huge set of rules linking search queries and documents, as well as complex rules to equate queries to other queries and attempt to do some Natural Language Processing and Inference.

It didn't work, there was no way in hell that an average business user that maintained this set of Articles, FAQs etc was prepared to the massive amount of structuring. AskJeeves attempted to hire a team of people to optimize and tune the implementations. It took weeks of learning the business and translating that into structure for the engine to use. Nowdays we call this SEO.

Another example in CRM is the 'chatbot'. These are software products that try and give a user a good customer experience by putting a cute face/persona on the search box and having it talk back to you in a conversational style. They have never really taken off, despite the CRM industry analysts that love them. They suffer from the same basic problem that expert systems (chat bots are expert systems of a sort) suffered from.. structuring information is hard for most people to do.

For the past 8 years I've been working for an CRM company (RightNow Tech) that had a simple idea to help customer service web portals... implicitly learn from what users are doing in the portal to optimize the engine automatically. (See patents 6434550, 6665655, & 6842748 - at the moment the RNT systems process about 100 Million searches per month). The cutting edge of eservice CRM at the moment is taking that type of idea and THEN adding (or learning) structure to it.

Lessons learned and observations:

Study the basic history of AI. Here's a good book Artificial Intelligence: A Modern Approach.
Note that the one of the authors (Peter Norvig) is The Director of Research at Google. Prabhakar Raghavan is his counterpart at Yahoo. and Microsoft also have strong AI people. There is no secret as to why these four companies are hiring all the good AI people they can relocate to the bay area, Seattle and New Jersey. You will not beat them with an expert system. A secondary lesson of AI is to never believe someone who will attempt to tell you that a new algorithm will create intelligence (neural networks anyone? Fuzzy Logic?).

Look at industries like CRM as a microcosm of the search industry. For every new idea you have, someone in CRM has likely tried it already on a smaller scale.

Beware of old wine in new bottles. You might be able to spend enough money on PR to help you get attention.. but you will likely die unless you invest in real scalable algorithms to do the work.

I'm certainly not intending to down-grade ChaCha and Mahalo as viable businesses. Often the viability of a business is independent of the technology used. They seem to have plenty of funding, and will likely adapt as they see problems. A babe-in-the-woods can't get 20 million in VC money. Neither of these systems will require boiling-the-ocean and implementing strong AI. Spinning a tight loop on what users are looking for and optimizing those results as fast as possible might work long enough to make some cash... it worked to bootstrap Yahoo after all.

As for the social-network blending into standard search? Stay tuned, I'll post some thoughts on that soon. There are plenty of good AI people working on graph based data mining.

Circling back to expert systems, if you can automatically 'read' text, and induce a rule-base.. then use that to help with queries, then we have something. I believe the direction of search engines will slowly head in this direction... machine reading.

Jordan Mitchell (my new boss at recently posted on the same subject on his blog.

Other interesting links about this:
Skrentablog on Mahalo
Keving Burton's Thoughts on the Social Graph

No comments: