Friday, September 21, 2007

More old wine in new Web 2.0 bottles

In light of last week's post on "beware of old AI wine in new Web 2.0 bottles" I wanted to post this link from Joel Spolsky. (A buddy of mine brought my attention to it)

Once in a while Joel posts "strategy letters". This one addresses history repeating itself, with the old-HTML-web -> Ajax-web transition paralleling the text-terminal -> Windows-API one. Very true. I did find it odd that he did not mention the Google Web Toolkit in his thoughts about the potential game-changing "NewSDK" and how it needs fancy new compilers. As a CS geek I think the idea of compiling Java to cross-browser-compliant JavaScript is a simply amazing technical achievement.

The interesting thing about this topic is that it's what Java was supposed to do for the browser back in the 90s. It didn't work: no one could keep their browsers and JVMs synced and well integrated, plus Microsoft managed to run good interference via IE just being crappy at Java/JVMs at that time.

Turns out that Java succeeded wildly in reinventing the way back-end web services are written (CGIs just don't cut it for some things, despite the PHP/Perl/Python crowd making CGIs way more useful than before). In the browser, today's equivalent of the JVM is the JavaScript engine (which is not Java at all). The idea of using JS as byte-code makes me cringe, but it's where we are.

Nice post Joel! Would be nice if he'd follow up on why he thinks that the Google Web Toolkit, Yahoo's YUI and others don't yet meet (or won't ever meet) the definition of his NewSDK.

Monday, September 10, 2007

The Implicit Web flowing into Collective Search

Here are some recent articles that I read and kept thinking about again and again. What is cool about this moment in time is that these things are gelling. Entrepreneurs and innovators are trying to build this stuff, rather than the ideas rotting unfulfilled in the mind of some AI/Search-Engine geek.

Read/Write Web's Implicit Web

Important point here is that systems should both learn what users are interested in implicitly and allow users control over the learned topics. The former point is what algorithms like collaborative filtering were intended to do. The latter is a great point that users should have visibility and control into their learned topics.

This has been a frequent critique of Amazon's recommender system: while personalized, it can learn goofy things. I have no desire to be a frequent buyer of items similar to what I bought for a niece as a gift last year.

Collective Search by Greg Linden

I just learned that Greg is one of the brains behind Amazon's AI. Thinking about the data Amazon has and what could be done with it always makes me drool. Greg's post here is an aggregation of points he came up with while reading transcripts of the recent SES 2007 conference.

I'll join Ask's Jim Lanzone (isn't the new Ask.com much better than Google!) in saying that collective search is potentially better than personalized search. Greg is arguing for a redefinition of 'personalization' here, but we have to pick descriptive terms for abstract ideas. I would define personalization as skewing search results by what you are interested in, whereas I'd read collective search as letting the collective behaviors of a group of similar users influence/skew search results. This is the flavor of stuff I worked on at RightNow.
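To make the distinction concrete, here is a minimal sketch of the two kinds of skew. Everything in it is an assumption for illustration: the function names, the Jaccard similarity measure, the 0.3 similarity cutoff and the toy data are mine, not how Ask, Amazon or RightNow actually do it.

```python
def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def personalized_boost(doc_topics, user_interests):
    """Personalization: skew by how well a document matches MY interests."""
    return jaccard(doc_topics, user_interests)


def collective_boost(doc_id, user, interests, clicks, min_similarity=0.3):
    """Collective search: skew by what users SIMILAR to me clicked on.

    Returns the fraction of sufficiently similar users who clicked doc_id.
    """
    peers = [u for u in interests
             if u != user and jaccard(interests[user], interests[u]) >= min_similarity]
    if not peers:
        return 0.0
    return sum(doc_id in clicks.get(u, set()) for u in peers) / len(peers)


# Hypothetical users, interests and click logs.
interests = {
    "ann": {"fishing", "montana"},
    "bob": {"fishing", "montana", "beer"},
    "cat": {"ajax"},
}
clicks = {"bob": {"doc-7"}, "cat": {"doc-9"}}

boost_from_peers = collective_boost("doc-7", "ann", interests, clicks)
boost_from_self = personalized_boost({"fishing"}, interests["ann"])
```

Either boost would then be blended into the engine's normal relevance score. The interesting design question is how the group of "similar users" gets defined, which is where most of the real work lives.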

Ultimate Answer Engine @ Information Week

Favorite quote: "Who said an edit box and 10 blue links is what search is?" asks Microsoft's Satya Nadella.

This great piece has several items that just jumped out at me. "Queryless Search" is essentially using what the system knows about you and your path through to the engine to do an implicit query. (We also worked on and patented variations of this idea at RightNow.) The "Personalization" and "Social Skills" sections deal with the ideas in Greg's post above. More to come on that re 'The Social Graph'.

Another good quote: "Serendipity is an amazing teacher". This is what Others Online is all about... focused on People, not necessarily documents/media.

After reading all three of these in the current context of what people are willing to spend time and money on... I can't help but be totally jacked about the opportunities at hand!

Loads of academics have been working on this stuff for years; check out the proceedings of ACM SIGIR and the various data mining conferences for the last 10+ years. Personally, I've been thinking about and working on many of the things above since 2000, when Doug Warner and I started doing a deep dive into the academic literature.

Friday, September 07, 2007

The "social graph" and search engines

Robert Scoble recently posted about Mahalo, TechMeme and Facebook versus Google. His thesis is basically that somehow blending social networks with search engines will be the next big thing. He also comments (as have others) that searching blogs can get better results than major search engines sometimes.

Danny Sullivan chimed in with a blistering commentary on both Scoble's "new ideas" and Mahalo (run by Jason Calacanis). Mahalo and ChaCha are both 'human powered' search engines. Basically they take popular search terms and use editors to augment and/or reorganize Google results.

First, a history review. Way back, Yahoo built its people-powered directory. While initially useful, it could not keep up with the growth of the internet. Google came along with a simple idea called PageRank (it essentially forms a Markov model of the web and computes the stationary distribution of the Markov matrix - an 80+ year old idea applied to the web) and killed Yahoo's directory as well as purely keyword-based engines like AltaVista.
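For the curious, the stationary-distribution idea fits in a few lines. This is a toy sketch of textbook power iteration over a "random surfer" chain, not Google's production algorithm. The 0.85 damping factor is the commonly cited value, and the four-page web and the function name are made up for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration for the stationary distribution of the random-surfer chain.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            for q in targets:
                # Each page splits its current rank evenly among its out-links.
                new_rank[q] += damping * rank[p] / len(targets)
        converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical four-page web: everyone links to "c", nobody links to "d".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

On this toy graph "c" ends up with the largest share and "d" the smallest, which is the whole point: a page's rank comes from who links to it, with no human editor in the loop.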

More History. Once upon a time in the 60s-80s expert systems were seen as the next big thing in AI. Solve all the world's problems by enabling a formal system of rules and facts to answer questions posed to the system. ES was a miserable failure at these lofty goals. Why? Growing the rulebase is hard. Humans do a terrible job at crafting rulesets that are complete and consistent (no conflicts). Even worse is when you throw multiple people at crafting rules together. You end up with trash.

Why is this relevant here? The lesson of ES seems to be lost on efforts like ChaCha and Mahalo. These systems are built on very basic rules (if query X then return A, B, C, D ...). Granted, these are much simpler rules than a typical ES, and the engines don't support real reasoning using backward or forward chaining either. This may not save them. The rules will still suffer from the huge maintenance problem in a context where the information captured is dynamic and changing. Just ask any of the dozen 80s companies that tried to build medical diagnosis expert systems. The rules suffered from inattention to medical advances as well as being contradictory (multiple doctors with different ideas making rules).
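A rough sketch of what such a rule table looks like, and why it rots. I have no inside knowledge of how ChaCha or Mahalo store their data; the queries, example.com URLs, review dates and the 180-day cutoff below are all invented for illustration.

```python
from datetime import date

# Hypothetical hand-edited rules: query -> (editor-picked results, last review date).
RULES = {
    "iphone": (["example.com/iphone-guide", "example.com/iphone-reviews"], date(2007, 9, 1)),
    "hd dvd players": (["example.com/hd-dvd-guide"], date(2006, 3, 15)),
}


def lookup(query, fallback):
    """Return the editor-picked results for a query, else defer to an algorithmic engine."""
    rule = RULES.get(query.strip().lower())
    return list(rule[0]) if rule else fallback(query)


def stale_rules(today, max_age_days=180):
    """List queries whose rule has not been reviewed within max_age_days."""
    return sorted(q for q, (_, reviewed) in RULES.items()
                  if (today - reviewed).days > max_age_days)


needs_review = stale_rules(date(2007, 9, 7))
```

The lookup is trivial. The cost is all in `stale_rules`: every entry needs a human to revisit it on a schedule, and that list grows with every query the editors cover.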

Nowadays we call this "linkrot" on the web. While successful, sites like About.com suffered from linkrot on pages not frequently edited. How will ChaCha and Mahalo avoid this without having a massive number of editors? Del.icio.us itself suffers from the same issues: people tag stuff and it mostly rots, unorganized and unmaintained.

Yet More History. From about 1999 to 2003 AskJeeves.com sold software in the emerging web eCRM space in addition to having a search engine. Web eCRM (or web self-service) is essentially creating a customer service portal for corporate websites. The portal contains a collection of FAQs, articles, HowTos, manuals etc. The essential function of the portal is to help people find what they are looking for and keep them from dialing the 1-800 customer service number (which typically costs a company about $30 per call). AskJeeves sold their CRM and enterprise search unit in 2003 for less than 5 million dollars. Why? Their system required manual input of a huge set of rules linking search queries and documents, as well as complex rules to equate queries to other queries and attempt to do some Natural Language Processing and inference.

It didn't work. There was no way in hell that an average business user who maintained this set of articles, FAQs etc. was prepared to do the massive amount of structuring required. AskJeeves attempted to hire a team of people to optimize and tune the implementations. It took weeks of learning the business and translating that into structure for the engine to use. Nowadays we call this SEO.

Another example in CRM is the 'chatbot'. These are software products that try to give a user a good customer experience by putting a cute face/persona on the search box and having it talk back to you in a conversational style. They have never really taken off, despite the CRM industry analysts who love them. They suffer from the same basic problem that expert systems suffered from (chatbots are expert systems of a sort): structuring information is hard for most people to do.

For the past 8 years I've been working for a CRM company (RightNow Tech) that had a simple idea to help customer service web portals: implicitly learn from what users are doing in the portal to optimize the engine automatically. (See patents 6434550, 6665655, & 6842748. At the moment the RNT systems process about 100 million searches per month.) The cutting edge of eservice CRM at the moment is taking that type of idea and THEN adding (or learning) structure to it.
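As a generic illustration of implicit learning (emphatically not the patented RightNow method, whose details are in the patents), here is a toy re-ranker that blends a base relevance score with each answer's share of past clicks for the same query. The class name, the 0.5 click weight and the FAQ ids are all assumptions.

```python
from collections import defaultdict


class ImplicitRanker:
    """Toy re-ranker: nudges results toward what past users clicked for a query."""

    def __init__(self, click_weight=0.5):
        self.click_weight = click_weight
        # clicks[query][doc_id] -> number of clicks observed
        self.clicks = defaultdict(lambda: defaultdict(int))

    def record_click(self, query, doc_id):
        self.clicks[query][doc_id] += 1

    def rerank(self, query, scored_docs):
        """scored_docs: list of (doc_id, base_score). Returns doc_ids, best first."""
        counts = self.clicks.get(query, {})
        total = sum(counts.values())

        def blended(item):
            doc_id, base = item
            share = counts.get(doc_id, 0) / total if total else 0.0
            return base + self.click_weight * share

        return [doc_id for doc_id, _ in sorted(scored_docs, key=blended, reverse=True)]


ranker = ImplicitRanker()
results = [("faq-1", 0.9), ("faq-2", 0.8), ("faq-3", 0.7)]
before = ranker.rerank("reset password", results)

# Users keep skipping the top hit and clicking the third one.
for doc_id in ["faq-3", "faq-3", "faq-3", "faq-2"]:
    ranker.record_click("reset password", doc_id)
after = ranker.rerank("reset password", results)
```

Nobody had to write a rule saying "faq-3 answers password resets". The users' behavior did the structuring, and it keeps doing so as the content changes.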

Lessons learned and observations:

Study the basic history of AI. Here's a good book: Artificial Intelligence: A Modern Approach.
Note that one of the authors (Peter Norvig) is the Director of Research at Google. Prabhakar Raghavan is his counterpart at Yahoo. Ask.com and Microsoft also have strong AI people. There is no secret as to why these four companies are hiring all the good AI people they can relocate to the Bay Area, Seattle and New Jersey. You will not beat them with an expert system. A secondary lesson of AI is to never believe someone who tells you that a new algorithm will create intelligence (neural networks, anyone? Fuzzy logic?).

Look at industries like CRM as a microcosm of the search industry. For every new idea you have, someone in CRM has likely tried it already on a smaller scale.

Beware of old wine in new bottles. You might be able to spend enough money on PR to help you get attention, but you will likely die unless you invest in real scalable algorithms to do the work.

I'm certainly not intending to down-grade ChaCha and Mahalo as viable businesses. Often the viability of a business is independent of the technology used. They seem to have plenty of funding, and will likely adapt as they see problems. A babe-in-the-woods can't get 20 million in VC money. Neither of these systems will require boiling-the-ocean and implementing strong AI. Spinning a tight loop on what users are looking for and optimizing those results as fast as possible might work long enough to make some cash... it worked to bootstrap Yahoo after all.

As for the social-network blending into standard search? Stay tuned, I'll post some thoughts on that soon. There are plenty of good AI people working on graph based data mining.

Circling back to expert systems: if you can automatically 'read' text and induce a rule-base, then use that to help with queries, then we have something. I believe search engines will slowly head in this direction... machine reading.

Jordan Mitchell (my new boss at OthersOnline.com) recently posted on the same subject on his blog.

Other interesting links about this:
Skrentablog on Mahalo
Kevin Burton's Thoughts on the Social Graph

Thursday, September 06, 2007

New Job - OthersOnline.com

I just started a new job at OthersOnline.com. It's a new startup with a social networking spin. We let users declare themselves, their pages and interests, then be syndicated around the web via the OO Widget (see it to the right). We also have a browser toolbar that allows users to see other people relevant to the user's own interests and the content of the current webpage. I think my official title is the "Search Guy" or "AI Guy" or something. The potential of these two basic ideas is huge, and I'm wading in chest deep to put some great AI ideas into the systems. More posts coming soon on these topics.

I spent the last (nearly) eight years working at RightNow Technologies (a CRM SaaS company - once upon a time it was a small startup as well) in the AI Research Labs. At RNT I was in charge of implementing various search engines, data mining & NLP algorithms, swarm techniques, user interfaces, analytics, and whatever AI I could throw at the basic problem of enabling end users to find information on 2000+ customer service portals around the web (here is Leapfrog's Portal). I spent most of the last six months becoming the project manager of the group, responsible for multiple projects, coordinating with product management, initiating new feature ideas, etc. It's a fantastic group to work for, and has an application for just about any advanced CS topic there is. A more complete synopsis is on my resume.

(At some point in 2008 I will hopefully finish a PhD in CS at Montana State - topic is Theory of Genetic Algorithms)

New Blog

I have put off using a blog for too long, and the old homepage is too static. I'll use this space to muse about artificial intelligence, search engines, machine learning, social media & widgets, my career, PhD dissertation progress, Montana, fishing and good beer.

My Montana State University homepage

RSS Feed of this Blog