aicoder: collective search

Monday, October 29, 2007

Inferring meaning from data and structure

Jeremy Liew has quite the thread going on his blog: 'Meaning = Data + Structure (User Generated)', Part 2 on Inferring Structure, and a Guest Post by Peter Moore.

The post by Moore is a wonderful summary of approaches and their difficulties, and I'll post more on this as I think about it. My initial response is that we should stop looking/waiting for some near holy-grail {fully functional semantic web} and use a lot of good-enough {technologies, algorithms, ontologies} to make progress. I think the perfection-in-reasoning stuff is great for the teleportation version of personal search, while the good-enough techniques are applicable right now to the orienteering version. See this post and this paper for orienteering vs teleportation in search.

Last week the Bozeman AI group read a paper on Deriving a Large Scale Taxonomy from Wikipedia. I look at this as an example of the main idea above: deriving structure from user generated content. True, Wikipedia is already structured, but not necessarily in a way that a computer program can reason with.

The killer thing about this idea is that it's finally time to do it. Essentially this is what machine learning and data mining have been about for years. I've read/perused hundreds of academic papers where the basic premise is that we write a suite of algorithms to learn/extract structure from a pool of data. A big chunk of the papers at the KDD conferences each year (2007, 2006, 2005) operates on this premise, and the field is decades old.

Really pointy-headed CS types are horrible at monetizing their work. At approximately the same time that the Google founders were inventing PageRank, Jon Kleinberg was creating HITS. Both are link-analysis algorithms meant to augment what at the time were poor-quality search engines. Over the past 10 years, whenever the two have been evaluated head-to-head on some Information Retrieval task, HITS has worked on par with PageRank. Yet Kleinberg is not now worth 40 billion dollars like Brin and Page of Google.
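For anyone who hasn't seen the two algorithms side by side, here is a minimal sketch of both as plain power iteration over a toy link graph. The graph, the function names, the damping value of 0.85 and the iteration count are all my own assumptions for illustration; this is a textbook-style sketch, not what Google or Kleinberg actually shipped. It also assumes every link target appears as a key in the graph.

```python
# Toy link graph (hypothetical data): page -> pages it links to.
# Assumption: every link target also appears as a key.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}


def pagerank(links, damping=0.85, iterations=50):
    """One global score per page, computed by power iteration."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


def _normalize(scores):
    """Scale scores to unit Euclidean length."""
    norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
    return {p: v / norm for p, v in scores.items()}


def hits(links, iterations=50):
    """Two scores per page: hub (points at good pages) and authority."""
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for p, outs in links.items():
            for q in outs:
                auth[q] += hub[p]
        auth = _normalize(auth)
        # Hub: sum of authority scores of the pages linked to.
        hub = _normalize({p: sum(auth[q] for q in links[p]) for p in pages})
    return hub, auth


rank = pagerank(links)
hub, auth = hits(links)
print(sorted(rank, key=rank.get, reverse=True))
print(sorted(auth, key=auth.get, reverse=True))
```

On this toy graph both methods put page "c" on top, since three of the four pages link to it. The structural difference is that PageRank produces one query-independent score per page, while HITS produces a hub score and an authority score and was designed to run on a query-specific subgraph.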

I fear that the Semantic Web people/researchers have been building sand castles for a decade rather than monetizing what they have to subsidize more research on it. Perhaps if they had been, Delicious, Digg, Wikipedia, et al. would be contributing to the Semantic Web natively, rather than forcing people to figure out a way to export that data into RDF/OWL.

Monday, September 10, 2007

The Implicit Web flowing into Collective Search

Here are some recent articles that I read and kept thinking about again and again. What is cool about this moment in time is that these things are gelling. Entrepreneurs and innovators are trying to build this stuff, rather than the ideas rotting unfulfilled in the mind of some AI/Search-Engine geek.

Read/Write Web's Implicit Web

The important point here is that systems should both learn what users are interested in implicitly and allow users control over the learned topics. The former is what algorithms like collaborative filtering were intended to do. The latter is a great point: users should have visibility into, and control over, their learned topics.

This has been a frequent critique of Amazon's recommender system: while personalized, it can learn goofy things. I have no desire to be a frequent buyer of items similar to what I bought for a niece as a gift last year.
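To make the "learn implicitly, but let the user see and override it" idea concrete, here is a minimal sketch. The InterestProfile class, its method names and the topic labels are hypothetical, invented for illustration; this is not how Amazon's recommender works, just the smallest thing I could write that has both halves of the idea.

```python
from collections import Counter


class InterestProfile:
    """Hypothetical profile: learned implicitly, inspectable and editable."""

    def __init__(self):
        self.learned = Counter()  # topic -> accumulated implicit weight
        self.muted = set()        # topics the user has switched off

    def observe(self, topic, weight=1.0):
        """Record an implicit signal such as a click or a purchase."""
        self.learned[topic] += weight

    def mute(self, topic):
        """User control: stop using a learned topic without forgetting it."""
        self.muted.add(topic)

    def visible_topics(self):
        """User visibility: everything learned so far, muted or not."""
        return dict(self.learned)

    def top_interests(self, k=3):
        """Highest-weight topics that the user has not muted."""
        active = [(t, w) for t, w in self.learned.items() if t not in self.muted]
        active.sort(key=lambda tw: (-tw[1], tw[0]))
        return [t for t, _ in active[:k]]


profile = InterestProfile()
for _ in range(3):
    profile.observe("machine learning books")
for _ in range(4):
    profile.observe("toddler toys")  # the gift for the niece
profile.observe("hiking gear")

before = profile.top_interests(k=2)
profile.mute("toddler toys")
after = profile.top_interests(k=2)
print(before)
print(after)
```

Before the mute, the gift purchases dominate the profile. After it, the toys still show up in visible_topics() but no longer drive the recommendations.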

Collective Search by Greg Linden

I just learned that Greg is one of the brains behind Amazon's AI. Thinking about the data Amazon has and what could be done with it always makes me drool. Greg's post here is an aggregation of points he came up with while reading transcripts of the recent SES 2007 conference.

I'll join Ask's Jim Lanzone (isn't the new Ask.com much better than Google!) in saying that collective search is potentially better than personalized search. Greg is arguing for a redefinition of 'personalization' here, but we have to pick descriptive terms for abstract ideas. I would define personalization as skewing search results by what you are interested in, whereas I'd read collective search as letting the collective behaviors of a group of similar users influence/skew search results. This is the flavor of stuff I worked on at RightNow.
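Here is a rough sketch of the distinction as I've defined it above. Everything in it is an assumption for illustration: the document ids, the click counts, the Jaccard similarity over clicked documents and the boost weight of 0.5. It is not what Ask, Amazon or RightNow actually do.

```python
def jaccard(a, b):
    """Overlap between two collections of clicked document ids."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def personalized(base_scores, my_clicks, weight=0.5):
    """Skew results using only this user's own past clicks."""
    return {doc: s + weight * my_clicks.get(doc, 0) for doc, s in base_scores.items()}


def collective(base_scores, me, clicks_by_user, weight=0.5):
    """Skew results using clicks from users who behave like this user."""
    boosted = dict(base_scores)
    mine = clicks_by_user.get(me, {})
    for user, clicks in clicks_by_user.items():
        if user == me:
            continue
        sim = jaccard(mine, clicks)  # 0.0 means nothing in common
        for doc, count in clicks.items():
            if doc in boosted:
                boosted[doc] += weight * sim * count
    return boosted


def ranked(scores):
    """Document ids, best first; ties broken by id."""
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical engine scores for one query.
base_scores = {"doc1": 1.0, "doc2": 0.9, "doc3": 0.8}

# Hypothetical click history: user -> {document id: click count}.
clicks_by_user = {
    "me": {"faq-12": 2, "faq-7": 1},
    "ann": {"faq-12": 1, "faq-7": 3, "doc3": 4},
    "bob": {"other": 5, "doc2": 2},
}

personal_order = ranked(personalized(base_scores, clicks_by_user["me"]))
collective_order = ranked(collective(base_scores, "me", clicks_by_user))
print(personal_order)
print(collective_order)
```

In this toy data I have never clicked any of the three results, so personalization has nothing to work with and the order is unchanged. Ann has a history that overlaps mine and clicked doc3 a lot, so the collective version pushes doc3 to the top, while Bob, who shares nothing with me, has no influence.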

Ultimate Answer Engine @ Information Week

Favorite quote: "Who said an edit box and 10 blue links is what search is?" asks Microsoft's Satya Nadella.

This great piece has several items that just jumped out at me. "Queryless Search": essentially this is using what the system knows about you and your path through to the engine to do an implicit query. (We also worked on and patented variations of this idea at RightNow.) The "Personalization" and "Social Skills" sections deal with the ideas in Greg's post above. More to come on that re 'The Social Graph'.

Another good quote: "Serendipity is an amazing teacher". This is what Others Online is all about... focused on People, not necessarily documents/media.

After reading all three of these in the current context of what people are willing to spend time and money on... I can't help but be totally jacked about the opportunities at hand!

Loads of academics have been working on this stuff for years; check out any ACM SIGIR and various data mining conference proceedings for the last 10+ years. Personally, I've been thinking about and working on many of the things above since 2000, when Doug Warner and I started doing a deep dive into the academic literature.