Wednesday, October 17, 2007

Learning to Rank

SIGIR 2007 (which I unfortunately did not attend) had a really great workshop called 'Learning to Rank' or LTR. The weekly RightNow-organized Bozeman AI Colloquium recently covered two papers in this area. Essentially the idea is that a search engine can implicitly learn to rank documents for a given query by looking at user behavior.

The first one we covered (by Yeh, Lin, Ke & Yang) used genetic programming to do the learning. Needless to say this caught my eye. Evolutionary Algorithms are built to learn rankings, usually based upon a fitness function. I found this paper interesting, however even the authors admit that their algorithm is very slow.

In my mind they picked too complex of an algorithm. There are far simpler EAs that can do this job. The well-known (n+1) EA could do this task (per query). I'll likely be writing a paper on this for GECCO 2008. l

Many of the workshop papers reference work by Joachims and Radlinski (find them here). Their recent paper in IEEE Computer (not avail for free) was interesting in that they used a LTR method to re-rank Google results and then did a user-study to look at how effective the method was.

Personally I think that the idea of LTR should be a component of every search engine. The ranking of search results should change as fast as users interact with the content, rather than how fast the content itself changes. This is something that the big search engines are fairly quiet on, not sure why.

Sure it's an incremental rather than revolutionary step (Powerset is trying to take a revolutionary step), however can anyone give me a good argument why LTR should not be done? The idea can be applied to any engine.. keyword, link-graph (Google) or NLP based (Powerset).

Taking the next step beyond that, the next big thing could very well be doing an LTR method per-person or per-peer-group for each query family. This effectively would allow the engine to self-learn to personalize results. One can imagine how this could be glued into the idea of using the 'social graph' to establish the peer-group on a given topic/query.

No comments: