Friday, March 28, 2008

Semantic Features for Contextual Advertising

Andrei Broder's group at Yahoo! Research focuses on Computational Advertising. At SIGIR 2007 they published a paper on using a semantic taxonomy to do contextual matching for advertising. This is a similar problem to the one in the previous post about MS Research: deriving lists of keywords from a document to use as queries to an advertising system. Unlike the MS Research approach, Yahoo has built a large taxonomy of "commercial interest queries" with about 6,000 nodes and approximately 100 items attached to each node.

The essential approach is to classify both the document and all of the ads into the taxonomy, then match ads to documents on the basis of topical distance. The distance score is combined with a more standard IR-style score to form a combined score. The top-k matching ads, ordered by lowest distance, are the ads displayed on the page.
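To make the combination step concrete, here is a minimal sketch of how I picture it. It is not the paper's implementation: the convex weight alpha, the cosine over taxonomy-class weights (standing in for topical distance, flipped into a similarity so higher is better), and all names and data shapes below are my own assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_score(page, ad, alpha=0.5):
    """Blend a taxonomy-level score with a keyword-level score.

    alpha is a hypothetical mixing weight: 1.0 means taxonomy only,
    0.0 means keywords only.
    """
    tax = cosine(page["classes"], ad["classes"])
    kw = cosine(page["terms"], ad["terms"])
    return alpha * tax + (1 - alpha) * kw


def top_k_ads(page, ads, k=3, alpha=0.5):
    """Return the k ads with the highest combined score for the page."""
    ranked = sorted(ads, key=lambda ad: combined_score(page, ad, alpha),
                    reverse=True)
    return ranked[:k]


# Toy data, invented for illustration only.
page = {"classes": {"travel/flights": 1.0},
        "terms": {"cheap": 1.0, "flights": 1.0}}
ads = [
    {"id": "a1", "classes": {"travel/flights": 1.0}, "terms": {"flights": 1.0}},
    {"id": "a2", "classes": {"finance/loans": 1.0}, "terms": {"cheap": 1.0}},
]
best = top_k_ads(page, ads, k=1)
```

Both ads share exactly one keyword with the page, so the keyword score alone cannot separate them; the taxonomy component is what pushes the flights ad to the top.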

The TaxScore() function is fairly interesting: it attempts to generalize the given term within the taxonomy. It seems that this type of approach could also work well using WordNet's hypernyms in a more regular IR/search setting.
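As a rough illustration of what "generalizing within the taxonomy" can mean, the sketch below scores two classes by how far up the tree you have to climb before they meet. This is not the paper's TaxScore() formula; the toy taxonomy, the depth-ratio score, and every function name are assumptions of mine. The same shape would apply to a hypernym chain, with hypernyms playing the role of parents.

```python
# Toy taxonomy as a child -> parent map (invented for illustration).
PARENT = {
    "root": None,
    "travel": "root",
    "flights": "travel",
    "hotels": "travel",
    "finance": "root",
    "loans": "finance",
}


def path_to_root(node):
    """List of nodes from the given node up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Number of edges between the node and the root."""
    return len(path_to_root(node)) - 1


def lowest_common_ancestor(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes do not share a root")


def generalization_score(page_class, ad_class):
    """1.0 for identical classes, falling to 0.0 when they only meet at root."""
    lca = lowest_common_ancestor(page_class, ad_class)
    deepest = max(depth(page_class), depth(ad_class))
    return depth(lca) / deepest if deepest else 1.0
```

With this toy tree, a flights page still gets partial credit for a hotels ad (they meet at "travel") but none for a loans ad (they only meet at the root).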

I have to read it again more carefully to see if I missed it, but I did not see any weighting by a keyword's bid value (or advert count) anywhere in the formulas. Maybe this was omitted for trade secrecy? It seems obvious that it should be used to some degree to maximize $$ yield or eCPM of clicked ads. The idea is not to let it affect the matching of the ads to keywords, just the final rank order to some degree.
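Here is a minimal sketch of what I mean by letting bid value touch only the final ordering. It assumes the candidate ads have already been matched on relevance alone; the bid_weight parameter, the max-bid normalization, and the field names are hypothetical and not taken from the paper.

```python
def rerank_by_bid(candidates, bid_weight=0.3):
    """Re-order already-matched ads by a blend of relevance and bid.

    candidates: list of dicts with "relevance" in [0, 1] and a
    non-negative "bid". Matching is untouched; only the order changes.
    bid_weight is a hypothetical knob: 0.0 ignores bids entirely.
    """
    if not candidates:
        return []
    # Normalize bids to [0, 1]; fall back to 1.0 if every bid is zero.
    max_bid = max(ad["bid"] for ad in candidates) or 1.0

    def final_score(ad):
        return ((1 - bid_weight) * ad["relevance"]
                + bid_weight * (ad["bid"] / max_bid))

    return sorted(candidates, key=final_score, reverse=True)


# Toy candidates, invented for illustration only.
matched = [
    {"id": "a", "relevance": 0.80, "bid": 0.10},
    {"id": "b", "relevance": 0.75, "bid": 1.00},
]
by_relevance = rerank_by_bid(matched, bid_weight=0.0)
with_bids = rerank_by_bid(matched, bid_weight=0.3)
```

With bid_weight at 0.0 the slightly more relevant ad "a" stays first; at 0.3 the much higher bid on "b" moves it ahead, while the set of matched ads stays the same.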

In my own experiments @ OO, using some proxy for bid value seems to increase eCPM. The biggest challenge is getting comprehensive data for your dictionary if you are not Google, Yahoo or MS.

Postscript:
I have it confirmed from two independent sources (current and ex Y!ers) that Yahoo is working on a new Content Match codebase because the old version didn't work. It's hard to say what status Broder's technique above is in (production usage or internal testing), or whether it was part of the old system.
