- 5% of page text carefully chosen can yield 97+% of full-text advert matching relevance
- Best parts of the document are URL, referring URL, title, Meta and headings.
- Adding in a classification against a topical taxonomy adds to the accuracy of the ad matches.
- They judged the ad matching relevance against human judgments of ad to page relevance.
Of course internet search engines used this for years (it defines the SEO industry's eco-system), and the old/retired open source engine HtDig has had special treatment of those fields since the late 90s. The difference now is the direction, the documents are the "query" and the hits are the ads. Best part about the method is that it's cheap... javascript + the browser becomes your distributed spider and summarizer of the web.
I do love finding these papers.. we just don't have the time or resources to have a study like this and confirm the approach with a paid human factors study. Just go forward on gut educated feel day to day and the human measure is if we get clicks on the ads.
This approach is similar to the one we outlined and implemented before finding this paper. The difference is what we do with the resulting "query", using the signal to learn a predictive interest model of users.
Still no mention of any relative treatment of words within the same field... one would assume this would move the needle on relevance as well.
I still believe that this type of summarization approach can be used to make an implicit page tagger and social recommender like del.icio.us ... if you can filter the summary based upon some knowledge of the users real (as opposed to statistical) interests. Key route to auto-personalization of the web.
No comments:
Post a Comment