aicoder: Can we measure Google's monopoly like PageRank is measured?

Sunday, February 22, 2009

Jeremy Pickens posted an interesting note on his new IR blog:

Is it really true that Google is competing on a click-by-click basis? In the user studies that Google does, which of the following happens more often when the user types in a query to Google, and sees that Google has not succeeded in producing the information that they sought (fails):
1. Does the user reformulate his or her query, and click “Search Google” again (one click)? Or,
2. Does the user leave Google (one click), and try his or her query on Yahoo or Ask or MSN (second click), instead?

His points about actions 1 versus 2 are very astute. I’d guess that #2 happens a LOT on the #2-10 search engines, meaning people give that engine a try.. maybe attempt a reformulation.. then abandon that engine and try Google. And I’m betting that people ‘abandon’ Google at a far lower rate than other engines.. i.e. asymmetry of abandonment.

I’d love to do the following analysis given a browser log of search behavior:

Form a graph where the major search engines are nodes in the graph

For each pair of consecutive searches found in the log at time t and time t+1 for a given user, increment the counter on the edge SearchEngine(t) -> SearchEngine(t+1). Once the entire log is processed, normalize the weights on all edges leaving a particular node so that they sum to 1.

We now have a Markov chain of engine usage behavior. The directed edges in the graph represent the probability that a user's next search moves to another engine; self-loops are the probability of sticking with the current engine.
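The steps above can be sketched in a few lines of Python. This is only a sketch under assumptions: the `(user, timestamp, engine)` record layout and the function name `transition_matrix` are hypothetical, and the toy log is made up rather than taken from any real browser data.

```python
from collections import defaultdict


def transition_matrix(log):
    """Build row-normalized engine-to-engine transition probabilities.

    `log` is an iterable of (user, timestamp, engine) tuples. This record
    layout is an assumption, not a real browser-log format.
    """
    # Group each user's searches so that pairs never span two users.
    by_user = defaultdict(list)
    for user, timestamp, engine in log:
        by_user[user].append((timestamp, engine))

    # Count consecutive pairs: SearchEngine(t) -> SearchEngine(t+1).
    counts = defaultdict(lambda: defaultdict(int))
    for searches in by_user.values():
        searches.sort()  # order each user's searches by timestamp
        for (_, src), (_, dst) in zip(searches, searches[1:]):
            counts[src][dst] += 1

    # Normalize the weights on all edges leaving each node.
    probs = {}
    for src, outgoing in counts.items():
        total = sum(outgoing.values())
        probs[src] = {dst: n / total for dst, n in outgoing.items()}
    return probs


# Made-up toy log, purely for illustration.
toy_log = [
    ("u1", 1, "google"), ("u1", 2, "google"), ("u1", 3, "yahoo"),
    ("u1", 4, "google"),
    ("u2", 1, "msn"), ("u2", 2, "google"), ("u2", 3, "google"),
]
P = transition_matrix(toy_log)
```

Each row of `P` holds the outgoing probabilities for one engine, with the self-loop stored under the engine's own name. An engine that only ever appears as a user's last search gets no row, which a real analysis would need to handle.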

If we calculate the stationary distribution of the transition matrix of probabilities, we should have a probability distribution that closely matches the market shares of the major engines. (FYI - this is essentially what PageRank version 1.0 is - the stationary distribution of a random walk over the link graph of the entire web)
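One simple way to approximate the stationary distribution is power iteration: start from any distribution and keep pushing it through the transition probabilities until it stops changing. The sketch below assumes the chain is irreducible and aperiodic (the self-loops help with the latter). The function name and the transition probabilities are invented for illustration, not measured.

```python
def stationary_distribution(P, iterations=1000):
    """Approximate pi satisfying pi = pi * P by power iteration.

    `P` maps each engine to a dict of next-engine probabilities,
    with every row summing to 1.
    """
    engines = sorted(P)
    pi = {e: 1.0 / len(engines) for e in engines}  # start uniform
    for _ in range(iterations):
        nxt = {e: 0.0 for e in engines}
        for src in engines:
            for dst, p in P[src].items():
                nxt[dst] += pi[src] * p  # mass flowing src -> dst
        pi = nxt
    return pi


# Made-up transition probabilities, NOT measured data.
P = {
    "google": {"google": 0.95, "yahoo": 0.03, "msn": 0.02},
    "yahoo":  {"google": 0.20, "yahoo": 0.75, "msn": 0.05},
    "msn":    {"google": 0.30, "yahoo": 0.10, "msn": 0.60},
}
pi = stationary_distribution(P)
for engine, share in sorted(pi.items(), key=lambda kv: -kv[1]):
    print(f"{engine}: {share:.3f}")
```

With real log data, the interesting check would be how close `pi` comes to published market-share numbers.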

What else can we do? We can analyze it as a random walk and calculate the expected number of searches until a given user of any internet search engine ends up using Google. If the probabilities on the graph are highly asymmetric.. which I think they are.. this is a measure of the monopolistic power of people’s Google habit.
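The expected number of searches until Google is what Markov chain texts call an expected hitting time. For every engine i other than Google it satisfies h(i) = 1 + sum over non-Google engines j of P(i, j) * h(j), with h(google) = 0. A rough sketch that solves this by repeated substitution follows; the function name and the probabilities are made up for illustration.

```python
def expected_searches_until(P, target, sweeps=1000):
    """Expected number of further searches before first reaching `target`.

    Solves h[i] = 1 + sum_{j != target} P[i][j] * h[j] by fixed-point
    iteration, which converges when `target` is reachable from every engine.
    """
    others = [e for e in P if e != target]
    h = {e: 0.0 for e in others}
    for _ in range(sweeps):
        h = {
            src: 1.0 + sum(
                p * h[dst] for dst, p in P[src].items() if dst != target
            )
            for src in others
        }
    h[target] = 0.0  # already there
    return h


# Made-up transition probabilities, NOT measured data.
P = {
    "google": {"google": 0.95, "yahoo": 0.03, "msn": 0.02},
    "yahoo":  {"google": 0.20, "yahoo": 0.75, "msn": 0.05},
    "msn":    {"google": 0.30, "yahoo": 0.10, "msn": 0.60},
}
h = expected_searches_until(P, "google")
```

Running the same function with a different `target` gives the reverse direction, so comparing the two numbers is one way to put a figure on the asymmetry.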

This should also predict the lifetime of a given ‘new’ MSN Live or Ask.com user.. meaning the number of searches they do before abandoning it for some other engine.
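Under the Markov assumption, the number of consecutive searches a user makes on one engine before switching is geometrically distributed, so its mean is 1 / (1 - p_stay), where p_stay is the self-loop probability. A minimal sketch, again with a hypothetical function name and invented numbers:

```python
def expected_run_length(P, engine):
    """Mean number of consecutive searches on `engine` before a switch.

    Counts the first search too, so the result is 1 / (1 - p_stay).
    """
    stay = P[engine].get(engine, 0.0)  # self-loop probability
    if stay >= 1.0:
        return float("inf")  # absorbing state: the user never leaves
    return 1.0 / (1.0 - stay)


# Made-up transition probabilities, NOT measured data.
P = {
    "google": {"google": 0.95, "yahoo": 0.03, "msn": 0.02},
    "yahoo":  {"google": 0.20, "yahoo": 0.75, "msn": 0.05},
    "msn":    {"google": 0.30, "yahoo": 0.10, "msn": 0.60},
}
lifetimes = {engine: expected_run_length(P, engine) for engine in P}
```

This only measures one unbroken run on an engine; a user who leaves and later comes back starts a new run.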

Predicted End Result: Google is the near-absorbing state of the graph.. meaning that all other engines are transient states on the route to Google sucking up market share. Of course this is patently obvious unless one of the bigs changes the game.

8 comments:

Daniel Tunkelang said...: Interesting. But I believe the legal question is not whether Google has a monopoly, but rather whether it abuses its market power to either achieve or extend that monopoly. Success is not a crime.

That said, I agree with Jeremy that Google is often disingenuous about the entry barriers it has created for competitors.; 2/22/2009 06:26:00 PM
Neal said...: Sure.. not legal monopoly.. behavioral/habitual monopoly of people's searching attention is what I'd like to measure.; 2/22/2009 06:38:00 PM
Anthony said...: Remember, in those usage studies they were only looking at what happens when a user doesn't find what they're looking for on the first search. Any node in that graph can be a sink - ie. the person found what they were looking for with the first search.

This is the age-old Coke versus Pepsi debate. But with Google, it's more like Coke versus Wal-Mart generic cola. Google is so popular that it has become the only "name brand" in Internet search.

Think about a person who is new to the internet (your grandparents or an eight year old). They will probably ask someone how to find stuff online, and that person will almost always show them Google. The cycle repeats, and market share increases.

I think you're right, though: people are much more likely to give up on a "second tier" search engine than give up on Google. For myself, I use Google almost exclusively, and might go to another search engine once per month (or even less frequently). When I don't find something, I usually reformulate my query once or twice, and if I don't find it, I simply give up instead of clicking the little dropdown on my browser's search bar and selecting Yahoo.

All in all, I'm proud to be a Google dittohead, because their results are just so darn good.; 2/22/2009 08:05:00 PM
Neal said...: Chaining together cross engine behavior, or putting boundaries on search sessions isn't what I'm proposing.. that's hard as we learned at RightNow. I'm only talking about macro behaviors, counting search events in user X's browsing history. Looking for switches between major search engines for any reason.

I'm not so willing to conclude that Google is just better than Yahoo and MSN Live anymore... sometimes better to a user means 'familiar' and not really 'superior'?; 2/22/2009 08:29:00 PM
zgecko said...: For better or for worse, they are doing a really good job most of the time. And as Anthony pointed out, most people will switch to a different search engine only when they cannot find stuff.

One comment on your Markov model is that your initial states here are almost more important. I am guessing that while a transition to Google is more likely than one away from it, the back links are a lot stronger. Your average grandma that just got AOL will stick to what is there for quite a while, unless there is some whiz around to show her the ropes. Well, and that might just be the verb "google".
I would be curious to see the stats. If most people start at Google, then most people would have to switch from Google to something else rather than the other way around. What matters is what sticks, and that your Markov model will probably not capture.; 2/22/2009 09:03:00 PM
Neal said...: zgecko,
Absolutely it's a coarse-grained model.. but I'd bet you a beer that an assumption that the 'stickiness' of all major engines is equal would likely be pretty close to reality... or at least good enough to not appreciably affect results.

Yep, the initial state is important and we know it's not a uniform distribution.

This looks like an interesting paper that might inform a better switching model:

An Analysis of Search Engine Switching Behavior Using Click Streams
http://www.springerlink.com/content/p72176q523241436/; 2/22/2009 09:44:00 PM
Anthony said...: This seems familiar. Wasn't there a meta search engine that would randomly blend top results from different search engines instead of doing actual searching? Or maybe I'm thinking of an academic study.; 2/22/2009 10:16:00 PM
Neal said...: http://metacrawler.com
http://dogpile.com
http://webcrawler.com
http://webfetch.com

We looked at early papers on Oren Etzioni's metacrawler years ago in the RightNow Colloquium.; 2/22/2009 10:18:00 PM