Monday, August 11, 2008

MapReduce versus RDBMS

I managed to stumble upon an interesting article while looking for a MySQL multi-database federated query tool.

David DeWitt and Michael Stonebraker write: MapReduce: A major step backwards.

They rightly point out that MapReduce is a 25-year-old idea. Lisp has had this functionality for decades, and the idea is actually at least 30 years old: Griss & Kessler (1978) is apparently the earliest description of a parallel Reduce function. That said, it's only in the last 10 years, with the advent of cheap machines, that an idea this great could have been implemented widely.
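For anyone who hasn't seen the functional version of the idea, here is a minimal single-machine sketch in Python rather than Lisp. The word list and the `add_pair` helper are made up for illustration, and nothing here is parallel; it only shows the map-then-reduce shape that the distributed systems build on.

```python
from functools import reduce

words = ["map", "reduce", "lisp", "map"]

# Map step: turn each word into a (word, 1) pair.
pairs = map(lambda word: (word, 1), words)

def add_pair(counts, pair):
    """Reduce step: fold one (word, n) pair into the running totals."""
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts

# Fold all pairs into a single dictionary of word counts.
counts = reduce(add_pair, pairs, {})
print(counts)
```

Because each pair is independent, the map step is the part that can be farmed out to many machines; the reduce step is where the partial results come back together.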

Their second point is that MapReduce is a poor implementation, as it doesn't support or utilize indexes. They write:

"One could argue that value of MapReduce is automatically providing parallel execution on a grid of computers. This feature was explored by the DBMS research community in the 1980s, and multiple prototypes were built including Gamma [2,3], Bubba [4], and Grace [5]. Commercialization of these ideas occurred in the late 1980s with systems such as Teradata.

"In summary to this first point, there have been high-performance, commercial, grid-oriented SQL engines (with schemas and indexing) for the past 20 years. MapReduce does not fare well when compared with such systems."

Great point and point taken. However, where are the open source implementations of the things you mention? This is a bit of the 'if a tree falls in the woods and no one is there to hear it' problem. A major reason MapReduce has seen uptake (other than being a child of Google) is that an example implementation is available for the Horde to steal, copy, improve & translate.

The modern user-generated-content web is mostly built on open source these days, so technology that I can only get in commercial databases is a non-starter.

I'm a SQL junkie and am searching (in vain so far, it seems) for a decent extension to MySQL that does cross-database queries and reduction over tables I know to be neatly partitioned. No luck so far. I'm starting to look into other SQL engines, as there may be an ODBC wrapper that could serve as the federation layer. It's got to be mostly functional and EASY to adopt, or you'll continue to have people spouting the MapReduce dogma.
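To make it concrete, here is a rough sketch of the kind of thing I'm after, written in Python. Everything in it is an assumption for illustration: in-memory SQLite databases stand in for the separate MySQL databases, and the `posts` table, its columns, and the helper names are invented. It is not a real federation tool, just the same aggregate query run per partition with the partial results reduced into one answer.

```python
import sqlite3
from functools import reduce

def make_shard(rows):
    """Create an in-memory SQLite database standing in for one partition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (author TEXT, views INTEGER)")
    conn.executemany("INSERT INTO posts VALUES (?, ?)", rows)
    return conn

def map_shard(conn):
    """Run the same aggregate on one shard; returns {author: total_views}."""
    cur = conn.execute("SELECT author, SUM(views) FROM posts GROUP BY author")
    return dict(cur.fetchall())

def merge(left, right):
    """Combine two partial results by summing the per-author totals."""
    out = dict(left)
    for author, total in right.items():
        out[author] = out.get(author, 0) + total
    return out

def federated_totals(shards):
    """Query every shard, then reduce the partial results to one dict."""
    return reduce(merge, (map_shard(conn) for conn in shards), {})

# Two hypothetical partitions of the same logical table.
shards = [
    make_shard([("ann", 10), ("bob", 5)]),
    make_shard([("ann", 7), ("cat", 3)]),
]
totals = federated_totals(shards)
print(totals)
```

This only works so neatly because SUM can be computed per partition and then added up; an aggregate like AVG would need each shard to return a sum and a count instead.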

Post scripts:
  • Nice summary of Dr. Stonebraker's accomplishments here
  • Funny link to comp.lang.lisp, where some newbies ask whether Lisp has MapReduce.
