Thursday, August 20, 2009

What can we learn from the Google File System?

I found this via my twitter herd and I've been thinking about it for days. Kirk McKusick interviews Sean Quinlan about the Google File System. Fascinating stuff.

We're very fortunate to have storage scalability challenges ourselves at 'Undisclosed' (formerly OthersOnline). We're amassing mountains of modest chunks of information spread across many hundreds of millions of keys. Our storage has evolved thusly:

  1. MySQL table with single key and many values, one row per value.
  2. MySQL key/value table with one value per key
  3. memcachedb - memcached backed by Berkeley DB
  4. Nginx + Webdav system
  5. Our own Sam Tingleff's valkyrie - Consistent Hashing + Tokyo Tyrant

#5 is the best performing so far, yet we still aren't going to escape the unpredictable I/O performance of EC2 disks. As I understand it, this layer serves a role much like the chunkservers of GFS: we need low-latency read access to storage on one side, and high-throughput write access on the other.
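For anyone who hasn't run into the technique, here's a minimal sketch of the consistent-hashing half of that approach. This is generic illustration code under my own assumptions, not valkyrie's actual implementation, and the Tokyo Tyrant host names are made up:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Generic consistent-hash ring: maps keys to nodes so that adding or
    removing a node only remaps a small fraction of the keys."""

    def __init__(self, nodes, replicas=128):
        # 'replicas' virtual points per node smooth out the key distribution
        self.replicas = replicas
        self.ring = {}         # hash point -> node name
        self.sorted_keys = []  # sorted hash points for bisection
        for node in nodes:
            self.add_node(node)

    def _hash(self, value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            point = self._hash("%s:%d" % (node, i))
            self.ring[point] = node
            bisect.insort(self.sorted_keys, point)

    def get_node(self, key):
        # walk clockwise around the ring to the first point >= hash(key)
        point = self._hash(key)
        idx = bisect.bisect(self.sorted_keys, point)
        if idx == len(self.sorted_keys):
            idx = 0
        return self.ring[self.sorted_keys[idx]]

# example: route user keys to hypothetical Tokyo Tyrant hosts
ring = ConsistentHashRing(["tt-01:1978", "tt-02:1978", "tt-03:1978"])
print(ring.get_node("user:123456"))
```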

Combining the insights from the above interview with the excellent Pathologies of Big Data, I'm left with the impression that one must absolutely limit the number of disk seeks and do some 'perfect' number of I/O ops against X MB chunks.
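To make that concrete, here's a toy sketch of the "few big sequential writes instead of many small seeks" idea: buffer small key/value records in memory and flush them to disk as one large append. The chunk-size constant and file path are placeholders I made up, not values we've validated:

```python
import struct

CHUNK_BYTES = 64 * 1024 * 1024  # placeholder for the 'X MB' chunk size

class ChunkedWriter:
    """Buffer many small records and flush them as one large sequential
    append, trading per-record seeks for a single big write."""

    def __init__(self, path, chunk_bytes=CHUNK_BYTES):
        self.f = open(path, "ab")
        self.chunk_bytes = chunk_bytes
        self.buf = []
        self.buffered = 0

    def put(self, key, value):
        # length-prefixed record: [key_len][val_len][key][value]
        record = struct.pack(">II", len(key), len(value)) + key + value
        self.buf.append(record)
        self.buffered += len(record)
        if self.buffered >= self.chunk_bytes:
            self.flush()

    def flush(self):
        if self.buf:
            self.f.write(b"".join(self.buf))  # one sequential write
            self.f.flush()
            self.buf, self.buffered = [], 0

writer = ChunkedWriter("/tmp/values.chunk")
writer.put(b"user:42", b"some modest blob of profile data")
writer.flush()
```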

Random questions:
  1. What is X for some random hardware in the cloud, and how do we find it? (a rough benchmark sketch follows below)
  2. What if X changes as the disk fills up?
  3. What mapping function should turn an application key into a storage key so that writes get the most sequential disk access and reads get the best memory caching of chunks?
  4. What about fragmentation?
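One crude way to attack the first question is to measure the actual disk: read a large file sequentially at several block sizes and see where throughput stops improving. This is an assumption-laden sketch (the test file path and sizes are placeholders), not a rigorous disk benchmark:

```python
import time

def throughput_mb_s(path, block_bytes, total_bytes=512 * 1024 * 1024):
    """Sequentially read 'total_bytes' from 'path' in 'block_bytes' blocks
    and report MB/s. Use a file larger than RAM (or drop caches between
    runs) so the page cache doesn't flatter the numbers."""
    start = time.time()
    read = 0
    with open(path, "rb") as f:
        while read < total_bytes:
            buf = f.read(block_bytes)
            if not buf:
                break
            read += len(buf)
    elapsed = time.time() - start
    return (read / (1024.0 * 1024.0)) / elapsed

if __name__ == "__main__":
    test_file = "/mnt/bigfile.dat"  # placeholder: a multi-GB file on the EC2 volume
    for kb in (4, 64, 256, 1024, 4096, 16384):
        mb_s = throughput_mb_s(test_file, kb * 1024)
        print("block %6d KB -> %.1f MB/s" % (kb, mb_s))
```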

It does seem as though the newish adage is very true... web 2.0 is mostly about exposing old-school Unix commands and system calls over HTTP. I keep thinking this must have been solved a dozen times before. The cycbuf feature of Usenet servers, perhaps?
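For reference, the cycbuf idea boils down to a fixed-size spool file written sequentially with a wrapping write pointer, so the oldest data gets overwritten instead of the disk fragmenting. Here's a toy sketch of that idea under my own assumptions; it is not INN's actual on-disk format:

```python
import os

class CyclicSpool:
    """Toy version of the Usenet 'cycbuf' idea: a fixed-size file written
    sequentially; when the write pointer hits the end it wraps to the start
    and the oldest data is overwritten."""

    def __init__(self, path, size_bytes):
        self.size = size_bytes
        self.offset = 0
        # pre-allocate the spool so all writes are in-place and sequential
        if not os.path.exists(path) or os.path.getsize(path) != size_bytes:
            with open(path, "wb") as f:
                f.truncate(size_bytes)
        self.f = open(path, "r+b")

    def append(self, record):
        if len(record) > self.size:
            raise ValueError("record larger than spool")
        if self.offset + len(record) > self.size:
            self.offset = 0  # wrap around, clobbering the oldest records
        self.f.seek(self.offset)
        self.f.write(record)
        start = self.offset
        self.offset += len(record)
        return start  # caller keeps (start, len) as the storage key

spool = CyclicSpool("/tmp/spool.cycbuf", 16 * 1024 * 1024)
loc = spool.append(b"article or value bytes here")
```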

