Database Performance Tuning: The great NoSQL debate

It had to happen. Apparently, it all started with a post from Ted Dziuba provocatively titled "I can't wait for NoSQL to die".

The Internet is a wonderful thing. Twenty years ago, someone expressing these kinds of ideas would hardly have been noticed, unless he was in a very influential position. But people in such positions tend to try not to anger anyone. Thanks to the Internet, the ensuing discussions, responses and counterarguments came up so fast that at this point, less than two weeks after the post, I can add very little to the discussion between the two camps.

Taking sides is something I would not want to do anyway, since all of them are my potential customers. Both sides have some merit in their arguments, and both sides get some of the other side's points wrong. So go and read the debate for yourself, and then come back to this post if you don't know what I'm talking about.

This discussion has also given birth to some very good April Fools' jokes.

While I can add very little to the arguments coming from each side, I can explain the reasons for some of them. I feel somewhat like a psychiatrist, in that I can listen to both sides and ask them questions to explore what's really behind their arguments and understand where they come from.

The story of NoSQL (simplified version)

In the beginning, there were these hot web properties: search engines, social networks, online shops. They faced a number of unique challenges while growing up, many of them completely new to the IT industry. The biggest of these was scale. None of the existing technologies, apart from the grossly inflexible, extremely expensive and closed mainframes, were ready to handle operations at the scale these web businesses were growing to, much less to support the rate of change they were engaging in due to the fierce competition.

One of the biggest bottlenecks for scalability was their database layer. While there may have been a relational database engine on the market that could cope with their workloads, it was neither free (of the speech or the beer variety) nor simple to set up and administer. And those people, being extremely clever, did not want to tie their futures to any particular vendor, especially if that vendor was a potential future competitor. It was best, both for their future and for their current financial bottom line, to use FOSS if possible.

So those web businesses, chock full of CS PhDs, had to solve the database scalability problem on their own. I suppose that at first they tried to optimize their existing database layer code, their front end access patterns, the caching engine. Everything.

No matter what they did, they could barely keep up with the demand. At some point, they dropped ACID. They denormalized. They kept adding nodes and nodes to their clusters.

The resulting system was fragile. It required a lot of knowledge to manage properly. And, worst of all, all those solutions had inherent limits to how far they could grow. For them, the relational database had become something into which they had to shoehorn their functionality, instead of being the key tool to support it.

Their next step was to drop the relational database altogether. I suppose that for them it was a liberating moment, since they had long ago stopped using any of the features that make a relational database worth using.

Thus, NoSQL databases were born. They started from scratch and designed incredibly scalable database engines. Engines that are extremely fault tolerant. Engines whose programming interface is much simpler than SQL, simply because it is integrated with their language of choice. Engines designed for a read/write ratio like 1000:1 (you read something 1000 times and will probably write to it only once). Engines designed for the web world, at consumer level scale.

Saturday 3 April 2010