Why NoSQL Now?
Why have NoSQL databases reached the tipping point now? Almost twenty years ago I lived through the attack of the killer object databases. While they had some lovely technical superiorities (and I still make part of my living helping maintain a Gemstone/Smalltalk application), relational databases were able to beat object databases in the market. It wasn’t time.
Now, though, seemingly suddenly, alternative data models are all the rage. While my narcissistic programmer self would love to believe this had something to do with technical superiority, experience argues otherwise. Instead, if you want to understand technical change it’s more effective to follow the money.
I was talking to a developer of a cloud-deployed application (hi Patrick, love that BrowserMob.com!) and our design discussions quickly focused on money. SimpleDB is great for high-transaction-rate storage, but it’s too dang expensive for reporting. A special-purpose database just for reporting makes sense. It’s worth extra programming to reduce operational costs (the classic capex/opex tradeoff familiar to telecom engineers).
Some Made Up Numbers
Here are some made-up numbers (thanks @jkordyback) to illustrate the dynamic (adjust the numbers for your actual situation). Say you have a database from a commercial vendor. It costs you annually:
$50K (license) + $1.5K (electricity) + $1K (capital) = $52.5K
If you need more performance it makes sense to get beefier hardware:
$50K + $2K + $3K = $55K
With the license dominating the bill, the upgrade adds only $2.5K, less than 5%. The performance advantages of an alternative data storage paradigm (column-oriented, document-oriented, key-value, map-reduce) don’t justify the additional cost and complexity.
Eliminate the license and the cost of electricity becomes a huge percentage of the cost of a database. (EC2 is basically a really complicated way of charging for electricity.) Any technical advantage that reduces energy usage turns directly into profit. Your database now costs you:
$0 + $1.5K + $1K = $2.5K
Buying beefier hardware is a giant bump:
$0 + $2K + $3K = $5K
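The shift in proportions is easy to check with a few lines of Python. This is a sketch over the made-up per-server numbers above, in thousands of dollars per year; the `AnnualCost` class and the `upgrade_bump` helper are hypothetical names for illustration, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnualCost:
    """Made-up yearly cost of one database server, in thousands of dollars."""
    license: float
    electricity: float
    capital: float

    def total(self) -> float:
        return self.license + self.electricity + self.capital

    def electricity_share(self) -> float:
        # Fraction of the yearly bill that goes to electricity.
        return self.electricity / self.total()


def upgrade_bump(base: AnnualCost, upgraded: AnnualCost) -> float:
    # Relative increase in the yearly bill from buying beefier hardware.
    return upgraded.total() / base.total() - 1


commercial = AnnualCost(license=50, electricity=1.5, capital=1)
commercial_big = AnnualCost(license=50, electricity=2, capital=3)
free = AnnualCost(license=0, electricity=1.5, capital=1)
free_big = AnnualCost(license=0, electricity=2, capital=3)

for name, base, big in [("commercial", commercial, commercial_big),
                        ("license-free", free, free_big)]:
    print(f"{name}: ${base.total():g}K -> ${big.total():g}K "
          f"(+{upgrade_bump(base, big):.0%}), "
          f"electricity is {base.electricity_share():.0%} of the base bill")
```

With these inputs the upgrade is roughly a 5% bump on the commercial bill but a 100% bump once the license is gone, and electricity goes from about 3% of the bill to 60%.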
What if you can avoid the hardware upgrade by shifting to an alternative database? Factor in internet-scale applications, where you’re multiplying all your costs by 100 or 1000. The engineering required to shift to a different store, or to keep multiple stores in sync, vanishes in comparison to the operational expense savings (1000 servers for illustration):
$0 + $1500K + $1000K = $2500K
Improving performance with hardware:
$0 + $2000K + $3000K = $5000K
Versus switching to a different store:
$0 + $1500K + $1000K + $500K (engineering cost) = $3000K
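To play with the scale effect, here is a small Python sketch of the fleet-wide scenarios using the made-up per-server numbers above (thousands of dollars per year). The function names are hypothetical, and the sketch assumes that switching stores avoids the hardware upgrade entirely, so each server keeps its baseline electricity and capital costs, and that the one-off engineering cost is charged in a single year.

```python
# Made-up per-server yearly costs from the text, in thousands of dollars.
BASE_ELECTRICITY, BASE_CAPITAL = 1.5, 1.0
BIG_ELECTRICITY, BIG_CAPITAL = 2.0, 3.0


def baseline(servers: int) -> float:
    # License-free fleet on the original hardware.
    return servers * (BASE_ELECTRICITY + BASE_CAPITAL)


def upgrade_hardware(servers: int) -> float:
    # Every server gets beefier hardware: more electricity, more capital.
    return servers * (BIG_ELECTRICITY + BIG_CAPITAL)


def switch_store(servers: int, engineering: float) -> float:
    # Assumption: the alternative store avoids the upgrade, so servers keep
    # baseline costs; the one-off engineering cost is added on top.
    return baseline(servers) + engineering


def break_even_engineering(servers: int) -> float:
    # Largest engineering budget for which switching costs no more than upgrading.
    return upgrade_hardware(servers) - baseline(servers)


for servers in (1, 100, 1000):
    print(f"{servers:>5} servers: upgrade ${upgrade_hardware(servers):,.1f}K, "
          f"switch ${switch_store(servers, 500):,.1f}K, "
          f"break-even engineering ${break_even_engineering(servers):,.1f}K")
```

Under these assumptions a $500K engineering effort loses money at 100 servers (break-even is $250K) but is comfortably worth it at 1000 servers, where break-even is $2,500K. Swapping in your own numbers is the point.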
There are several things that catch my eye in this picture. One is that when I was going to school we were always taught, “In the olden days of computing, computers were expensive and programmers were cheap. Now it’s the reverse. Therefore…” We are back to the future. At internet scale, programmers are (sometimes) cheap compared to the cost of electricity. That’s a pretty fundamental assumption to change. I’m sure we haven’t fully digested the implications.
Another is that the technical advantages of alternative stores translate directly into economic advantage. If I were a big database vendor, I’d be diversifying away from reliance on big-iron licenses, say by buying a hardware company (oh…) or running a variety of storage models on my cloud (double oh…). Again, I’m sure we haven’t fully digested the implications of this shift.
Rough as the numbers above are, they back up my gut feeling that the row-oriented relational model we’ve lived with for 30 years is about to shatter. Look for opex optimization to become an increasingly important topic for engineers. Look for vendors, both software and services, to deliver further opex improvements. Where it goes from there isn’t clear, but it certainly will be interesting. The time has come.
The “financial” “models” above are just thinking tools to look at trends. I’d love to see some real numbers and trends to validate the qualitative conclusions I’ve already jumped to.