Stuck with “NoSQL”?

While listening to the energetic and thoughtful discussion between the Riak team and Tim Anglade, I was struck by the difficulty of coming up with a better moniker than “NoSQL”. There is always a period at the beginning of a technology when it is understood in terms that compare it to the status quo: horseless carriage, wireless telegraphy, digital photography. When the technology is socialized (that is, widely understood and applied), these terms are replaced with positive terms: automobile, radio, imaging. NoSQL is still at the first stage, but I’m not sure there will ever be a clear term that positively describes all the technologies it encompasses. Here’s why.

SQL vs Application

Here is the picture as of maybe ten years ago. The needs of almost all applications were covered effectively by row-oriented relational data stores. However, because these stores were mature, further improvements were discounted by the Law of Diminishing Returns.

Needs Plasticity

Here is the picture for applications that require frequent schema changes. The need for plasticity exceeds the ability of SQL to deliver, at least at a reasonable cost. For 12 years I’ve worked on a life insurance system that has relied on the ability to make daily schema changes. We wouldn’t have been successful with any SQL + ORM technology I know, but our object database (GemStone) handles such changes gracefully for the amount of data we need to manage.

The picture also shows where document-oriented databases like MongoDB or CouchDB fit in. There are things you can do with a row-oriented relational store that you can’t do with Mongo or Couch, but the key advantage they offer is the ability to rapidly change the shape of data. If that’s what your application needs, for example in a rapidly evolving startup, then a document store is a win.
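To make "changing the shape of data" concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: SQLite (from the standard library) stands in for a row-oriented relational store, and a plain list of dicts stands in for a document store. It does not show how MongoDB, CouchDB, or GemStone work internally, and the `policy` table, the `rider` field, and the `rider_of` helper are hypothetical names invented for this example.

```python
import sqlite3

# Row-oriented store: the shape of a row is declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policy (id INTEGER PRIMARY KEY, holder TEXT)")
conn.execute("INSERT INTO policy (id, holder) VALUES (1, 'Ada')")

# A new field requires a schema change before any row can carry it.
conn.execute("ALTER TABLE policy ADD COLUMN rider TEXT")
conn.execute(
    "INSERT INTO policy (id, holder, rider) VALUES (2, 'Grace', 'disability')"
)
rows = conn.execute(
    "SELECT id, holder, rider FROM policy ORDER BY id"
).fetchall()

# Document-style store (hypothetical stand-in: a list of dicts).
# Each document carries its own shape, so a new field needs no migration.
policies = [
    {"id": 1, "holder": "Ada"},
    {"id": 2, "holder": "Grace", "rider": "disability"},
]


def rider_of(doc):
    """Return the rider, tolerating documents written before the field existed."""
    return doc.get("rider")


print(rows)
print([rider_of(doc) for doc in policies])
```

The trade is visible even at this scale: the relational side pays for the change once, in the schema, while the document side pushes the cost to every reader, which has to cope with old and new shapes living side by side.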

Here is where the futility of defining NoSQL as anything but a negative space becomes apparent to me. There isn’t just one dimension along which SQL stores are inadequate for some applications; there are half a dozen. Different vendors address different subsets of these criteria. There is some overlap (I tried drawing several stores into this picture but it quickly became spaghetti), but the differences between the stores are large relative to their commonality.

To name the universe of all these products, you’d have to call them something like “better than SQL in one or two ways but worse in several others that don’t matter in some circumstances”. Shorten that to a phrase that fits on a T-shirt and you get “NoSQL”. I’m afraid we’re stuck with it for a while.

22 Comments

Sonny Gill, March 12th, 2010 at 8:56 pm

Looking at that second image, I was struck by how clearly it shows NoSQL as a disruptive technology (http://en.wikipedia.org/wiki/Disruptive_technology). It will be interesting to see whether (or how soon) it replaces RDBMS in a majority of software systems. Do you have a guesstimate?

Alexandre Jasmin, March 12th, 2010 at 9:39 pm

I agree NoSQL is such a vague term. It can include anything from dbm to eXist-db or even non-SQL relational databases like Rel.

admin, March 12th, 2010 at 10:35 pm

SQL is so deeply embedded in the whole technology stack, in the way people are trained, and in supporting materials that I think it will be a while before NoSQL alternatives take a large percentage of the “market”. I can’t make a numerical prediction.

I almost put a paragraph in about the relationship of that final picture with The Innovator’s Dilemma. Christensen’s model talks about a single shift in the basis of competition. Here you have several shifts in play at the same time. I’d be curious what he would say about the situation.

Till Salzer, March 13th, 2010 at 12:47 am

I was wondering what NoSQL is good for. I still find it hard to believe an IT system relies on frequent schema changes, but finally I see a valid (important) point in NoSQL. Thank you!

Plus, I’m curious about a positive name for NoSQL. I guess it takes some time to settle on a set of common requirements, and probably some good popular examples to convince more cautious people like me. ;)

Roger Rohrbach, March 13th, 2010 at 3:01 am

In his comparison of high-performance scalable data stores (http://cattell.net/datastores/Datastores.pdf), Rick Cattell groups the “NoSQL” alternatives into three categories: key-value stores (of which Riak is an example), document stores (e.g., CouchDB, MongoDB) and extensible record stores (e.g., Cassandra). This framework may be helpful in thinking about the space.

admin, March 13th, 2010 at 7:40 am

That’s a taxonomy based on the solution, basically what is left out from the SQL model. I think we’ll find a successor to “NoSQL” when we have a taxonomy on the problem side.

Sebastian Kübeck, March 13th, 2010 at 8:13 am

The situation could get worse.
I guess that folks might create SQL layers on top of NoSQL databases for integration with legacy tools, similar to what the Caché folks did.
What should we call the whole thing then? :-)

fish, March 13th, 2010 at 8:56 am

For me, the primary upshot of the diagrams in this article was to make me hungry for fried eggs. I’m not trying to be a jerk, but honestly I have to mention that they didn’t add much to your conclusion.

I’m not certain why the academic and business-speak terms were rendered outside the eggwhite, for example. If I’m going by the implicit color conventions set forth in the first figure, shouldn’t ‘reliability’, ‘write scale’, etc. be considered application needs? Or do you consider those parameters differently, as subordinates of the specific needs of the database and not the application as a whole? Both are arguable, sorta, but I’m not sure what you’re getting at in this case here.

I gather that the aesthetic of this sort of diagram is informed greatly by the CouchDB team’s near-ubiquitous use of adorable yet considered napkin sketches, in lieu of OmniGraffle nonsense and the like. I’d love it if these sorts of idioms were put to use to address the evolution of the “NoSQL” de facto brand. Food for thought, indeed (versus food for stomachs, such as eggs).

admin, March 13th, 2010 at 8:59 am

No offense taken. Pictures like this are how I think. If they don’t help you, it’s no skin off my nose.

Nmm, March 13th, 2010 at 9:01 am

@Sebastian

Mo’SQL

admin, March 13th, 2010 at 9:11 am

Two thumbs up!

Andrew Wolfe, March 14th, 2010 at 2:33 pm

In my opinion, lots of people are familiar with relational databases, but few are actually ‘trained.’ If you go back to the original Codd paper, you see that flexibility was the original driving force for the relational model and the critical point of sale for relational technology.

It is tempting to look at database application development, and believe there really are databases with frequent schema changes. When you have a live database of a realistic size, you find out two things about database refactoring:

* it’s very difficult and time-consuming to restructure tens of millions of rows of data in a relational database
* the very thought of performing restructuring on a similar volume of data that is in an object hierarchy structure is appalling.

I have a lot of common ground with the object-oriented perspective. But every application of any size or power is, in fact, an application generator. For that you need a metadata-driven approach. With an object store, “is-a” is always the dominating relationship of metadata. Under relational technology, “is-a” is just another relationship and you can code it with many different behaviors and manifestations if this suits your application-building needs.

I may be wrong, but it seems that most of the time, “NoSQL” is the presupposition of the software developer rather than the conclusion.

Stephen Stillwater, March 14th, 2010 at 11:22 pm

Something I started wondering about while reading this article: do people have a beef with SQL and the ORM issues associated with it, or with the relational model for data? I guess I’m not quite sure what my question is. Whenever people start saying “NoSQL” I immediately start thinking of the corporate data warehouse and wonder how it fits into all of this.

Any thoughts?

Roger Rohrbach, March 15th, 2010 at 1:23 am

Here’s a problem- (or at least, requirements-) oriented taxonomy, with CAP as the organizing principle: http://blog.nahurst.com/visual-guide-to-nosql-systems

admin, March 15th, 2010 at 6:40 am

Andrew,

In my experience restructuring an object database, even with substantial amounts of data, is not a big deal if you do it a little at a time.

As far as NoSQL being a solution in search of a problem, there is certainly some of this going on. The novelty vampires are always looking for the latest thing. OTOH, there are significant technical problems that can be more effectively solved with NoSQL stores.

admin, March 15th, 2010 at 6:42 am

For me it’s not the relational model that creates problems; it’s the limitations implied by the model. For example, as soon as you support joins, distributed processing is much harder.

Michael Norton, March 22nd, 2010 at 5:23 pm

Strange, isn’t it?

SQL is not an accurate name for the systems we are contrasting to NoSQL. SQL is a language used in many Relational DataBase Management Systems (RDBMS). And it is actually the RDBMS to which we are comparing MongoDB, CouchDB, and the like.

NoRDBMS is more accurate, but no more positive.

Structured Storage is a frequently used term. It has no negative connotation, but I suspect many will argue the term is too vague and can apply to RDBMS and non-RDBMS alike. Both are structured.

So I think the name NoSQL will stick for a while.

As far as NoSQL being a solution looking for a problem, I don’t think that is an accurate assessment. Twitter and several others have moved to NoSQL systems to address valid technical needs. While I agree these may be uncommon needs, there are places where this type of structure is appropriate.

It is important that we continue to question our rules and standards. All of them are based on solutions to problems that existed in the past. Are the problems still valid? Are the solutions still optimal?

Guy Cookson, March 24th, 2010 at 5:56 am

Really interesting read. I totally know where you’re coming from with regard to getting thoughts down on paper as you think. Great to see that, actually.

Kent Laursen, March 25th, 2010 at 9:20 am

Over the years I’ve found myself trending towards a graph-oriented view of software. From the behavior standpoint, I can represent activities using process graphs. Of course the inputs, parametrization and outputs of nodes in these process graphs are data structures, themselves representable as graphy things like object graphs, trees of XML, etc.

When I work in a graph minded mode, it becomes more natural to make data more elastic, if you will. Some use cases involve being able to annotate data semantically to give it new meanings, beyond the scope of the applications that originally created the data. In graph speak, we are adding new arcs to the existing information/data graphs.

Other use cases involve the creation of variable compositions of data at runtime for which the “schema” of the resultant type does not exist beforehand. It might become a first class schema item later, though.

So in order to handle the process and data dynamics of large-scale, composable systems, I prefer graph orientation, one example of which is object graphs. I hope for a comeback of scalable OODBMS. The best standardized metadata approach (XML Schema ain’t it, IMO) is yet to be a household word.

At present I am working to put together a metadata-driven approach for automating a broad set of processes and information structures that leverages XMI-based standards such as those from the OMG. This is proving to be an interesting exercise that I hope will yield more concrete approaches to dealing with elastic/dynamic data.

Unfortunately I don’t have a good proposal for a NoSQL replacement candidate, in part for reasons the blog author suggests. A single label rarely connotes all the characteristics of its target. But just for fun…

GOOD – Graph Oriented Ontologized Data
GADS – Graph Architected Data Systems
GOUDA – Graph Oriented Universal Data Access
ANARCHY – Add New Arc Really Carefully Here Yourself

PurplePilot, March 30th, 2010 at 2:25 am

Deep sigh, CODASYL ……

Jnan Dash, June 28th, 2010 at 10:05 pm

The NoSQL “movement” unfortunately appears as the next wave after relational databases. This is far from the truth. Some of the terms outside your circle diagram, such as read scale, write scale, etc., have been very much part of many RDBMS designs over the years. Performance was always a high-priority item in the designs of products like IBM DB2 and Oracle DBMS (I was part of both development teams).

The real key point here is the “extreme scalability” demanded by the likes of Twitter, Facebook, and Google. Hence much of the “industrial-strength” features and consistency elements can be forfeited in favor of very low-latency fetch and search over a humongous amount of data.

It is good that alternative designs are coming up to address such needs (e.g. Cassandra, Bigtable, MongoDB, etc.). One does not use these for transferring billions of dollars between accounts in a large bank, because the “eventually-consistent” model will not be acceptable.

Therefore, such solutions in the NoSQL camp will address specific types of applications and will never be a universal solution. The RDBMS market has taken 20 years to get to be a multi-billion dollar business for a reason. The RDBMS and NoSQL camps will coexist, and their peaceful coexistence will pose extra challenges of integration. “One size fits all” has never worked in our industry.

admin, June 30th, 2010 at 4:59 am

Jnan,

There’s no question that the RDBMS circle is very large. Billions of dollars of development have been spent over decades making it so. However, the effort to make it larger seems to have reached the point of diminishing returns at the same time that the demands of projects have continued to expand. The vast majority of projects are still covered well by the existing circle, but the trend is towards applications that exceed the circle in the various ways listed. The question for a project is whether compensating for what you give up by moving away from a row-oriented relational store is more expensive than the accommodations you would have to make by remaining with an RDBMS.