The Twelve Days of NoSQL: Day Nine: NoSQL Taxonomy
On the ninth day of Christmas, my true love gave to me
Nine ladies dancing.
NoSQL databases can be classified into the following categories:
- Key-value stores: The archetype is Amazon Dynamo of which DynamoDB is the commercial successor. Key-value stores basically allow applications to “put” and “get” values but each product has differentiators. For example, DynamoDB supports “tables” (namespaces) while Oracle NoSQL Database offers “major” and “minor” key paths.
- Column-family stores: Column-family stores allow data associated with a single key to be spread over multiple storage nodes. Each storage node only stores a subset of the data associated with the key; hence the name “column-family.” A key is therefore composed of a “row key” and a “column key” in a matter analogous to the major and minor key paths of Oracle NoSQL Database.
- Graph databases: Graph databases are non-relational databases that use graph concepts such as nodes and edges to solve certain classes of problems: for example; the shortest route between two towns on a map. The concepts of functional segmentation, sharding, replication, eventual consistency, and schemaless design do not apply to graph databases so I will not discuss graph databases.
NoSQL products are numerous and rapidly evolving. There is a crying need for a continuously updated encyclopedia of NoSQL products but none exists. There is a crying need for an independent benchmarking organization but none exists. My best advice is to do a proof of concept (POC) as well as a PSR (Performance Scalability Reliability) test before committing to using a NoSQL product. Back in the day, in 1985 to be precise, Dr. Codd had words of advice for those who were debating between the new relational products and the established pre-relational products of his day. The advice is as solid today as it was in Dr. Codd’s day.
“Any buyer confronted with the decision of which DBMS to acquire should weigh three factors heavily.
The first factor is the buyer’s performance requirements, often expressed in terms of the number of transactions that must be executed per second. The average complexity of each transaction is also an important consideration. Only if the performance requirements are extremely severe should buyers rule out present relational DBMS products on this basis. Even then buyers should design performance tests of their own, rather than rely on vendor-designed tests or vendor-declared strategies. [emphasis added]
The second factor is reduced costs for developing new databases and new application programs …
The third factor is protecting future investments in application programs by acquiring a DBMS with a solid theoretical foundation …
In every case, a relational DBMS wins on factors two and three. In many cases, it can win on factor one also—in spite of all the myths about performance.”
—An Evaluation Scheme for Database Management Systems that are claimed to be Relational