The Twelve Days of NoSQL: Day Four: Sharding
On the fourth day of Christmas, my true love gave to me
Four colly birds.
Let’s recap what we have covered so far.
- Day One: NoSQL technology consists of “disruptive innovations” which present a dilemma for established players.
- Day Two: The goals of NoSQL technology are extreme performance, extreme scalability, and extreme availability.
- Day Three: The underpinning component of NoSQL technology is functional segmentation which results in simple hierarchical schemas.
Amazon’s next design decision was “sharding” or horizontal partitioning of all the tables in a hierarchical schema. Hash-partitioning is typically used. Each table is partitioned in the same way as the other tables in the schema and each set of partitions is placed in a separate database referred to as a “shard.” The shards are independent of each other; that is, there is no clustering (as in Oracle RAC) or federation (as in IBM DB2).
Note that the hierarchical schemas that result from functional segmentation are always shardable; that is, hierarchical schemas are shardable by definition.
Returning to the example from Ted Codd’s 1970 paper on the relational model:
- employee (man#, name, birthdate) with primary key (man#)
- children (man#, childname, birthyear) with primary key (man#, childname)
- jobhistory (man#, jobdate, title) with primary key (man#, jobdate)
- salaryhistory (man#, jobdate, salarydate, salary) with primary key (man#, jobdate, salarydate)
Note that the jobhistory, salaryhistory, and children tables have composite keys. In each case, the leading column of the composite key is the man#. Therefore, all four tables can be partitioned using the man#.
Sharding is an essential component of NoSQL designs but it does not present a conflict with the relational model; it too is simply a physical database design decision. In the relational model, the collection of standalone databases or shards can be logically viewed as a single distributed database.