Home > NoSQL, Oracle, Physical Database Design, SQL > The Twelve Days of NoSQL: Day Four: Sharding

The Twelve Days of NoSQL: Day Four: Sharding


On the fourth day of Christmas, my true love gave to me
Four colly birds.

(Yesterday: Functional Segmentation)(Tomorrow: Replication and Eventual Consistency)

Let’s recap what we have covered so far.

  • Day One: NoSQL technology consists of “disruptive innovations” which present a dilemma for established players.
  • Day Two: The goals of NoSQL technology are extreme performance, extreme scalability, and extreme availability.
  • Day Three: The underpinning component of NoSQL technology is functional segmentation which results in simple hierarchical schemas.

Amazon’s next design decision was “sharding” or horizontal partitioning of all the tables in a hierarchical schema. Hash-partitioning is typically used. Each table is partitioned in the same way as the other tables in the schema and each set of partitions is placed in a separate database referred to as a “shard.” The shards are independent of each other; that is, there is no clustering (as in Oracle RAC) or federation (as in IBM DB2).

Note that the hierarchical schemas that result from functional segmentation are always shardable; that is, hierarchical schemas are shardable by definition.

Returning to the example from Ted Codd’s 1970 paper on the relational model:

  • employee (man#, name, birthdate) with primary key (man#)
  • children (man#, childname, birthyear) with primary key (man#, childname)
  • jobhistory (man#, jobdate, title) with primary key (man#, jobdate)
  • salaryhistory (man#, jobdate, salarydate, salary) with primary key (man#, jobdate, salarydate)

Note that the jobhistory, salaryhistory, and children tables have composite keys. In each case, the leading column of the composite key is the man#. Therefore, all four tables can be partitioned using the man#.

Sharding is an essential component of NoSQL designs but it does not present a conflict with the relational model; it too is simply a physical database design decision. In the relational model, the collection of standalone databases or shards can be logically viewed as a single distributed database.

Also see: The Twelve Days of SQL: Day Four: The way you write your query matters

  1. No comments yet.
  1. No trackbacks yet.

Leave a comment