
SQL v/s NoSQL: Amazon v/s eBay and the false premise of NoSQL


Update: The recording is now available at http://www.confio.com/webinars/nosql-big-data/. The slide deck is at https://iggyfernandez.wordpress.com/wp-content/uploads/2013/10/nosql-and-big-data-for-oracle-dbas-oct-2013.pdf.

In my webinar for Confio on October 10, I will explain that the deficiencies of relational technology are actually a result of deliberate choices made by the relational movement in its early years. The relational camp needs to revisit these choices if it wants to compete with NoSQL and Big Data technologies in the areas of performance, scalability, and availability. Previous versions of this presentation have been delivered at Great Lakes Oracle Conference, NoCOUG, and OakTableWorld. New material is constantly being added to the presentation based on attendee feedback and additional research. Attendees seem most interested in learning that eBay had the same performance, scalability, and availability requirements as Amazon but stuck with Oracle and SQL.

Register at http://marketo.confio.com/NoSQLBigDataforOracle_RegPage.html.

  1. The origins of NoSQL
    • Amazon’s requirements
    • Amazon’s solution
    • Amazon v/s eBay
  2. The false premise of NoSQL
    • Zeroth normal form
    • The problem with flat tables
    • Eliminating the join penalty
    • Dr. Codd on eventual consistency
  3. The NoSQL landscape
    • Key-value databases
    • Document databases
    • Column-family databases
    • Graph databases
    • Map/Reduce
  4. What makes relational technology so sacred anyway?
  5. Mistakes of the relational camp
    • De-emphasizing physical database design
    • Discarding nested relations
    • Favoring relational calculus over relational algebra
    • Equating the normalized set with the stored set
    • Marrying relational theory to ACID DBMS
    • Ignoring SQL-92 CREATE ASSERTION
  6. Going one better than Hadoop
    • Preprocessor feature of external tables
    • Partition pruning in partition views
    • Parallel UNION ALL in Oracle Database 12c
  7. Bonus slides
    • NewSQL
    • NoSQL Buyer’s Guide
  2. Jason Bucata
    October 11, 2013 at 4:21 pm

    I’m glad you commented about partitioning clusters. I agree that the lack of official partitioning support is a huge hindrance to modern adoption of clusters. I’d love to use clusters for a few of our key tables, if only we could also partition them…

    I dug up your old blog post from 2011 where you talked about the unofficial partitioning support. It looks like it enables partitioning on the cluster key, just based on the example given.

    [minor rant follows… scroll to the bottom for the summary]

    …except wait: What I’d really like to see is the ability to partition the cluster by some column that’s not (part of) the cluster key. So I’d like to partition my transactions table by month, using a cluster on transaction ID. So then my transaction #1234567 would live in the October 2013 partition, and then the detail tables would be clustered into the same block with that parent row even though they don’t have the month column (this idea became vaguely sane with the advent of PARTITION BY REFERENCE).
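    (For anyone following along, this is roughly what reference partitioning already looks like for plain heap tables — a minimal sketch with made-up table names. What I’m wishing for is the cluster-level equivalent of it.)

        -- Parent table partitioned by month (hypothetical names).
        CREATE TABLE transactions (
          txn_id    NUMBER        PRIMARY KEY,
          txn_date  DATE          NOT NULL,
          amount    NUMBER(12,2)
        )
        PARTITION BY RANGE (txn_date) (
          PARTITION p_2013_10 VALUES LESS THAN (DATE '2013-11-01'),
          PARTITION p_2013_11 VALUES LESS THAN (DATE '2013-12-01')
        );

        -- Detail table inherits the parent's monthly partitioning through the
        -- foreign key, without carrying a month column of its own.
        CREATE TABLE transaction_details (
          txn_id   NUMBER NOT NULL,
          line_no  NUMBER NOT NULL,
          item     VARCHAR2(100),
          CONSTRAINT fk_txn FOREIGN KEY (txn_id) REFERENCES transactions (txn_id)
        )
        PARTITION BY REFERENCE (fk_txn);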

    This comes from real situations in one of our biggest apps, because the tables were normalized when delivered by the vendor, and it’d be an enormous pain to try to denormalize to add a month column to all the detail tables!

    Ah, but wait: Partitioning doesn’t mean much if we can’t also use compression (even “basic” compression). We’ve gotten quite used to using table compression on old partitions, partitioned by month. Compression on the parent table might help some, with multiple parent rows per block, but we see even bigger space savings from compressing the detail rows.

    Oh, but wait: The way we’ve found that works best to compress our historical data is to do a direct-path insert into an empty exchange table and exchange it in afterward. (A judicious ORDER BY can improve compression immensely.) So ideally, Oracle would have to allow partition exchange at the cluster level.
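    (Something like this, with made-up names — just a sketch of the workflow, not exact production code:)

        -- Empty, compressed exchange table with the same shape as the partition.
        CREATE TABLE transactions_xchg COMPRESS
          AS SELECT * FROM transactions WHERE 1 = 0;

        -- Direct-path load; the ORDER BY packs similar rows together, which is
        -- where the extra compression savings come from.
        INSERT /*+ APPEND */ INTO transactions_xchg
          SELECT * FROM transactions PARTITION (p_2013_10)
          ORDER BY txn_date, txn_id;
        COMMIT;

        -- Swap the compressed segment in for the original partition.
        ALTER TABLE transactions
          EXCHANGE PARTITION p_2013_10 WITH TABLE transactions_xchg
          INCLUDING INDEXES WITHOUT VALIDATION;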

    Hey, but then, we’d also have to be able to do direct-path insert into a clustered table, which we can’t do today. Especially since we’d have to load multiple tables at once (although maybe an INSERT ALL statement could do that?).
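    (By INSERT ALL I mean the multi-table insert syntax, roughly like this with made-up names — whether direct path could ever be honored for clustered targets is exactly the open question:)

        INSERT ALL
          INTO transactions        (txn_id, txn_date, amount) VALUES (txn_id, txn_date, amount)
          INTO transaction_details (txn_id, line_no, item)    VALUES (txn_id, line_no, item)
        SELECT txn_id, txn_date, amount, line_no, item
        FROM   staging_feed;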

    Maybe I’m being too picky: Maybe I should be content with an ALTER CLUSTER MOVE PARTITION COMPRESS command, and give up the extra disk savings we get by doing it ourselves. (I think there are also index availability considerations, though.)

    But a lot of data warehouses like to do data loads via the partition exchange method (or so we’re told in presentations and blog posts). So maybe it’s not just me who’s wishing for a partition exchange command for clusters (after wishing for partitioned clusters to begin with).

    Summary: Being able to do simple partitioning on a cluster isn’t a very useful feature by itself unless even more enhancements are added on top to bring it up to par with the existing partitioning features of heap tables. We’re too spoiled.

    (I wonder if the real reason Oracle isn’t working on clusters is rather that they’re moving more in the direction of column-oriented stores, à la HCC and the new 12c in-memory columnar cache. Kind of hard to bond separate tables together when the trend is to carve individual tables into multiple stores by groups of columns…)

    • Iggy Fernandez
      October 13, 2013 at 10:47 pm

      Thank you, Jason. I’ll forward your comments to the concerned Oracle product manager. The more likely reason that more effort is not being spent on enhancements to clusters is that Oracle product management believes there is not enough interest in clusters to justify the effort.

