Home > Big Data, DBA, Hadoop, NoSQL, Oracle, SQL > The Twelve Days of NoSQL: Day Ten: Big Data in a Nutshell

The Twelve Days of NoSQL: Day Ten: Big Data in a Nutshell


On the tenth day of Christmas, my true love gave to me
Ten lords a-leaping.

(Yesterday: NoSQL Taxonomy)(Tomorrow: Mistakes of the relational camp)

The topic of Big Data is often brought up in NoSQL discussions so let’s give it a nod. In 1998, Sergey Brin and Larry Page invented the PageRank algorithm for ranking web pages (The Anatomy of a Large-Scale Hypertextual Web Search Engine by Brin and Page) and founded Google. The PageRank algorithm required very large matrix-vector multiplications (Mining of Massive Datasets Ch. 5 by Rajaraman and Ullman) so the MapReduce technique was invented to handle such large computations (MapReduce: Simplified Data Processing on Large Clusters). Smart people then realized that the MapReduce technique could be used for other classes of problems and an open-source project called Hadoop was created to popularize the MapReduce technique (The history of Hadoop: From 4 nodes to the future of data). Other smart people realized that MapReduce could handle the operations of relational algebra such as join, anti-join, semi-join, union, difference, and intersection (Mining of Massive Datasets Ch. 2 by Rajaraman and Ullman) and began looking at the possibility of processing large volumes of business data (a.k.a. “Big Data”) better and cheaper than mainstream database management systems. Initially programmers had to write Java code for the “mappers” and “reducers” used by MapReduce. However, smart people soon realized that SQL queries could be automatically translated into the necessary Java code and “SQL-on-Hadoop” was born. Big Data thus became about processing large volumes of business data with SQL but better and cheaper than mainstream database management systems. However, the smart people have now realized that MapReduce is not the best solution for low-latency queries (Facebook open sources its SQL-on-Hadoop engine, and the web rejoices). Big Data has finally become about processing large volumes of business data with SQL but better and cheaper than mainstream database management systems and with or without MapReduce.

That’s the fast-moving story of Big Data in a nutshell.

Also see: The Twelve Days of SQL: Day Ten: Sometimes the optimizer needs a hint

About these ads
  1. January 5, 2014 at 1:54 am

    Hi Iggy,
    This is fantastic stuff. I’ve been following the series since day one. A very interesting mixture of historical and technical. The references are especially valuable. It’s fairly clear that the quote on you blog header is well chosen: “Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I’ll have a long beard by the time I read them”. But I think you have read a lot of them!

    Regards,
    Fergal

    • Iggy Fernandez
      January 5, 2014 at 10:22 am

      Thank you very much. Reading is a lot easier than writing though. I should shave off the long beard and start work on that “compendium of NoSQL and Big Data.”

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 744 other followers

%d bloggers like this: