Archive for the ‘Big Data’ Category

The Mistakes of the Relational Camp: Mistake #1: The de-emphasis of physical database design

September 9, 2013

See also: No! to SQL and No! to NoSQL

The inventor of the relational model, Dr. Edgar “Ted” Codd, believed that the suppression of physical database design details was the chief advantage of the relational model. He made the case in the very first sentence of the very first paper on the relational model: “Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation).” (“A Relational Model of Data for Large Shared Data Banks,” reprinted with permission in the 100th issue of the NoCOUG Journal.)

How likely is it that application developers will develop highly performant and scalable applications if they are shielded from the internal representation of data? The de-emphasis of physical database design was the biggest mistake of the relational camp and provided the opening for NoSQL and Big Data technologies to proliferate.

A case in point: the language SQL, which is universally used by application developers, was not created with them in mind. As explained by the creators of SQL (originally called SEQUEL) in their 1974 paper, there is “a large class of users who, while they are not computer specialists, would be willing to learn to interact with a computer in a reasonably high-level, non-procedural query language. Examples of such users are accountants, engineers, architects, and urban planners [emphasis added]. It is for this class of users that SEQUEL is intended. For this reason, SEQUEL emphasizes simple data structures and operations [emphasis added].” (http://faculty.cs.tamu.edu/yurttas/PL/DBL/docs/sequel-1974.pdf)

If you were the manager of a bookstore, how would you stock the shelves? Would you stand at the door and fling books onto any shelf that had some free space, perhaps recording their locations in a notebook for future reference? Of course not! And would you scatter related books all over the bookstore? Of course not! Then why do we store rows of data in random fashion? The default Oracle table storage structure is the unorganized heap, and it is chosen 99.9% of the time.

The de-emphasis of physical database design was an epic failure in the long run. Esther Dyson referred to the “join penalty” when she complained that “Using tables to store objects is like driving your car home and then disassembling it to put it in the garage. It can be assembled again in the morning, but one eventually asks whether this is the most efficient way to park a car.” [1]

It doesn’t have to be that way. Oracle Database has always provided a way to cluster rows of data from one or more tables using single-table or multi-table clusters in hashed or indexed flavors and thus to completely avoid the join penalty that Esther Dyson complained about. However, they must be the longest- and best-kept secret of Oracle Database—as suggested by their near-zero adoption rate—and have not been emulated by any other DBMS vendor. You can read more about them at https://iggyfernandez.wordpress.com/2013/07/28/no-to-sql-and-no-to-nosql/.
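Here is a minimal sketch of the multi-table hashed flavor (the cluster and table names are hypothetical, invented for illustration). Rows of both tables that share a cluster key value are stored in the same data blocks, so joining them requires no additional I/O.

-- hashed flavor; for the indexed flavor, omit "hashkeys"
-- and create an index on the cluster instead
create cluster customer_cluster (customer_id number)
  size 8192 hashkeys 1000;

create table customers
(
  customer_id number primary key,
  name        varchar2(100)
)
cluster customer_cluster (customer_id);

create table orders
(
  order_id    number primary key,
  customer_id number,
  order_date  date
)
cluster customer_cluster (customer_id);

-- customers and orders rows with the same customer_id now live in the
-- same blocks, so this join reads each block only once
select c.name, o.order_id, o.order_date
from customers c join orders o on c.customer_id = o.customer_id
where c.customer_id = 42;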

It doesn’t have to be that way. But it is.

1. Esther Dyson was the editor of a newsletter called Release 1.0. I’ve been unable to find the statement in the Release 1.0 archives at http://www.sbw.org/release1.0/ so I don’t really know the true source or author of the statement. However, the statement is popularly attributed to Esther Dyson and claimed to have been published in the Release 1.0 newsletter. I found a claim that the statement is found in the September 1988 issue but that didn’t pan out.

What’s your take on RDBMS and NoSQL?

April 9, 2013

My take is that application developers have belatedly but correctly concluded that an RDBMS is not the best tool for every application. For example, relational algebra, relational calculus, and SQL are not the best tools for graph problems. As another example, weblogs are non-transactional and don’t benefit from the ACID properties of the RDBMS. Amazon created the Dynamo key-value store for a highly specific use case. From the Dynamo white paper: “Customers should be able to view and add items to their shopping cart even if disks are failing, network routes are flapping, or data centers are being destroyed by tornados. … There are many services on Amazon’s platform that only need primary-key access to a data store. … Simple read and write operations to a data item that is uniquely identified by a key. Data is stored as binary objects (i.e., blobs) identified by unique keys. No operations span multiple data items and there is no need for relational schema. … The operation environment is assumed to be non-hostile and there are no security related requirements such as authentication and authorization.”

What’s your take?

P.S. I wasn’t always of this opinion because, until very recently, I had not studied NoSQL technologies. But my favorite quote is the one on consistency from Emerson’s essay on self-reliance: “The other terror that scares us from self-trust is our consistency; a reverence for our past act or word … bring the past for judgment into the thousand-eyed present, and live ever in a new day. …  A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. … Speak what you think now in hard words, and to-morrow speak what to-morrow thinks in hard words again, though it contradict every thing you said to-day.” (http://www.emersoncentral.com/selfreliance.htm)

However, I don’t give a free pass to the NoSQL camp. I believe that most of the problems that the NoSQL camp is trying to solve, with the sole exception of graph problems, could have been solved within the framework of relational theory. Since the relational camp couldn’t (or wouldn’t) solve the problems, the NoSQL camp came up with their own solutions and threw out the baby (relational theory) along with the bathwater (the perceived deficiencies of relational theory). My topic for the Great Lakes Oracle Conference is therefore “Soul-searching for the relational camp: Why NoSQL and Big Data have momentum.” (https://www.neooug.org/gloc/accepted-presentations.aspx)
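To illustrate the point that such workloads fit comfortably within the relational framework, here is a minimal sketch (the table and bind-variable names are hypothetical) of the Dynamo access pattern quoted above: simple reads and writes of a blob identified by a unique key.

create table shopping_carts
(
  cart_key varchar2(64) primary key,
  payload  blob
);

-- every operation is a single-row read or write by primary key;
-- no operation spans multiple data items
insert into shopping_carts (cart_key, payload)
values (:cart_key, :payload);

select payload from shopping_carts where cart_key = :cart_key;

Whether a given engine can serve that workload with Dynamo’s availability guarantees under failing disks and flapping network routes is, of course, a separate question.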

Categories: Big Data, DBA, Hadoop, NoSQL, SQL

January 24, 2013

The NoSQL movement is forcing the issue, and the time will soon come when we will no longer have to store structured non-transactional data in transactional database management systems in order to exploit the universe of indexing, partitioning, and clustering technologies and the full declarative power of relational languages (not just SQL). ETL and BI are the perfect candidates-in-waiting.

(Reblogged from “We don’t use databases; we don’t use indexes,” below.)

Categories: Big Data, Hadoop, Humor, NoSQL, Oracle, SQL

We don’t use databases; we don’t use indexes

January 22, 2013

Whenever salespeople phone Mogens Norgaard, he puts them off by saying that he just doesn’t use the products that they are calling about.

When the office furniture company phones, he says “We don’t use office furniture.” When the newspaper company phones, he says “We don’t read newspapers.” When the girl scouts phone, he probably says “We don’t eat cookies.”

Once he got a phone call from the phone company.

You can only imagine how that conversation went. Read the whole story at http://wedonotuse.com/stories-and-answers.aspx.

I wonder what Mogens would say if a database vendor phoned. I can imagine him saying “We don’t use databases. We don’t use indexes. We store all our data in compressed text files. Each compressed text file contains one year of data for one location. There is a separate subdirectory for each year. We have a terabyte of data going back to 1901 so we currently have 113 subdirectories. The performance is just fine, thank you.”

On second thoughts, that’s just too far-fetched.

You see, back in the early days of the relational era, the creator of relational theory, Dr. Edgar Codd, married relational theory with transactional database management systems (a.k.a. ACID DBMS) and the Relational Database Management System (RDBMS) was born. He authored two influential ComputerWorld articles—“Is your DBMS really relational?” (October 14, 1985) and “Does your DBMS run by the rules?” (October 21, 1985)—that set the direction of the relational movement for the next quarter century. Today, the full declarative power of “data base sublanguages” (the term coined by Dr. Codd) such as Structured Query Language (SQL) is only available within the confines of a transactional database management system.

But it shouldn’t have to be that way.

Consider the running example of “big data” used in Hadoop: The Definitive Guide. The National Climatic Data Center publishes hourly climatic data such as temperature and pressure from more than 10,000 recording stations all over the world. Data from 1901 onwards is available in text files. Each line of text contains the station code, the timestamp, and a number of climatic readings. The format is documented at ftp://ftp.ncdc.noaa.gov/pub/data/noaa/ish-format-document.pdf. The files are organized into subdirectories, one subdirectory for each year. Each subdirectory contains one file from each recording station that was in operation during that year. The individual files are compressed using gzip. All the files can be downloaded from ftp://ftp.ncdc.noaa.gov/pub/data/noaa/.

You might have already guessed where I am going with this.

Conceptually the above terabyte-sized data set is a single table. But it should not be necessary to uncompress and load this huge quantity of structured non-transactional data into a transactional database management system in order to query it. The choice of physical representation conserves storage space and is a technical detail that is irrelevant to the logical presentation of the data set as a single table; it is a technical detail that users don’t care about. As Dr. Codd said in the opening sentence of his 1970 paper A Relational Model of Data For Large Shared Data Banks (faithfully reproduced in the 100th issue of the NoCOUG Journal), “future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation).”

Why shouldn’t we be able to query the above data set using good old SQL?

Well, you can do just that with the Oracle query engine, and you don’t have to load the data into an Oracle database first. You can even take advantage of partitioning and parallelism. You can also write queries that mix and match data from the database and the filesystem.
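For example, here is a hypothetical sketch of such a mix-and-match query; the stations lookup table is invented for illustration, while temperatures is the external table defined in the demonstration below.

-- stations is an ordinary heap table stored inside the database;
-- temperatures is an external table whose rows come straight from
-- the compressed files on the filesystem
select s.station_name,
       avg(to_number(t.temperature)) as average_temperature
from temperatures t
join stations s
  on s.station_code = t.station_code
group by s.station_name;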

The following demonstrations were performed using a pre-built Developer VM for Oracle VM VirtualBox. The version of Oracle Database is 12.1.0.1.

SQL*Plus: Release 12.1.0.1.0 Production on Fri Aug 16 10:45:04 2013

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

In the demonstrations, we only consider the years from 1901 to 1904. Here is the directory structure.

/u01/app/oracle/admin/orcl/share/noaa
/u01/app/oracle/admin/orcl/share/noaa/1901
/u01/app/oracle/admin/orcl/share/noaa/1902
/u01/app/oracle/admin/orcl/share/noaa/1903
/u01/app/oracle/admin/orcl/share/noaa/1904

We first need to create “directories” and define an “external table.” The definition of this external table specifies a preprocessing script which is the secret sauce that makes it possible for the query engine to traverse the subdirectories and uncompress the data.

connect / as sysdba

alter session set container=pdborcl;

create or replace directory share_dir
  as '/u01/app/oracle/admin/orcl/share';

create or replace directory noaa_dir
  as '/u01/app/oracle/admin/orcl/share/noaa';

create or replace directory noaa_1901_dir
  as '/u01/app/oracle/admin/orcl/share/noaa_1901'; 

create or replace directory noaa_1902_dir
  as '/u01/app/oracle/admin/orcl/share/noaa_1902'; 

create or replace directory noaa_1903_dir
  as '/u01/app/oracle/admin/orcl/share/noaa_1903'; 

create or replace directory noaa_1904_dir
  as '/u01/app/oracle/admin/orcl/share/noaa_1904'; 

grant all on directory share_dir to public;
grant all on directory noaa_dir to public;
grant all on directory noaa_1901_dir to public;
grant all on directory noaa_1902_dir to public;
grant all on directory noaa_1903_dir to public;
grant all on directory noaa_1904_dir to public;

connect iggy/iggy@pdborcl

drop table temperatures;
create table temperatures
(
  station_code char(6),
  datetime char(12),
  temperature char(5)
)
organization external
(
  type oracle_loader
  default directory share_dir
  access parameters
  (
    records delimited by newline
    preprocessor share_dir:'uncompress.sh'
    fields
    (
      station_code position(1:6) char(6),
      datetime position(7:18) char(12),
      temperature position(19:23) char(5)
    )
  )
  location ('noaa')
);

Here’s the tiny preprocessing script that makes it possible for Oracle to traverse the subdirectories and uncompress the data. It recursively traverses the file system beginning with the location specified by the query engine; that is, the location specified in the table definition. It uncompresses all the gzipped files it finds and pipes the output to the “cut” utility, which cuts out only the column positions that we care about and writes what’s left to standard output, not to the filesystem. The table definition specifies the script’s location as share_dir.

#!/bin/sh
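# Recursively find the gzipped files under the directory passed in by the
# query engine ($1), uncompress them to standard output, and keep only the
# column positions for the station code (5-10), timestamp (16-27), and air
# temperature (88-92).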
/usr/bin/find $1 -name "*.gz" -exec /bin/zcat {} \; | /usr/bin/cut -c5-10,16-27,88-92

All the capabilities of SQL—including analytic functions and pivoting—can now be exploited as shown in the following example. For each month in the year 1901, we list the first three recording stations ranked by average monthly temperature (dense_rank sorts ascending by default, so rank 1 denotes the lowest average).

set pagesize 66
select /*+ gather_plan_statistics */ * from
(
  select
    month,
    station_code,
    dense_rank() over (partition by month order by average) as rank
  from
  (
    select
      substr(datetime,1,4)||'/'||substr(datetime,5,2) as month,
      station_code,
      avg(temperature) as average
    from temperatures
    where datetime >= '1901' and datetime < '1902'
    group by
      substr(datetime,1,4)||'/'||substr(datetime,5,2),
      station_code
  )
)
pivot(max(station_code) for rank in (1, 2, 3))
order by month;

MONTH                     1      2      3
------------------------- ------ ------ ------
1901/01                   2270   0296   0297
1901/02                   2270   0290   0296
1901/03                   2270   0290   0296
1901/04                   0290   0295   0298
1901/05                   0290   0295   0298
1901/06                   0290   0298   0295
1901/07                   0290   0295   2270
1901/08                   2270   0290   0296
1901/09                   0290   2270   0296
1901/10                   2270   0296   0290
1901/11                   2270   0296   0297
1901/12                   2270   0296   0290

It’s an epiphany, that’s what it is.

We can also use “partition views” and take advantage of “partition pruning.” For those who don’t remember, partition views are a really old feature that predates the “real” partitioning introduced in Oracle 8.0. Partition views continue to work just fine today, even in Oracle Database 12c.

Let’s create a separate table definition for each year and then use a partition view to tie the tables together.

create table temperatures_1901
(
  station_code char(6),
  datetime char(12),
  temperature char(5)
)
organization external
(
  type oracle_loader
  default directory noaa_dir
  access parameters
  (
    records delimited by newline
    preprocessor share_dir:'uncompress.sh'
    fields
    (
      station_code position(1:6) char(6),
      datetime position(7:18) char(12),
      temperature position(19:23) char(5)
    )
  )
  location ('1901')
);

-- the remaining table definitions are not shown for brevity

create or replace view temperatures_v as
select * from temperatures_1901
where datetime >= '190101010000' and datetime < '190201010000'
  union all
select * from temperatures_1902
where datetime >= '190201010000' and datetime < '190301010000'
  union all
select * from temperatures_1903
where datetime >= '190301010000' and datetime < '190401010000'
  union all
select * from temperatures_1904
where datetime >= '190401010000' and datetime < '190501010000';

When we query only a portion of the data covered by the temperatures_v view, the query plan confirms that the unneeded branches of the view are filtered out by the query optimizer.

select /*+ gather_plan_statistics */ * from
(
  select
    month,
    station_code,
    dense_rank() over (partition by month order by average) as rank
  from
  (
    select
      substr(datetime,1,4)||'/'||substr(datetime,5,2) as month,
      station_code,
      avg(temperature) as average
    from temperatures_v
    where datetime >= '190101010000' and datetime < '190201010000'
    group by
      substr(datetime,1,4)||'/'||substr(datetime,5,2),
      station_code
  )
)
pivot(max(station_code) for rank in (1, 2, 3))
order by month;

Plan hash value: 2790062116

--------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name              | Starts | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                   |      1 |     12 |00:00:00.11 |      64 |
|   1 |  SORT GROUP BY PIVOT              |                   |      1 |     12 |00:00:00.11 |      64 |
|   2 |   VIEW                            |                   |      1 |     72 |00:00:00.11 |      64 |
|   3 |    WINDOW SORT                    |                   |      1 |     72 |00:00:00.11 |      64 |
|   4 |     HASH GROUP BY                 |                   |      1 |     72 |00:00:00.11 |      64 |
|   5 |      VIEW                         | TEMPERATURES_V    |      1 |   6565 |00:00:01.43 |      64 |
|   6 |       UNION-ALL                   |                   |      1 |   6565 |00:00:00.48 |      64 |
|*  7 |        EXTERNAL TABLE ACCESS FULL | TEMPERATURES_1901 |      1 |   6565 |00:00:00.34 |      64 |
|*  8 |        FILTER                     |                   |      1 |      0 |00:00:00.01 |       0 |
|*  9 |         EXTERNAL TABLE ACCESS FULL| TEMPERATURES_1902 |      0 |      0 |00:00:00.01 |       0 |
|* 10 |        FILTER                     |                   |      1 |      0 |00:00:00.01 |       0 |
|* 11 |         EXTERNAL TABLE ACCESS FULL| TEMPERATURES_1903 |      0 |      0 |00:00:00.01 |       0 |
|* 12 |        FILTER                     |                   |      1 |      0 |00:00:00.01 |       0 |
|* 13 |         EXTERNAL TABLE ACCESS FULL| TEMPERATURES_1904 |      0 |      0 |00:00:00.01 |       0 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - filter(("DATETIME">='190101010000' AND "DATETIME"<'190201010000'))
   8 - filter(NULL IS NOT NULL)
   9 - filter(("DATETIME">='190201010000' AND "DATETIME"<'190201010000'))
  10 - filter(NULL IS NOT NULL)
  11 - filter(("DATETIME">='190301010000' AND "DATETIME"<'190201010000'))
  12 - filter(NULL IS NOT NULL)
  13 - filter(("DATETIME">='190401010000' AND "DATETIME"<'190201010000'))

Finally, let’s check whether query execution can be parallelized. And so it can. Notice the PX SELECTOR row sources in the query execution plan. This is a new feature of Oracle Database 12c. Oracle Database 12c is capable of executing UNION ALL branches in parallel. (See Concurrent Execution of Union All.)

alter table temperatures_1901 parallel 2;

select /*+ gather_plan_statistics */ * from
(
  select
    month,
    station_code,
    dense_rank() over (partition by month order by average) as rank
  from
  (
    select
      substr(datetime,1,4)||'/'||substr(datetime,5,2) as month,
      station_code,
      avg(temperature) as average
    from temperatures_v
    where datetime >= '190101010000' and datetime < '190501010000'
    group by
      substr(datetime,1,4)||'/'||substr(datetime,5,2),
      station_code
  )
)
pivot(max(station_code) for rank in (1, 2, 3))
order by month;

Plan hash value: 3783481314

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name              | Starts |    TQ  |IN-OUT| PQ Distrib  | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                   |      1 |        |      |             |     48 |00:00:00.50 |      64 |
|   1 |  PX COORDINATOR                            |                   |      1 |        |      |             |     48 |00:00:00.50 |      64 |
|   2 |   PX SEND QC (ORDER)                       | :TQ10002          |      0 |  Q1,02 | P->S | QC (ORDER)  |      0 |00:00:00.01 |       0 |
|   3 |    SORT GROUP BY                           |                   |      2 |  Q1,02 | PCWP |             |     48 |00:00:00.01 |       0 |
|   4 |     PX RECEIVE                             |                   |      2 |  Q1,02 | PCWP |             |     48 |00:00:00.01 |       0 |
|   5 |      PX SEND RANGE                         | :TQ10001          |      0 |  Q1,01 | P->P | RANGE       |      0 |00:00:00.01 |       0 |
|   6 |       HASH GROUP BY PIVOT                  |                   |      2 |  Q1,01 | PCWP |             |     48 |00:00:00.02 |       0 |
|   7 |        VIEW                                |                   |      2 |  Q1,01 | PCWP |             |    288 |00:00:00.02 |       0 |
|   8 |         WINDOW SORT                        |                   |      2 |  Q1,01 | PCWP |             |    288 |00:00:00.02 |       0 |
|   9 |          HASH GROUP BY                     |                   |      2 |  Q1,01 | PCWP |             |    288 |00:00:00.02 |       0 |
|  10 |           PX RECEIVE                       |                   |      2 |  Q1,01 | PCWP |             |    288 |00:00:00.01 |       0 |
|  11 |            PX SEND HASH                    | :TQ10000          |      0 |  Q1,00 | P->P | HASH        |      0 |00:00:00.01 |       0 |
|  12 |             HASH GROUP BY                  |                   |      2 |  Q1,00 | PCWP |             |    288 |00:00:00.92 |     368 |
|  13 |              VIEW                          | TEMPERATURES_V    |      2 |  Q1,00 | PCWP |             |  26266 |00:00:05.53 |     368 |
|  14 |               UNION-ALL                    |                   |      2 |  Q1,00 | PCWP |             |  26266 |00:00:02.79 |     368 |
|  15 |                PX BLOCK ITERATOR           |                   |      2 |  Q1,00 | PCWC |             |   6565 |00:00:01.12 |      90 |
|* 16 |                 EXTERNAL TABLE ACCESS FULL | TEMPERATURES_1901 |      1 |  Q1,00 | PCWP |             |   6565 |00:00:00.98 |      90 |
|  17 |                PX SELECTOR                 |                   |      2 |  Q1,00 | PCWP |             |   6565 |00:00:01.02 |      90 |
|* 18 |                 EXTERNAL TABLE ACCESS FULL | TEMPERATURES_1902 |      2 |  Q1,00 | PCWP |             |   6565 |00:00:00.47 |      90 |
|  19 |                PX SELECTOR                 |                   |      2 |  Q1,00 | PCWP |             |   6554 |00:00:00.49 |      85 |
|* 20 |                 EXTERNAL TABLE ACCESS FULL | TEMPERATURES_1903 |      2 |  Q1,00 | PCWP |             |   6554 |00:00:00.44 |      85 |
|  21 |                PX SELECTOR                 |                   |      2 |  Q1,00 | PCWP |             |   6582 |00:00:00.57 |      85 |
|* 22 |                 EXTERNAL TABLE ACCESS FULL | TEMPERATURES_1904 |      2 |  Q1,00 | PCWP |             |   6582 |00:00:00.52 |      85 |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  16 - filter(("DATETIME">='190101010000' AND "DATETIME"<'190201010000'))
  18 - filter(("DATETIME">='190201010000' AND "DATETIME"<'190301010000'))
  20 - filter(("DATETIME">='190301010000' AND "DATETIME"<'190401010000'))
  22 - filter(("DATETIME">='190401010000' AND "DATETIME"<'190501010000'))

I predict that the time is soon coming when the marriage of relational theory and transactional database management systems will be dissolved. We will be free to store structured non-transactional data outside a transactional database management system while continuing to exploit the entire universe of indexing, partitioning, and clustering techniques as well as the full power of relational languages, not only SQL.

Over to you, Mogens.

Update (August 11, 2014): The code demonstration has been updated to use Oracle Database 12.1.0.1. Concurrent execution of UNION ALL is a new feature of Oracle Database 12.1.0.1.

Categories: Big Data, DBA, Hadoop, Humor, NoSQL, Oracle, SQL

Dilbert likes Hadoop clusters

January 21, 2013

Here are some old Dilbert cartoons with the references to SQL databases (the hot technology of the 1990s) replaced by references to Hadoop clusters (today’s hot technology); in the transcripts below, each struck-out original word appears immediately before its replacement. On a more serious note, I recommend the clear explanation of the Map-Reduce methodology in Chapter 2 of Prof. Ullman’s book Mining of Massive Datasets, which is available for free download from his website. He shows how common relational algebra operations such as Union, Difference, and Intersection can be implemented using Map-Reduce. It therefore surprises me that these operations have not yet been implemented in Hive Query Language (HiveQL) and Pig, the poor cousins of SQL found in Hadoop environments.

Feb 27, 1996 (http://www.dilbert.com/fast/1996-02-27/)

Pointy-Haired Boss: I just got our consultant’s report. He’s identified our biggest problem.
Wally: I recommend that we build a tracking database Hadoop cluster.
Dilbert: We can put it on the network.
Pointy-Haired Boss: Would you like to hear what the problem is first?
Wally: I hate to dwell on the negative.
Dilbert: We like databases Hadoop clusters.

Feb 28, 1996 (http://www.dilbert.com/fast/1996-02-28/)

Pointy-Haired Boss: You haven’t heard what the problem is yet; how can you recommend building a database Hadoop cluster to solve it??
Wally: We always build a database Hadoop cluster.
Dilbert: And we’ll need coffee mugs for the project team.
Pointy-Haired Boss: The problem is that we have poor processes.
Wally: That could be the slogan on our mugs!

Dec 16, 1996 (http://www.dilbert.com/fast/1996-12-16/)

Elbonian salesman: Our Elbonian database product Hadoop cluster can replace every one of your current systems.
Dilbert: No thanks.
Elbonian salesman: It can do payroll, accounts receivable, inventory, sales …
Alice: No thanks.
Elbonian salesman: And I’ll throw in some golf balls.
Pointy-Haired Boss: It’s a deal! Just toss them in the lake with all my other ones.

Dec 17, 1996 (http://www.dilbert.com/fast/1996-12-17/)

Pointy-Haired Boss: We’re going to replace our computer support systems with the Elbonian database product Hadoop cluster.
Pointy-Haired Boss: It’s risky, but don’t worry. I’ve hired an outrageously expensive consultant who has never done this before.
Consultant: I earned five hundred dollars this morning just coming to this meeting. How’s your day going?
Wally: It won’t make my top ten.

Categories: Big Data, Hadoop, Humor