
We don’t use databases; we don’t use indexes


Whenever salespeople phone Mogens Norgaard, he puts them off by saying that he just doesn’t use the products that they are calling about.

When the office furniture company phones, he says “We don’t use office furniture.” When the newspaper company phones, he says “We don’t read newspapers.” When the girl scouts phone, he probably says “We don’t eat cookies.”

Once he got a phone call from the phone company.

You can only imagine how that conversation went. Read the whole story at http://wedonotuse.com/stories-and-answers.aspx.

I wonder what Mogens would say if a database vendor phoned. I can imagine him saying “We don’t use databases. We don’t use indexes. We store all our data in compressed text files. Each compressed text file contains one year of data for one location. There is a separate subdirectory for each year. We have a terabyte of data going back to 1901 so we currently have 113 subdirectories. The performance is just fine, thank you.”

On second thoughts, that’s just too far-fetched.

You see, back in the early days of the relational era, Dr. Edgar F. Codd, the creator of relational theory, married relational theory with transactional database management systems (a.k.a. ACID DBMS), and the Relational Database Management System (RDBMS) was born. He authored two influential ComputerWorld articles—“Is Your DBMS Really Relational?” (October 14, 1985) and “Does Your DBMS Run By the Rules?” (October 21, 1985)—that set the direction of the relational movement for the next quarter century. Today, the full declarative power of “data base sublanguages” (the term coined by Dr. Codd) such as Structured Query Language (SQL) is available only within the confines of a transactional database management system.


But it shouldn’t have to be that way.

Consider the running example of “big data” used in Hadoop: The Definitive Guide. The National Climatic Data Center publishes hourly climatic data such as temperature and pressure from more than 10,000 recording stations all over the world. Data from 1901 onwards is available in text files. Each line of text contains the station code, the timestamp, and a number of climatic readings. The format is documented at ftp://ftp.ncdc.noaa.gov/pub/data/noaa/ish-format-document.pdf. The files are organized into subdirectories, one subdirectory for each year. Each subdirectory contains one file from each recording station that was in operation during that year. The individual files are compressed using gzip. All the files can be downloaded from ftp://ftp.ncdc.noaa.gov/pub/data/noaa/.

You might have already guessed where I am going with this.

Conceptually, the above terabyte-sized data set is a single table. But it should not be necessary to uncompress and load this huge quantity of structured non-transactional data into a transactional database management system in order to query it. The choice of physical representation conserves storage space but is a technical detail that is irrelevant to the logical presentation of the data set as a single table; it is a technical detail that users don’t care about. As Dr. Codd said in the opening sentence of his 1970 paper “A Relational Model of Data for Large Shared Data Banks” (faithfully reproduced in the 100th issue of the NoCOUG Journal), “future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation).”


Why shouldn’t we be able to query the above data set using good old SQL?

Well, you can do just that with the Oracle query engine, and you don’t have to load the data into an Oracle database first. You can even take advantage of partitioning and parallelism. You can also write queries that mix and match data from the database and the filesystem.
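
For example, a query along the following lines could join an ordinary database table to the external table defined later in this post. (The stations lookup table and its columns are hypothetical; they are here only to illustrate the mix-and-match idea.)

-- Hypothetical sketch: "stations" is an ordinary table stored inside
-- the database; "temperatures" is the external table defined below.
select s.station_name, round(avg(t.temperature), 1) as avg_temp
from stations s
join temperatures t on t.station_code = s.station_code
group by s.station_name
order by avg_temp desc;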

The following demonstrations were performed using a pre-built Developer VM for Oracle VM VirtualBox. The version of Oracle Database is 12.1.0.1.

SQL*Plus: Release 12.1.0.1.0 Production on Fri Aug 16 10:45:04 2013

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

In the demonstrations, we only consider the years from 1901 to 1904. Here is the directory structure.

/u01/app/oracle/admin/orcl/share/noaa
/u01/app/oracle/admin/orcl/share/noaa/1901
/u01/app/oracle/admin/orcl/share/noaa/1902
/u01/app/oracle/admin/orcl/share/noaa/1903
/u01/app/oracle/admin/orcl/share/noaa/1904

We first need to create “directories” and define an “external table.” The definition of this external table specifies a preprocessing script which is the secret sauce that makes it possible for the query engine to traverse the subdirectories and uncompress the data.

connect / as sysdba

alter session set container=pdborcl;

create or replace directory share_dir
  as '/u01/app/oracle/admin/orcl/share';

create or replace directory noaa_dir
  as '/u01/app/oracle/admin/orcl/share/noaa';

create or replace directory noaa_1901_dir
  as '/u01/app/oracle/admin/orcl/share/noaa/1901';

create or replace directory noaa_1902_dir
  as '/u01/app/oracle/admin/orcl/share/noaa/1902';

create or replace directory noaa_1903_dir
  as '/u01/app/oracle/admin/orcl/share/noaa/1903';

create or replace directory noaa_1904_dir
  as '/u01/app/oracle/admin/orcl/share/noaa/1904';

grant all on directory share_dir to public;
grant all on directory noaa_dir to public;
grant all on directory noaa_1901_dir to public;
grant all on directory noaa_1902_dir to public;
grant all on directory noaa_1903_dir to public;
grant all on directory noaa_1904_dir to public;

connect iggy/iggy@pdborcl

drop table temperatures;
create table temperatures
(
  station_code char(6),
  datetime char(12),
  temperature char(5)
)
organization external
(
  type oracle_loader
  default directory share_dir
  access parameters
  (
    records delimited by newline
    preprocessor share_dir:'uncompress.sh'
    fields
    (
      station_code position(1:6) char(4),
      datetime position(7:18) char(12),
      temperature position(19:23) char(5)
    )
  )
  location ('noaa')
);

Here’s the tiny preprocessing script that makes it possible for Oracle to traverse the subdirectories and uncompress the data. It recursively walks the file system beginning with the location that the query engine passes to it as its first argument, that is, the location specified in the table definition. It uncompresses every gzipped file it finds and pipes the output through the “cut” utility, which keeps only the column positions that we care about (the station code, the observation timestamp, and the air temperature) and writes the result to standard output, not to the filesystem. The script resides in share_dir, as the preprocessor clause specifies, and must be executable.

#!/bin/sh
/usr/bin/find $1 -name "*.gz" -exec /bin/zcat {} \; | /usr/bin/cut -c5-10,16-27,88-92
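
A quick smoke test confirms that the table definition and the preprocessing script are wired together correctly. (The count depends, of course, on which files you have downloaded.)

-- datetime is a CHAR(12) of the form YYYYMMDDHHMM, so string
-- comparison brackets the year 1901.
select count(*) from temperatures
where datetime >= '1901' and datetime < '1902';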

All the capabilities of SQL—including analytic functions and pivoting—can now be exploited, as the following example shows. For each month in the year 1901, we rank the recording stations by average monthly temperature and list the first three.

set pagesize 66
select /*+ gather_plan_statistics */ * from
(
  select
    month,
    station_code,
    dense_rank() over (partition by month order by average) as rank
  from
  (
    select
      substr(datetime,1,4)||'/'||substr(datetime,5,2) as month,
      station_code,
      avg(temperature) as average
    from temperatures
    where datetime >= '1901' and datetime < '1902'
    group by
      substr(datetime,1,4)||'/'||substr(datetime,5,2),
      station_code
  )
)
pivot(max(station_code) for rank in (1, 2, 3))
order by month;

MONTH                     1      2      3
------------------------- ------ ------ ------
1901/01                   2270   0296   0297
1901/02                   2270   0290   0296
1901/03                   2270   0290   0296
1901/04                   0290   0295   0298
1901/05                   0290   0295   0298
1901/06                   0290   0298   0295
1901/07                   0290   0295   2270
1901/08                   2270   0290   0296
1901/09                   0290   2270   0296
1901/10                   2270   0296   0290
1901/11                   2270   0296   0297
1901/12                   2270   0296   0290

It’s an epiphany, that’s what it is.
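
Incidentally, the queries in this post carry the gather_plan_statistics hint, so the execution plan with actual row counts and timings (like the plans reproduced below) can be displayed right after running a query with a standard DBMS_XPLAN call:

-- Display the plan, with runtime statistics, for the last
-- statement executed in this session.
select * from table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));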

We can also use “partition views” and take advantage of “partition pruning.” For those who don’t remember, partition views are a really old feature that predates the “real” partitioning introduced in Oracle 8.0. Partition views continue to work just fine today, even in Oracle Database 12c.

Let’s create a separate table definition for each year and then use a partition view to tie the tables together.

create table temperatures_1901
(
  station_code char(6),
  datetime char(12),
  temperature char(5)
)
organization external
(
  type oracle_loader
  default directory noaa_dir
  access parameters
  (
    records delimited by newline
    preprocessor share_dir:'uncompress.sh'
    fields
    (
      station_code position(1:6) char(4),
      datetime position(7:18) char(12),
      temperature position(19:23) char(5)
    )
  )
  location ('1901')
);

-- the remaining table definitions are not shown for brevity

create or replace view temperatures_v as
select * from temperatures_1901
where datetime >= '190101010000' and datetime < '190201010000'
  union all
select * from temperatures_1902
where datetime >= '190201010000' and datetime < '190301010000'
  union all
select * from temperatures_1903
where datetime >= '190301010000' and datetime < '190401010000'
  union all
select * from temperatures_1904
where datetime >= '190401010000' and datetime < '190501010000';
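
A quick sanity check against the view confirms that all four branches contribute rows. (The counts depend on the files present.)

-- Observations per year, via the partition view.
select substr(datetime, 1, 4) as year, count(*) as observations
from temperatures_v
group by substr(datetime, 1, 4)
order by year;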

When we ask for only a portion of the data in the temperatures_v view, the query plan confirms that the unneeded branches of the view are filtered out by the query optimizer.

select /*+ gather_plan_statistics */ * from
(
  select
    month,
    station_code,
    dense_rank() over (partition by month order by average) as rank
  from
  (
    select
      substr(datetime,1,4)||'/'||substr(datetime,5,2) as month,
      station_code,
      avg(temperature) as average
    from temperatures_v
    where datetime >= '190101010000' and datetime < '190201010000'
    group by
      substr(datetime,1,4)||'/'||substr(datetime,5,2),
      station_code
  )
)
pivot(max(station_code) for rank in (1, 2, 3))
order by month;

Plan hash value: 2790062116

--------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name              | Starts | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                   |      1 |     12 |00:00:00.11 |      64 |
|   1 |  SORT GROUP BY PIVOT              |                   |      1 |     12 |00:00:00.11 |      64 |
|   2 |   VIEW                            |                   |      1 |     72 |00:00:00.11 |      64 |
|   3 |    WINDOW SORT                    |                   |      1 |     72 |00:00:00.11 |      64 |
|   4 |     HASH GROUP BY                 |                   |      1 |     72 |00:00:00.11 |      64 |
|   5 |      VIEW                         | TEMPERATURES_V    |      1 |   6565 |00:00:01.43 |      64 |
|   6 |       UNION-ALL                   |                   |      1 |   6565 |00:00:00.48 |      64 |
|*  7 |        EXTERNAL TABLE ACCESS FULL | TEMPERATURES_1901 |      1 |   6565 |00:00:00.34 |      64 |
|*  8 |        FILTER                     |                   |      1 |      0 |00:00:00.01 |       0 |
|*  9 |         EXTERNAL TABLE ACCESS FULL| TEMPERATURES_1902 |      0 |      0 |00:00:00.01 |       0 |
|* 10 |        FILTER                     |                   |      1 |      0 |00:00:00.01 |       0 |
|* 11 |         EXTERNAL TABLE ACCESS FULL| TEMPERATURES_1903 |      0 |      0 |00:00:00.01 |       0 |
|* 12 |        FILTER                     |                   |      1 |      0 |00:00:00.01 |       0 |
|* 13 |         EXTERNAL TABLE ACCESS FULL| TEMPERATURES_1904 |      0 |      0 |00:00:00.01 |       0 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - filter(("DATETIME">='190101010000' AND "DATETIME"<'190201010000'))
   8 - filter(NULL IS NOT NULL)
   9 - filter(("DATETIME">='190201010000' AND "DATETIME"<'190201010000'))
  10 - filter(NULL IS NOT NULL)
  11 - filter(("DATETIME">='190301010000' AND "DATETIME"<'190201010000'))
  12 - filter(NULL IS NOT NULL)
  13 - filter(("DATETIME">='190401010000' AND "DATETIME"<'190201010000'))

Finally, let’s check whether query execution can be parallelized. And so it can. Notice the PX SELECTOR row sources in the query execution plan: they are a new feature of Oracle Database 12c, which is capable of executing UNION ALL branches concurrently. (See Concurrent Execution of Union All.)

alter table temperatures_1901 parallel 2;
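
Only temperatures_1901 is given a parallel degree here; the remaining branches are scanned serially, each under its own PX SELECTOR row source, as the plan below shows. If we wanted parallel scans on every branch, the remaining tables could be altered the same way:

alter table temperatures_1902 parallel 2;
alter table temperatures_1903 parallel 2;
alter table temperatures_1904 parallel 2;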

select /*+ gather_plan_statistics */ * from
(
  select
    month,
    station_code,
    dense_rank() over (partition by month order by average) as rank
  from
  (
    select
      substr(datetime,1,4)||'/'||substr(datetime,5,2) as month,
      station_code,
      avg(temperature) as average
    from temperatures_v
    where datetime >= '190101010000' and datetime < '190501010000'
    group by
      substr(datetime,1,4)||'/'||substr(datetime,5,2),
      station_code
  )
)
pivot(max(station_code) for rank in (1, 2, 3))
order by month;

Plan hash value: 3783481314

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name              | Starts |    TQ  |IN-OUT| PQ Distrib  | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                   |      1 |        |      |             |     48 |00:00:00.50 |      64 |
|   1 |  PX COORDINATOR                            |                   |      1 |        |      |             |     48 |00:00:00.50 |      64 |
|   2 |   PX SEND QC (ORDER)                       | :TQ10002          |      0 |  Q1,02 | P->S | QC (ORDER)  |      0 |00:00:00.01 |       0 |
|   3 |    SORT GROUP BY                           |                   |      2 |  Q1,02 | PCWP |             |     48 |00:00:00.01 |       0 |
|   4 |     PX RECEIVE                             |                   |      2 |  Q1,02 | PCWP |             |     48 |00:00:00.01 |       0 |
|   5 |      PX SEND RANGE                         | :TQ10001          |      0 |  Q1,01 | P->P | RANGE       |      0 |00:00:00.01 |       0 |
|   6 |       HASH GROUP BY PIVOT                  |                   |      2 |  Q1,01 | PCWP |             |     48 |00:00:00.02 |       0 |
|   7 |        VIEW                                |                   |      2 |  Q1,01 | PCWP |             |    288 |00:00:00.02 |       0 |
|   8 |         WINDOW SORT                        |                   |      2 |  Q1,01 | PCWP |             |    288 |00:00:00.02 |       0 |
|   9 |          HASH GROUP BY                     |                   |      2 |  Q1,01 | PCWP |             |    288 |00:00:00.02 |       0 |
|  10 |           PX RECEIVE                       |                   |      2 |  Q1,01 | PCWP |             |    288 |00:00:00.01 |       0 |
|  11 |            PX SEND HASH                    | :TQ10000          |      0 |  Q1,00 | P->P | HASH        |      0 |00:00:00.01 |       0 |
|  12 |             HASH GROUP BY                  |                   |      2 |  Q1,00 | PCWP |             |    288 |00:00:00.92 |     368 |
|  13 |              VIEW                          | TEMPERATURES_V    |      2 |  Q1,00 | PCWP |             |  26266 |00:00:05.53 |     368 |
|  14 |               UNION-ALL                    |                   |      2 |  Q1,00 | PCWP |             |  26266 |00:00:02.79 |     368 |
|  15 |                PX BLOCK ITERATOR           |                   |      2 |  Q1,00 | PCWC |             |   6565 |00:00:01.12 |      90 |
|* 16 |                 EXTERNAL TABLE ACCESS FULL | TEMPERATURES_1901 |      1 |  Q1,00 | PCWP |             |   6565 |00:00:00.98 |      90 |
|  17 |                PX SELECTOR                 |                   |      2 |  Q1,00 | PCWP |             |   6565 |00:00:01.02 |      90 |
|* 18 |                 EXTERNAL TABLE ACCESS FULL | TEMPERATURES_1902 |      2 |  Q1,00 | PCWP |             |   6565 |00:00:00.47 |      90 |
|  19 |                PX SELECTOR                 |                   |      2 |  Q1,00 | PCWP |             |   6554 |00:00:00.49 |      85 |
|* 20 |                 EXTERNAL TABLE ACCESS FULL | TEMPERATURES_1903 |      2 |  Q1,00 | PCWP |             |   6554 |00:00:00.44 |      85 |
|  21 |                PX SELECTOR                 |                   |      2 |  Q1,00 | PCWP |             |   6582 |00:00:00.57 |      85 |
|* 22 |                 EXTERNAL TABLE ACCESS FULL | TEMPERATURES_1904 |      2 |  Q1,00 | PCWP |             |   6582 |00:00:00.52 |      85 |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  16 - filter(("DATETIME">='190101010000' AND "DATETIME"<'190201010000'))
  18 - filter(("DATETIME">='190201010000' AND "DATETIME"<'190301010000'))
  20 - filter(("DATETIME">='190301010000' AND "DATETIME"<'190401010000'))
  22 - filter(("DATETIME">='190401010000' AND "DATETIME"<'190501010000'))

I predict that the time is soon coming when the marriage of relational theory and transactional database management systems will be dissolved. We will be free to store structured non-transactional data outside a transactional database management system while continuing to exploit the entire universe of indexing, partitioning, and clustering techniques as well as the full power of relational languages, not only SQL.

Over to you, Mogens.


Update (August 11, 2014): The code demonstration has been updated to use Oracle Database 12.1.0.1. Concurrent execution of UNION ALL is a new feature of Oracle Database 12.1.0.1.

  1. January 23, 2013 at 6:13 am

    Nice article Iggy.

  2. January 23, 2013 at 10:52 am

    Very nice article, Iggy. If one has to pluck one and only one theme from this blog, it would be “right tool for the job”. More often than imaginable, we choose the wrong approach or the wrong tool and get hung up in our own handicap of a “defined” or pure system. Relational databases are based on relational theory proposed by Codd several years ago; but with the passage of time, it has to be understood that the theory needs adjustments and tweaking. When it comes to storage, again, there is no reason to cling on to the concept that the user should not care about the storage system. The operative word is “should not”. What if the user does want to care? As your example illustrates, the use of the knowledge of the underlying organization could be culled to accomplish higher goals.

    External tables (and many other features such as bfiles and LOBs) in Oracle are testament to that concept of right tools; not right theories. Several years ago I wrote a tool to read the listener log files as external table and blogged about it later: http://arup.blogspot.com/2010/05/mining-listener-logs.html. That used external tables to expose listener logs to be queried by SQL – right tool for the job.

    So, IMHO, expertise is to pull out the right approaches rather than creating an artificial barrier of not using some theory in its purest form, as your example illustrates so vividly. The root cause is not the “marriage” between relational and transactional systems, but abandoning the idea that such a marriage even exists.

    • jgarry
      January 23, 2013 at 2:43 pm

      I don’t think it is a testament to tools over theories; there is a continuum. If there is an apparent lack of theory, who can judge if it is the practitioner or theoretical limitations? Surely there are people whose heads are now exploding at the thought of a product that doesn’t strictly follow relational theory. The examples of new paradigms not being transactionally correct are legion. So, yes, you need to use the right tool for the job, but people don’t – they are usually happy to get anything working at all. Once you get to something that needs transactions, you need proper mathematical support. Proof of that tends to be lacking in popular non-relational systems.

      As far as caring about underlying storage, physical design can feed back into logical design, but as we all should know, premature optimization is the root of all unhappy marriages.

      It is totally cool that we can do this subdirectory thing; but is it something that will last for the ages? No. External files for parsing alert log, a great idea. Now we have XML alert logs. Why use something designed for markup to log messages? To make it easy to ship log data elsewhere? To make more money for storage vendors? To turn error logging into an app? To give bored DBA’s something to maintain? XML is “human-legible and reasonably clear.” Now my head is going to explode.

      • Iggy Fernandez
        January 23, 2013 at 3:41 pm

        jgarry :

        It is totally cool that we can do this subdirectory thing; but is it something that will last for the ages? No.

        This amusing demonstration only served to prove the point that we don’t need a transactional database management system in order to exploit partitioning, parallelism, and Structured Query Language. The NoSQL movement is forcing the issue and the time will soon come when we will no longer be forced to store structured but non-transactional data in transactional database management systems in order to exploit the universe of indexing, partitioning, and clustering technologies and the full declarative power of relational languages (not just SQL). ETL and BI are the perfect candidates in waiting.

  3. John Beresniewicz
    January 23, 2013 at 11:35 am

    very nice post Iggy.
    JB

  4. January 23, 2013 at 12:39 pm

    Enjoyed this one Iggy – thanks!
    – Kyle Hailey

  5. Iggy Fernandez
    January 23, 2013 at 4:54 pm

    Arup Nanda :

    When it comes to storage, again, there is no reason to cling on to the concept that the user should not care about the storage system. The operative word is “should not”. What if the user does want to care? As your example illustrates, the use of the knowledge of the underlying organization could be culled to accomplish higher goals.

    I should have been more precise. I meant end users such as accountants, engineers, and architects, not power users such as programmers and database administrators. End users don’t care about the internal representation but power users may be able to turn the details to their advantage.

  6. January 23, 2013 at 5:08 pm

    jgarry:

    “External files for parsing alert log, a great idea. Now we have XML alert logs. Why use something designed for markup to log messages?”

    I mentioned external tables for listener logs, not alert logs. And listener logs are also in XML format now. Why make them external tables? The answer is simple – to read them in SQL. Their representation in XML may be to make them parsable by an XML parser, but not by SQL. As to why query them in SQL? Please check my blog. Tons of information is available in those listener logs.

  7. January 24, 2013 at 12:53 pm

    Very nice, Iggy. I’m surprised no one has yet looked into some of the Unix commands that have been around for eons. I use them quite a lot and it’s fun to do the mental gymnastics of mapping them out to actual SQL.
    – cut – projection/select
    – join – exactly that
    – comm – union, sort of
    – uniq – distinct (oh no! even here!!!)
    – sort – order by
    – paste – ||
    and so on.
    Look them up in the man pages, they are quite fun to use!

  8. Iggy Fernandez
    January 24, 2013 at 3:27 pm

    Noons :

    Very nice, Iggy. I’m surprised no one has yet looked into some of the Unix commands that have been around for eons.

    Thanks, Nuno. I’ll write a blog post on the minimal set of relational capabilities required to make a relational language “relationally complete.” I’ve noticed that the Hive and Pig camps have not yet implemented a minimally complete set of capabilities so a theory session seems to be in order.

    To illustrate the point, I’ll use the Unix commands that you mentioned.
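
    In the meantime, here is a rough sketch of the correspondence, using the temperatures table from the post (the mappings are approximate):

    -- uniq ~ distinct:
    select distinct station_code from temperatures;
    -- sort ~ order by:
    select station_code from temperatures order by station_code;
    -- paste ~ concatenation (||):
    select station_code || datetime from temperatures;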

  9. January 27, 2013 at 9:57 pm

    I am not sure the subdirectory structure is that far-fetched, given what passes today for a DBMS.

    I don’t think it is correct to say that Codd “married relational theory to transactional database systems”. What he married was first-order predicate logic to the equivalent theory of relations (sets), which, with a few slight adjustments, he applied to database management in general. The theory can be used whenever the informational need is logical inferencing–which is what querying is all about. For example, warehouses are nothing but application-specific views of the transactional database, are not transactional and, had they been designed as such, could have been handled relationally.

    As to physical data independence, I still remember the complaint that the relational theory “disregards physical storage and access methods, which makes it impractical”.

    I very much doubt that it is possible for practitioners to understand and appreciate the value and importance of a theoretical foundation without some intellectual development by the education system. Training with products alone does not do it.

    Best regards,
    Fabian Pascal
    http://www.dbdebunk.com

    http://en.wikipedia.org/wiki/Fabian_Pascal


    http://dbdebunk.blogspot.com/p/about_4.html

    http://dbdebunk.blogspot.com/p/papers_3.html

    http://dbdebunk.blogspot.com/p/seminars.html

  10. philip
    February 2, 2013 at 9:25 pm

    Let me get this straight. It is better *not* to store the data in some standard relational declaratively integrity-constrained format that a relational tool lets you immediately trivially relationally query because all you have to do is first hand-write and debug a procedural multilanguage multipage “tiny preprocessing script” depending on a particular ad-hoc text format and file structure?

    • Iggy Fernandez
      February 3, 2013 at 2:58 pm

      philip :

      Let me get this straight. It is better *not* to store the data in some standard relational declaratively integrity-constrained format that a relational tool lets you immediately trivially relationally query because all you have to do is first hand-write and debug a procedural multilanguage multipage “tiny preprocessing script” depending on a particular ad-hoc text format and file structure?

      Conceptually, the above terabyte-sized data set is a single table. But it should not be necessary to uncompress and load this huge quantity of structured non-transactional data into a transactional database management system in order to query it. The choice of physical representation conserves storage space but is a technical detail that is irrelevant to the logical presentation of the data set as a single table; it is a technical detail that users don’t care about. As Dr. Codd said in the opening sentence of his 1970 paper “A Relational Model of Data for Large Shared Data Banks” (faithfully reproduced in the 100th issue of the NoCOUG Journal), “future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation).” Today, however, the full declarative power of “data base sublanguages” (the term coined by Dr. Codd) such as Structured Query Language (SQL) is only available within the confines of a transactional database management system. The NoSQL movement is forcing the issue, and the time will soon come when we will no longer be forced to store structured non-transactional data in transactional database management systems in order to exploit the entire universe of indexing, partitioning, and clustering technologies and the full declarative power of relational languages (not just SQL). ETL and BI are the perfect candidates in waiting.

