
Archive for the ‘NoCOUG’ Category

No! to SQL and No! to NoSQL

July 28, 2013

See also: What’s so sacred about relational anyway?

The opinions of relational theoreticians C. J. Date and Hugh Darwen about SQL are well known, but they have not previously commented on the NoSQL phenomenon, so their interview in the latest issue of the NoCOUG Journal—the official publication of the Northern California Oracle Users Group—makes for interesting reading. In my opinion, their comments are right on the mark even though Date admits that he knows almost nothing about NoSQL products.

In discussing Amazon Dynamo—the forerunner of the NoSQL movement—and those that came after it, Date and Darwen make these astute observations:

“Developers tend to be more concerned with convenience in database definition and updating than with the ease of deriving useful and reliable information from the database.”—Darwen

 “If there’s a suggestion that Amazon’s various disaster scenarios, regarding tornados and the rest, are somehow more of a problem for relational systems than they are for nonrelational ones, then of course I reject that suggestion 100 percent.”—Date

“Those who disparage relational are almost invariably very far from being properly informed and almost invariably equate ‘relational’ with [current implementations].”—Darwen

Dynamo Assumptions and Requirements

Date and Darwen’s remarks are spot on. Here is the Dynamo use case from the 2007 ACM paper by Amazon: “Customers should be able to view and add items to their shopping cart even if disks are failing, network routes are flapping, or data centers are being destroyed by tornados. Therefore, the service responsible for managing shopping carts requires that it can always write to and read from its data store, and that its data needs to be available across multiple data centers. … There are many services on Amazon’s platform that only need primary-key access to a data store. For many services, such as those that provide best seller lists, shopping carts, customer preferences, session management, sales rank, and product catalog, the common pattern of using a relational database would lead to inefficiencies and limit scale and availability. Dynamo provides a simple primary-key only interface to meet the requirements of these applications. … Experience at Amazon has shown that data stores that provide ACID guarantees tend to have poor availability. … Dynamo targets applications that operate with weaker consistency (the “C” in ACID) if this results in high availability.”

To paraphrase, Amazon’s goals were extreme performance, extreme scalability, and extreme availability, and it concluded that the only way to achieve its ends was to discard the relational model.

Compartmentation

Amazon started off on the right track. Its first innovation was to break up the traditional monolithic enterprise-wide database service into simpler component services such as the best seller list service, the shopping cart service, the customer preferences service, the sales rank service, and the product catalog service. This avoids a single point of failure.

Amazon’s first DBA, Jeremiah Wilton, explained Amazon’s approach in his answer to the question “Is 24×7 a myth?” in an interview published in the NoCOUG Journal in 2007. He said, “The best availability in the industry comes from application software that is predicated upon a surprising assumption: The databases upon which the software relies will inevitably fail. The better the software’s ability to continue operating in such a situation, the higher the overall service’s availability will be. But isn’t Oracle unbreakable? At the database level, regardless of the measures taken to improve availability, outages will occur from time to time. An outage may be from a required upgrade or a bug. Knowing this, if you engineer application software to handle this eventuality, then a database outage will have less or no impact on end users. In summary, here are many ways to improve a single database’s availability. But the highest availability comes from thoughtful engineering of the entire application architecture.”

As an example, the shopping cart service should not be affected if the checkout service is having hiccups.

There’s no conflict with the relational model there, even if the shopping cart data is not in the same database as the product catalog and occasional inconsistencies result. Occasionally, an item that is present in a shopping cart may go out of stock. Occasionally, an item that is present in a shopping cart may be repriced. The problems can be resolved when the customer decides to check out, if not earlier. As an Amazon customer, I occasionally leave items in my shopping cart but don’t complete a purchase. When I resume shopping, I sometimes get a notification that an item in my shopping cart is no longer in stock or has been repriced.

In any case, current relational implementations do not allow referential integrity constraints to span databases. See, for example, http://docs.oracle.com/cd/E11882_01/server.112/e25494/ds_appdev003.htm.

Extreme Scalability

Focusing next on extreme scalability, Amazon saw that the solution was “sharding” or horizontal partitioning of a hierarchical schema amongst shared-nothing database servers. The simple hierarchical schemas that resulted from compartmentation were very shardable. No conflict with the relational model there.
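
To make the idea concrete, here is a minimal sketch (with hypothetical shopping-cart tables) of splitting a hierarchical schema by its root key using Oracle hash partitioning within a single database; true sharding applies the same key-based splitting across shared-nothing servers, with the routing handled by the application or middleware tier rather than by the database.

-- Illustrative sketch only; the table and column names are hypothetical.
-- Within one database, the hierarchical shopping-cart schema is split by
-- its root key with hash partitioning. Sharding across shared-nothing
-- servers applies the same idea, with the routing done outside the database.
CREATE TABLE shopping_carts
(
  cart_id     INTEGER NOT NULL,
  customer_id INTEGER NOT NULL,
  created     DATE,
  CONSTRAINT shopping_carts_pk PRIMARY KEY (cart_id)
)
PARTITION BY HASH (cart_id) PARTITIONS 16;

CREATE TABLE cart_items
(
  cart_id  INTEGER NOT NULL,
  item#    INTEGER NOT NULL,
  quantity NUMBER,
  CONSTRAINT cart_items_pk PRIMARY KEY (cart_id, item#),
  CONSTRAINT cart_items_fk FOREIGN KEY (cart_id) REFERENCES shopping_carts
)
PARTITION BY HASH (cart_id) PARTITIONS 16;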

Extreme Availability

Continuing this line of thought, Amazon saw that the key to extreme availability was data replication. Multiple copies of the shopping cart are allowed to exist and, if one of the replicas becomes unresponsive, the data can be served by one of the other replicas. The technique used by Dynamo has a close parallel in the well-known technique of “multimaster replication.” However, because of network latencies, the copies may occasionally get out of sync and the customer may occasionally encounter a stale version of the shopping cart. Once again, this can be appropriately handled by the application tier; the node that falls behind can catch up eventually or inconsistencies can be detected and resolved at an opportune time such as checkout. This technique is called “eventual consistency.”

The inventor of relational theory, Dr. Edgar “Ted” Codd, himself blessed eventual consistency in the very first paper on relational theory “A Relational Model of Data for Large Shared Data Banks” which was reprinted with permission in the 100th issue of the NoCOUG Journal. In that paper, Dr. Codd said:

“There are, of course, several possible ways in which a system can detect inconsistencies and respond to them. In one approach the system checks for possible inconsistency whenever an insertion, deletion, or key update occurs. Naturally, such checking will slow these operations down. [emphasis added] If an inconsistency has been generated, details are logged internally, and if it is not remedied within some reasonable time interval, either the user or someone responsible for the security and integrity of the data is notified. Another approach is to conduct consistency checking as a batch operation once a day or less frequently.”

In other words, the inventor of relational theory did not see a conflict between his relational model and the “eventual consistency” that is one of the hallmarks of the NoSQL products of today. Amazon, however, imagined a conflict because it quite understandably conflated the relational model with the ACID guarantees of database management systems. In fact, ACID has nothing to do with the relational model per se (though relational theory does come in very handy in defining consistency constraints); pre-relational database management systems such as IMS provided ACID guarantees, and so did post-relational object-oriented database management systems.

The tradeoff between consistency and performance is as important in the wired world of today as it was in Codd’s world. We cannot cast stones at Dynamo for the infraction of not guaranteeing the synchronization of replicated data, because violations of the consistency requirement are commonplace in the relational camp. Application developers are warned about the negative impact of integrity constraints. [1] [2] [3] [4] And, most importantly, no DBMS that aspires to the relational moniker currently implements the SQL-92 “CREATE ASSERTION” feature that is necessary in order to provide the consistency guarantee. For a detailed analysis of this anomaly, refer to Toon Koppelaars’s article “CREATE ASSERTION: The Impossible Dream” in the August 2013 issue of the NoCOUG Journal.
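
To make the missing feature concrete, here is a sketch of what a SQL-92 assertion might look like, expressed against the employees and job_history tables used in the appendix below. It is illustrative syntax only; no mainstream DBMS, Oracle included, currently accepts CREATE ASSERTION.

-- Illustrative only: SQL-92 syntax that current DBMSs do not implement.
-- The constraint spans two tables ("every employee has at least one
-- job_history row"), which primary-key, foreign-key, and single-table
-- check constraints cannot express declaratively.
CREATE ASSERTION every_employee_has_a_job CHECK
(
  NOT EXISTS
  (
    SELECT *
    FROM employees e
    WHERE NOT EXISTS
    (
      SELECT *
      FROM job_history jh
      WHERE jh.employee# = e.employee#
    )
  )
);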

Extreme Performance

The final hurdle was extreme performance and that’s where Amazon went astray. Amazon believed that the relational model makes data retrieval and updates inefficient. This objection to the relational model is colorfully summarized by the following statement attributed to Esther Dyson, the editor of the Release 1.0 newsletter, “Using tables to store objects is like driving your car home and then disassembling it to put it in the garage. It can be assembled again in the morning, but one eventually asks whether this is the most efficient way to park a car.” The statement dates back to 1988 and was much quoted when object-oriented databases were in vogue. [5]

If the shopping cart is an “object,” surely disassembling it for storage makes data retrieval and updates inefficient? That belief stems from an unfounded assumption that has found its way into every relational implementation to date—that a “table” should map to physical storage. In reality, the relational model is a logical model; it does not concern itself with storage details at all. It would be perfectly legitimate to store the shopping cart in a physical form that resembled a shopping cart while still offering a relational model of the data complete with SQL. In other words, the physical representation could be optimized for the most important use case—retrieving the entire shopping-cart object using its key—without affecting the relational model of the data. It would also be perfectly legitimate to provide a non-relational API for the important use cases. Dr. Codd himself gave conditional blessing to such non-relational APIs in his 1985 Computerworld article “Is Your DBMS Really Relational?” where he says, “If a relational system has a low-level (single-record-at-a-time) language, that low level [should not] be used to subvert or bypass the integrity rules and constraints expressed in the higher level relational language (multiple-records-at-a-time).”

Have I mentioned that Dr. Codd invented relational theory?

Zero-th Normal Form

In fact, the key-value approach used by Dynamo and those that came after it is exactly equivalent to “zero-th” normal form in relational terminology. [6] In his 1970 paper, Codd says: “Nonatomic values can be discussed within the relational framework. Thus, some domains may have relations as elements. These relations may, in turn, be defined on nonsimple domains, and so on. For example, one of the domains on which the relation employee is defined might be salary history. An element of the salary history domain is a binary relation defined on the domain date and the domain salary. The salary history domain is the set of all such binary relations. At any instant of time there are as many instances of the salary history relation in the data bank as there are employees. In contrast, there is only one instance of the employee relation.” In relational terminology, a relation with non-simple domains is said to be in “zero-th” normal form or unnormalized. Codd suggested that unnormalized relations should be normalized for ease of use. Here is the unnormalized employee relation from Codd’s paper.

employee (
  employee#,
  name,
  birthdate,
  jobhistory (jobdate, title, salaryhistory (salarydate, salary)),
  children (childname, birthyear)
)

The above unnormalized relation can be decomposed into four normalized relations as follows.

employee' (employee#, name, birthdate)
jobhistory' (employee#, jobdate, title)
salaryhistory' (employee#, jobdate, salarydate, salary)
children' (employee#, childname, birthyear)

However, this is not to suggest that these normalized relations must necessarily be mapped to individual buckets of physical storage. Codd differentiated between the stored set, the named set, and the expressible set. In the above example, we have five relations but, if we preferred, the unnormalized employee relation could be the only member of the stored set. Alternatively, all five relations could be part of the stored set; that is, we could legitimately store redundant representations of the data. However, the common belief blessed by current practice is that the normalized relations should be the only members of the stored set.

Even if the stored set contains only normalized relations, they need not map to different buckets of physical storage. Oracle is unique amongst RDBMS vendors in providing a convenient construct called the “table cluster” which is suitable for hierarchical schemas. In Codd’s example, employee# would be the cluster key, and rows corresponding to the same cluster key from all four tables could be stored in the same physical block on disk. If the cluster was a “hash cluster,” no indexes would be required to retrieve records corresponding to the same cluster key from all four tables.

Conclusion

The conclusion is that Amazon had an opportunity to kick the relational model up a notch but did not rise to the occasion. Amazon could have eaten its cake—extreme performance, extreme scalability, and extreme availability for important use cases such as shopping carts—and had it too—the relational model with all its wonderful declarative power. As Hugh Darwen said in the NoCOUG Journal interview: “Developers tend to be more concerned with convenience in database definition and updating than with the ease of deriving useful and reliable information from the database. … Those who disparage relational are almost invariably very far from being properly informed and almost invariably equate ‘relational’ with [current implementations].” I’ll leave you with that thought.

See also: What’s so sacred about relational anyway?

1. “Using primary and foreign keys can impact performance. Avoid using them when possible.” (http://docs.oracle.com/cd/E17904_01/core.1111/e10108/adapters.htm#BABCCCIH)

2. “For performance reasons, the Oracle BPEL Process Manager, Oracle Mediator, human workflow, Oracle B2B, SOA Infrastructure, and Oracle BPM Suite schemas have no foreign key constraints to enforce integrity.” (http://docs.oracle.com/cd/E23943_01/admin.1111/e10226/soaadmin_partition.htm#CJHCJIJI)

3. “For database independence, applications typically do not store the primary key-foreign key relationships in the database itself; rather, the relationships are enforced in the application.” (http://docs.oracle.com/cd/E25178_01/fusionapps.1111/e14496/securing.htm#CHDDGFHH)

4. “The ETL process commonly verifies that certain constraints are true. For example, it can validate all of the foreign keys in the data coming into the fact table. This means that you can trust it to provide clean data, instead of implementing constraints in the data warehouse.” (http://docs.oracle.com/cd/E24693_01/server.11203/e16579/constra.htm#i1006300)

5. I’ve been unable to find the statement in the Release 1.0 archives at http://www.sbw.org/release1.0/ so I don’t really know the true source or author of the statement. However, the statement is popularly attributed to Esther Dyson and claimed to have been published in the Release 1.0 newsletter. I found a claim that the statement is found in the September 1988 issue but that didn’t pan out.

6. Chris Date is a strong proponent of “relation-valued attributes” (RVAs) and argues that relations with RVAs are as “normal” as those without. See “What First Normal Form Really Means” in Date on Database: Writings 2000-2006 (Apress, 2006).

Appendix: Table Clusters in Oracle Database

Here’s a demonstration of using Oracle table clusters to store records from four tables in the same block and retrieving all the components of the “employee cart” without using indexes. First we create four normalized tables and prove that all the records of a single employee including job history, salary history, and children are stored in a single database block so that there is never any join-penalty when assembling employee data. Then we create an object-relational view that assembles employee information into a single unnormalized structure and show how to insert into this view using an “INSTEAD OF” trigger.

The following demonstration was performed using a pre-built developer VM for Oracle VM VirtualBox. The version of Oracle Database is 11.2.0.2.

SQL*Plus: Release 11.2.0.2.0 Production on Sun Jul 28 19:44:23 2013

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

First, we create a table cluster and add four tables to the cluster.

SQL> CREATE CLUSTER employees (employee# INTEGER) hashkeys 1000;

Cluster created.

SQL> CREATE TABLE employees
  2  (
  3    employee# INTEGER NOT NULL,
  4    name VARCHAR2(16),
  5    birth_date DATE,
  6    CONSTRAINT employees_pk PRIMARY KEY (employee#)
  7  )
  8  CLUSTER employees (employee#);

Table created.

SQL> CREATE TABLE job_history
  2  (
  3    employee# INTEGER NOT NULL,
  4    job_date DATE NOT NULL,
  5    title VARCHAR2(16),
  6    CONSTRAINT job_history_pk PRIMARY KEY (employee#, job_date),
  7    CONSTRAINT job_history_fk1 FOREIGN KEY (employee#) REFERENCES employees
  8  )
  9  CLUSTER employees (employee#);

Table created.

SQL> CREATE TABLE salary_history
  2  (
  3    employee# INTEGER NOT NULL,
  4    job_date DATE NOT NULL,
  5    salary_date DATE NOT NULL,
  6    salary NUMBER,
  7    CONSTRAINT salary_history_pk PRIMARY KEY (employee#, job_date, salary_date),
  8    CONSTRAINT salary_history_fk1 FOREIGN KEY (employee#) REFERENCES employees,
  9    CONSTRAINT salary_history_fk2 FOREIGN KEY (employee#, job_date) REFERENCES job_history
 10  )
 11  CLUSTER employees (employee#);

Table created.

SQL> CREATE TABLE children
  2  (
  3    employee# INTEGER NOT NULL,
  4    child_name VARCHAR2(16) NOT NULL,
  5    birth_date DATE,
  6    CONSTRAINT children_pk PRIMARY KEY (employee#, child_name),
  7    CONSTRAINT children_fk1 FOREIGN KEY (employee#) REFERENCES employees
  8  )
  9  CLUSTER employees (employee#);

Table created.

Then we insert data into all four tables. We find that all the records have been stored in the same database block even though they belong to different tables. Therefore the join-penalty has been eliminated.

SQL> INSERT INTO employees VALUES (1, 'IGNATIUS', '01-JAN-1970');

1 row created.

SQL> INSERT INTO children VALUES (1, 'INIGA', '01-JAN-2001');

1 row created.

SQL> INSERT INTO children VALUES (1, 'INIGO', '01-JAN-2002');

1 row created.

SQL> INSERT INTO job_history VALUES (1, '01-JAN-1991', 'PROGRAMMER');

1 row created.

SQL> INSERT INTO job_history VALUES (1, '01-JAN-1992', 'DATABASE ADMIN');

1 row created.

SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1991', '1-FEB-1991', 1000);

1 row created.

SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1991', '1-MAR-1991', 1000);

1 row created.

SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1992', '1-FEB-1992', 2000);

1 row created.

SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1992', '1-MAR-1992', 2000);

1 row created.

SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM employees where employee# = 1;

BLOCK_NUMBER
------------
       22881

SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM children where employee# = 1;

BLOCK_NUMBER
------------
       22881

SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM job_history where employee# = 1;

BLOCK_NUMBER
------------
       22881

SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM salary_history where employee# = 1;

BLOCK_NUMBER
------------
       22881

Next we create an object-relational view that presents each employee as an object.

SQL> CREATE OR REPLACE TYPE children_rec AS OBJECT (child_name VARCHAR2(16), birth_date DATE)
  2  /

Type created.

SQL> CREATE OR REPLACE TYPE children_tab AS TABLE OF children_rec
  2  /

Type created.

SQL> CREATE OR REPLACE TYPE salary_history_rec AS OBJECT (salary_date DATE, salary NUMBER)
  2  /

Type created.

SQL> CREATE OR REPLACE TYPE salary_history_tab AS TABLE OF salary_history_rec
  2  /

Type created.

SQL> CREATE OR REPLACE TYPE job_history_rec AS OBJECT (job_date DATE, title VARCHAR2(16), salary_history SALARY_HISTORY_TAB)
  2  /

Type created.

SQL> CREATE OR REPLACE TYPE job_history_tab AS TABLE of job_history_rec
  2  /

Type created.

SQL> create or replace view employees_view as
  2  SELECT
  3    employee#,
  4    name,
  5    birth_date,
  6    CAST
  7    (
  8      MULTISET
  9      (
 10        SELECT
 11          child_name,
 12          birth_date
 13        FROM children
 14        WHERE employee#=e.employee#
 15      )
 16      AS children_tab
 17    ) children,
 18    CAST
 19    (
 20      MULTISET
 21      (
 22        SELECT
 23          job_date,
 24          title,
 25          CAST
 26          (
 27            MULTISET
 28            (
 29              SELECT salary_date, salary
 30              FROM salary_history
 31              WHERE employee#=e.employee#
 32              AND job_date=jh.job_date
 33            )
 34            AS salary_history_tab
 35          ) salary_history
 36        FROM job_history jh
 37        WHERE employee#=e.employee#
 38      )
 39      AS job_history_tab
 40    ) job_history
 41  FROM employees e;

View created.

Let’s retrieve one employee object and look at the query execution plan. No indexes are used in retrieving records from each of the four tables. The cost of the plan is just 1. This is the minimum achievable cost, indicating that there is no join-penalty.

SQL> alter session set "_rowsource_execution_statistics"=true;

Session altered.

SQL> SELECT * FROM employees_view WHERE employee# = 1;

 EMPLOYEE# NAME             BIRTH_DAT
---------- ---------------- ---------
CHILDREN(CHILD_NAME, BIRTH_DATE)
------------------------------------------------------------------------------------------------------------------------------------
JOB_HISTORY(JOB_DATE, TITLE, SALARY_HISTORY(SALARY_DATE, SALARY))
------------------------------------------------------------------------------------------------------------------------------------
         1 IGNATIUS         01-JAN-70
CHILDREN_TAB(CHILDREN_REC('INIGA', '01-JAN-01'), CHILDREN_REC('INIGO', '01-JAN-02'))
JOB_HISTORY_TAB(JOB_HISTORY_REC('01-JAN-91', 'PROGRAMMER', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-91', 1000), SALARY_HISTORY_
REC('01-MAR-91', 1000))), JOB_HISTORY_REC('01-JAN-92', 'DATABASE ADMIN', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-92', 2000), S
ALARY_HISTORY_REC('01-MAR-92', 2000))))

SQL> SELECT * FROM TABLE(dbms_xplan.display_cursor(null, null, 'TYPICAL IOSTATS LAST'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  aaxmaqz947aa0, child number 0
-------------------------------------
SELECT * FROM employees_view WHERE employee# = 1

Plan hash value: 2117652374

--------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Starts | E-Rows |E-Bytes| Cost  | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                |      1 |        |       |     1 |      1 |00:00:00.01 |       1 |
|*  1 |  TABLE ACCESS HASH| CHILDREN       |      1 |      1 |    32 |       |      2 |00:00:00.01 |       1 |
|*  2 |  TABLE ACCESS HASH| SALARY_HISTORY |      2 |      1 |    44 |       |      4 |00:00:00.01 |       3 |
|*  3 |  TABLE ACCESS HASH| JOB_HISTORY    |      1 |      1 |    32 |       |      2 |00:00:00.01 |       1 |
|*  4 |  TABLE ACCESS HASH| EMPLOYEES      |      1 |    845 | 27040 |       |      1 |00:00:00.01 |       1 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("EMPLOYEE#"=:B1)
   2 - access("EMPLOYEE#"=:B1)
       filter("JOB_DATE"=:B1)
   3 - access("EMPLOYEE#"=:B1)
   4 - access("EMPLOYEE#"=1)

Note
-----
   - cpu costing is off (consider enabling it)
   - dynamic sampling used for this statement (level=2)

30 rows selected.

Next, let’s create an “INSTEAD OF” trigger so that we can insert into the view directly; that is, use a single insert statement instead of multiple insert statements. The trigger will do all the heavy lifting for us.

SQL> CREATE OR REPLACE TRIGGER employees_view_insert
  2  INSTEAD OF INSERT ON employees_view
  3  REFERENCING NEW AS n
  4  FOR EACH ROW
  5  DECLARE
  6    i NUMBER;
  7  BEGIN
  8    INSERT INTO employees
  9    VALUES
 10    (
 11      :n.employee#,
 12      :n.name,
 13      :n.birth_date
 14    );
 15
 16    FOR i IN :n.children.FIRST .. :n.children.LAST
 17    LOOP
 18      INSERT INTO children
 19      VALUES
 20      (
 21        :n.employee#,
 22        :n.children(i).child_name,
 23        :n.children(i).birth_date
 24      );
 25    END LOOP;
 26
 27    FOR i IN :n.job_history.FIRST .. :n.job_history.LAST
 28    LOOP
 29      INSERT INTO job_history VALUES
 30      (
 31        :n.employee#,
 32        :n.job_history(i).job_date,
 33        :n.job_history(i).title
 34      );
 35      FOR j IN :n.job_history(i).salary_history.FIRST .. :n.job_history(i).salary_history.LAST
 36      LOOP
 37        INSERT INTO salary_history
 38        VALUES
 39        (
 40          :n.employee#,
 41          :n.job_history(i).job_date,
 42          :n.job_history(i).salary_history(j).salary_date,
 43          :n.job_history(i).salary_history(j).salary
 44         );
 45      END LOOP;
 46    END LOOP;
 47  END;
 48  /

Trigger created.

Finally, let’s insert an employee object directly into the view and confirm that we can read it back.

SQL> INSERT INTO employees_view
  2  VALUES
  3  (
  4    2,
  5    'YGNACIO',
  6    '01-JAN-70',
  7    CHILDREN_TAB
  8    (
  9      CHILDREN_REC('INIGA', '01-JAN-01'),
 10      CHILDREN_REC('INIGO', '01-JAN-02')
 11    ),
 12    JOB_HISTORY_TAB
 13    (
 14      JOB_HISTORY_REC
 15      (
 16        '01-JAN-91',
 17        'PROGRAMMER',
 18        SALARY_HISTORY_TAB
 19        (
 20          SALARY_HISTORY_REC('01-FEB-91', 1000),
 21          SALARY_HISTORY_REC('01-MAR-91', 1000)
 22        )
 23      ),
 24      JOB_HISTORY_REC
 25      (
 26        '01-JAN-92',
 27        'DATABASE ADMIN',
 28        SALARY_HISTORY_TAB
 29        (
 30          SALARY_HISTORY_REC('01-FEB-92', 2000),
 31          SALARY_HISTORY_REC('01-MAR-92', 2000)
 32        )
 33      )
 34    )
 35  );

1 row created.

SQL> SELECT * FROM employees_view WHERE employee# = 2;

 EMPLOYEE# NAME             BIRTH_DAT
---------- ---------------- ---------
CHILDREN(CHILD_NAME, BIRTH_DATE)
------------------------------------------------------------------------------------------------------------------------------------
JOB_HISTORY(JOB_DATE, TITLE, SALARY_HISTORY(SALARY_DATE, SALARY))
------------------------------------------------------------------------------------------------------------------------------------
         2 YGNACIO          01-JAN-70
CHILDREN_TAB(CHILDREN_REC('INIGA', '01-JAN-01'), CHILDREN_REC('INIGO', '01-JAN-02'))
JOB_HISTORY_TAB(JOB_HISTORY_REC('01-JAN-91', 'PROGRAMMER', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-91', 1000), SALARY_HISTORY_
REC('01-MAR-91', 1000))), JOB_HISTORY_REC('01-JAN-92', 'DATABASE ADMIN', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-92', 2000), S
ALARY_HISTORY_REC('01-MAR-92', 2000))))

Ten more reasons not to attend the NoCOUG summer conference on August 15

You’ve already read the “Top 12 reasons why you should not attend a NoCOUG conference.” Here are ten more to keep you away from the summer conference on Thursday, August 15 at Chevron in San Ramon.

#10 August 15 is a sacred day to you, the dawning of the Age of Aquarius, a day to wind up the old gramophone again. (Yes, we know, Woodstock!)

#9 Sometimes you don’t want to go where everybody knows your name. (Sorry for being friendly, Oscar; we’ll scram now)

#8 You’re mad that whenever something reaches a state of perfection, like Oracle Database 11g, Larry immediately replaces it.

#7 You want to spite Larry and what better way than to avoid NoCOUG conferences? (We’re speechless)

#6 You’re amaxophobic and it’s too far to walk or bike (Try a pony next time)

#5 You’re still using Oracle 7 and loving it. (Ignorance is bliss, brother man)

#4 The more people you meet, the more you like your dog. (Anthropophobes of the world, unite!)

#3 You’ll scream if you hear “pluggable database” being plugged one more time. (Oops, we did it again)

#2 You agree with C. J. Date’s most excellent article in the latest NoCOUG Journal “No! to SQL and No! to NoCOUG.” (You actually read the NoCOUG Journal?)

But the #1 reason not to attend the NoCOUG conference on August 15 is:

#1 You blew your gas money on a big bet that Kate would have a baby girl!

Kindest regards,

The volunteers of your favorite Oracle user group

P.S. If you still want to attend the summer conference, please review the agenda and register at http://www.nocoug.org/rsvp.html.

Craig Shallahamer, one of the star cast of speakers at the summer conference.

Categories: Announcements, DBA, NoCOUG, Oracle

Top 12 reasons why you should NOT attend the next NoCOUG conference

May 21, 2013

#12 All NoCOUG emails automatically go to your spam folder, including this one. You rely on Outlook for career guidance.

#11 They won’t send a stretch limousine to pick you up and take you back.

#10 They talked up SQL for 25 years but now, they’re all, like, “No SQL.” I mean, really!

#9 You’re wayyyyy too busy working to learn anything new. (A very good problem to have!)

#8 Your head is exploding with knowledge already. (An even better problem to have!)

#7 It’s always the same people there, like Iggy and Kamran. (We totally understand but we can’t tell Iggy and Kamran to stop coming, can we?)

Iggy and Kamran

#6 You were there the day NoCOUG webmaster Eric Hutchinson sang the theme song from Cheers “Sometimes you want to go where everybody knows your name.” You don’t ever want to hear Eric sing again. Ever!

#5 The food is just too good. You eat to live, not live to eat. (Good for you!)

Fabulous Food

#4 You don’t appreciate being bribed with free raffle prizes like iPads, Oracle Press teddy bears, and Oracle Press books. (The world needs more upright and honest people!)

Lucky Winners

#3 You’ve been going for 25 years already; it’s time for a change. You’re going to AARP meetings now (American Association of Retired Persons).

#2 You’ve finally converted your company to Excel spreadsheets. So much cheaper and easier to use!

But the #1 reason not to attend the NoCOUG conference tomorrow is:

#1  You thought that NoCOUG was the North Carolina Oracle Users Group on the East coast!

A Brief History of Exadata Time by Juan Loaiza

The full article is available in the 106th issue of the NoCOUG Journal.

“we leveraged our 20–30 years of database experience to determine what would be the ideal platform for running the Oracle database. That’s the thinking that produced the Exadata platform as we know it today.”

“Exadata V1 used HP hardware. Exadata V2 used Sun hardware. Oracle has always worked very closely with Sun, but of course, with Sun becoming a part of Oracle, our relationship became much closer. And, we had a very clear directive from Larry Ellison that engineered systems were critical to Oracle’s overall strategy, and that was understood by both the database and hardware teams. We quickly got all the cooperation, all the features, all the fixes, and all the improvements that we wanted from Sun. When you are a single company and the direction is set very clearly by the leadership, then the hardware and the software integration can advance much faster. If you are two different companies, there are always different priorities in the different companies and this slows down progress.”


“When Exadata V1 first launched, it was a purely disk-based system. We added Flash in V2, but still the focus was primarily on disks. We had a disk focus, with Flash for acceleration. The big change with Exadata X3 is that we’ve increased the Flash memory capacity very significantly: we’ve quadrupled the amount of Flash memory.”

“Exadata X3 has been designed to work really well for all types of applications, including OLTP, warehousing, mixed workloads, and cloud”

“what is available today with Exadata is really just the beginning. We’re no longer focused on making the basic platform work, we’re now primarily focused on value add—things like improving consolidation, improving performance, improving compression—increasing the value of all the Exadata technologies even farther.”

The full article is available in the 106th issue of the NoCOUG Journal.

Categories: DBA, NoCOUG, Oracle

Be Very Afraid: An interview with the CTO of database security at McAfee

February 13, 2013

As published in the 105th issue of the NoCOUG Journal (February 2013)

Slavik Markovich is vice president and chief technology officer for Database Security at McAfee and has over 20 years of experience in infrastructure, security, and software development. Slavik co-founded Sentrigo, a developer of leading database security technology that was acquired by McAfee in April 2011. Prior to co-founding Sentrigo, Slavik served as VP R&D and chief architect at db@net, a leading IT architecture consultancy. Slavik has contributed to open-source projects, is a regular speaker at industry conferences, and is the creator of several open-source projects like FuzzOr (an Oracle fuzzer) and YAOPC (Yet Another Oracle Password Cracker). Slavik also regularly blogs about database security at www.slaviks-blog.com.

Down in the street little eddies of wind were whirling dust and torn paper into spirals, and though the sun was shining and the sky a harsh blue, there seemed to be no color in anything except the posters that were plastered everywhere. The black-mustachio’d face gazed down from every commanding corner. There was one on the house front immediately opposite. BIG BROTHER IS WATCHING YOU, the caption said, while the dark eyes looked deep into Winston’s own. … Behind Winston’s back the voice from the telescreen was still babbling away about pig iron and the overfulfillment of the Ninth Three-Year Plan. The telescreen received and transmitted simultaneously. Any sound that Winston made, above the level of a very low whisper, would be picked up by it; moreover, so long as he remained within the field of vision which the metal plaque commanded, he could be seen as well as heard. There was of course no way of knowing whether you were being watched at any given moment. How often, or on what system, the Thought Police plugged in on any individual wire was guesswork. It was even conceivable that they watched everybody all the time. But at any rate they could plug in your wire whenever they wanted to. You had to live—did live, from habit that became instinct—in the assumption that every sound you made was overheard, and except in darkness, every movement scrutinized. (1984 by George Orwell)

Is my financial and medical information safe from the bad guys? After watching Die Hard 4, I’m not so sure, because it seems that bad guys can access, change, or erase anybody’s information with a few keystrokes.

Although life is not a movie, and the situation is not quite as bad as Die Hard 4, it is not that good either. You can read about breaches with varying degrees of severity every week. While the “bad guys” require a bit more than a few keystrokes to access/change information, they have very sophisticated tools at their service. World-spanning global botnets, automated hacking tools, a flourishing underground market, and a strong financial incentive all motivate the “bad guys” to continue breaking into systems.

On the flip side, there have been many significant changes and improvements to the applicable regulations associated with protection of PHI and ePHI healthcare information. In addition, the enhanced enforcement of HIPAA and the newer HITECH regulations has increased the visibility of—and, arguably, attention to—affected organizations complying with these regulatory mandates. SOX, GLBA, and other financial regulations are intended to address the integrity and authenticity of financial records. So the organizations keeping your records are forced to think about security.

I would also add that it isn’t always “the bad guys” that cause data compromise—sometimes it’s caused accidentally, by human or system error. To summarize, if you are being targeted, I’d say that there is a pretty good chance that the hackers will succeed in compromising your details. On the other hand, your liability is limited, at least on the financial front.

Why is information security so poor in general? Is it because administrators and users—me included—are clueless about information security, or is it because the operating systems, databases, networks, languages, and protocols are inherently vulnerable, which makes our task much harder than it really ought to be?

Indeed, there is a big awareness issue when it comes to security. Users, developers, and administrators generally lack deep understanding of security and, as everybody knows, security is only as strong as your weakest link. The “bad guy” just needs one successful try on a single attack vector, while the security protections need to cover all bases, all the time. It’s an asymmetric game where currently the “bad guys” have the advantage.

When specifically talking about “database security,” the reality is that the overall risk posture for these systems, and the often highly sensitive and/or business-critical information they contain, is most often grossly underestimated by the respective organizations. A comparison can be made to what the famous 1930s bank robber Willie Sutton was often quoted as saying, when asked by a reporter why he robbed banks: “Because that’s where the money is.” The “bad guys” often target these databases, and the valuable data assets they contain, because they know that’s where they can get the biggest bang for their buck (i.e., the highest return for their exploit efforts).

Also, the associated risk to them of being caught and subsequently penalized is very often quite low, while the associated payoff (return) is quite high. So from an ROI perspective, their motivating rationale is abundantly clear.

Finally, if you were indeed “clueless” about security, you probably wouldn’t be asking these types of targeted questions.

The analogy is that certain cars are the favorites of car thieves because they are so easy to break into. Why are salted password hashes not the default? Why are buffer overflows permitted? Why was it so easy for China to divert all Internet traffic through its servers for 20 minutes in April 2010? Why is Windows so prone to viruses? Is it a conspiracy?

My motto is “always choose stupidity over conspiracy.” It goes back to the issue of lack of awareness. Developers that are not constantly trained on security will introduce security issues like buffer overflows or passwords stored in clear text or encrypted instead of hashed with a salt, etc. Some protocols were not designed with security in mind, which makes them susceptible to manipulation. Some targets are definitely softer than others.

At an absolute minimum, measures should be taken to harden the respective systems, as per the individual vendors’ guidelines and instructions. Unnecessary system services and processes should be disabled to reduce the attack surface, appropriate access control mechanisms should be properly configured, critical system patching should be done on a regular basis, etc.

But, unfortunately, these minimal security measures are often insufficient to address the rapidly expanding threat landscape. System visibility, in as near real time as possible, is required. Automated user process monitoring, vulnerability assessment, event correlation, and accompanying security policy notifications/alerting for these systems needs to be provided.

Is the cloud safe? Is SaaS safe?

I do not believe that the cloud or the SaaS model is inherently more or less safe—it is just a different kind of safe. Depending on the organizations’ risk appetite, they can be provided with the appropriate safeguards and controls to make implementation of private and public cloud-based services correspondingly “safe.” Technological controls, as well as organizational and administrative controls, need to be tailored for these types of deployments.

It’s also critical that the database security model be extensible and scalable to accommodate virtual and cloud-based environments.

Do we need better laws or should we trust the “enlightened self-interest” of industry? Enlightened self-interest—the mantra of Fed chairman Alan Greenspan—didn’t prevent the financial collapse. Will it prevent the digital equivalent of Pearl Harbor?

“Enlightened self-interest,” by itself, is usually insufficient. At least it has been proven to be up to now. On the other hand, over-regulation would not be a good alternative, either. There has to be a happy medium—where government and private industry work together to promote a more secure environment for commercial transactions to occur, and where consumers’ privacy is also protected. But, unfortunately, we’re not there yet.

If not laws, how about some standards? Why aren’t there templates for hardened operating systems, databases, and networks? Or are there?

There are numerous standards for applying security controls to these systems, including Center for Internet Security (CIS), which includes “hardening” benchmarks for a variety of different systems and devices, as well as the NIST 800 Series Special Publications that offer a very large set of documents addressing applicable policies, procedures, and guidelines for information security. In addition, most of the more significant IT product vendors provide specific hardening guidelines and instructions pertaining to their various products.

The problem is how to consistently measure and make sure that your systems do not deviate from the gold standard you set. Unfortunately, systems tend to deteriorate with use—parameters are changed, new credentials and permissions are introduced, etc. An organization without a consistent, proven way to scan systems is going to have issues no matter how close it follows the standards. A recent scan we did with a large enterprise discovered over 15,000 weak passwords in their databases. In theory, they followed very strict federal policies.
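
By way of illustration, here is a minimal sketch of one such check in Oracle Database: the DBA_USERS_WITH_DEFPWD data dictionary view lists accounts that are still using their default passwords. It covers default passwords only; weak but non-default passwords require a dedicated scanning tool.

-- Illustrative check only: open accounts still using their default passwords.
SELECT d.username
FROM   dba_users_with_defpwd d
JOIN   dba_users u ON u.username = d.username
WHERE  u.account_status = 'OPEN';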

Who will guard the guards themselves? As an administrator, I have unlimited access to sensitive information. How can my employer protect itself from me?

There’s a fundamental tenet in information security called “principle of least privilege,” which basically says that a user should be given the necessary authorization to access the information they need to perform their tasks/job—but no more than that level of privileged access. In addition, there’s another concept called “separation (or “segregation”) of duties,” which states that there should be more than one person required to complete a particular task, in order to help prevent potential error or fraud.

In the context of databases, this translates to not allowing users and administrators to have more access than is required for them to do their jobs—and for DBAs, that the DB administrative tasks will be monitored in real time and supervised by a different team, usually the information security team. A security framework that enforces these database access control policies is critical, because the inconvenient fact is, many compromises of DBs involve privileged access by trusted insiders.

While there is a much higher probability that someone who is not a DBA would try to breach the database, the DBA is in a much better position to succeed should he or she really want to do that.

If risk is the arithmetical product of the probability of an incident happening and the potential damage that incident could cause, then due to the latter factor, DBAs as well as other highly skilled insiders with access privileges pose a significant risk.

In 2007, Computerworld and other sources reported that a senior DBA at a subsidiary of Fidelity National Information Services Inc. sold 8.5 million records, including bank account and credit card details, to a data broker. An external hacker would find it very difficult to achieve this kind of scale without insider cooperation.

It is important, for security as much as for regulatory compliance reasons, to monitor and audit DBA activity. In fact, this should be done for all users who access the database. DBAs are the first to understand this. If you work in a bank vault, you know there are CCTV cameras on you. You want those cameras on you. DBAs are in a similar situation, and they understand this requirement completely.
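
As a minimal illustration, Oracle’s built-in standard auditing can provide some of this oversight; the sketch below is illustrative only, since a real deployment would use a dedicated monitoring product and keep the audit trail where the audited DBA cannot alter it.

-- Illustrative only; assumes AUDIT_TRAIL is already set to DB or OS.
ALTER SYSTEM SET audit_sys_operations = TRUE SCOPE = SPFILE; -- capture SYSDBA/SYSOPER actions (restart required)
AUDIT ALTER USER, GRANT ANY PRIVILEGE, GRANT ANY ROLE;       -- audit use of powerful system privileges
AUDIT SELECT ANY TABLE BY ACCESS;                            -- audit reads made via the SELECT ANY TABLE privilege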

What DBAs should not accept are solutions that hinder or interfere with the DBA’s daily tasks—DBAs are primarily concerned with running databases efficiently. Any solution that jeopardizes this primary objective is counter-productive and doomed to fail anyway, because DBAs and other staff will find ways to circumvent it.


At the risk of getting lynched by Journal readers, I have to ask your opinion about certification. Information Technology is the only profession whose practitioners are not subject to licensing and certification requirements. Can we really call ourselves “professionals” if we are not subject to any rules? Doesn’t the cost-benefit analysis favor licensing and certification? Even plumbers and manicurists in the state of California are subject to licensing and certification requirements but not IT professionals. Do you advocate security certification?

Well—while there’s certainly value in conducting user security awareness training and in promoting and achieving professional security certification, there are some issues. Like who would the accrediting body be? Who exactly needs to be certified? Will there be different levels of certification? Will each OS, DB, network device, application, etc., require its own distinct cert? It can quickly get very complicated.

But a shorter answer could be yes—I advocate security certifications.

In the novel 1984, George Orwell imagined that a device called a “telescreen” would allow “Big Brother” to listen to everything you said. The reality in 2013 is much worse since so much is digital, including my every message, phone call, and commercial transaction, and the cell phone is everybody’s personal electronic monitoring bracelet. What steps should we take to protect ourselves in this brave new digital world?

One possible answer might depend on how much security an individual is willing to trade for a potential reduction of features and functionality. For example, when “location services” are enabled on your phone, a variety of enhanced proximity-based services are then available, like several kinds of mapping services, driving directions and conditions, identification of nearby retail outlets, restaurants, gas stations, etc.

In addition, you can also locate your phone if it gets lost, wipe it of its contents, and/or have emergency services find you to provide help. But you also potentially get location-based advertisements, and there’s the specter of the device and application vendors (browser and service providers, too) aggregating and mining your various voice/data transmission location(s), for their own commercial purposes. The ongoing “privacy vs. commerce” battles involved in the “Do Not Track” discussions are good examples of these often-conflicting forces.

My personal assumption is that anything I publish on any network (text message, Facebook, Twitter, etc.) is public, no matter what settings it is published with. If I want to keep something private, I encrypt it. But, I’m willing to make privacy sacrifices in the name of convenience. I do use GPS; I do use Facebook and LinkedIn, etc.

Thank you for spending so much time with us today. Would you like to tell Journal readers a little about today’s McAfee? What are your current products? What is in the pipeline?

Well, I’m glad you asked. The McAfee Database Security solution comprises a core set of three products that serve to scan, monitor, and secure databases:

  • McAfee Vulnerability Manager for Databases, which automatically discovers databases on the network, detects sensitive information in them, determines if the latest patches have been applied, and performs more than 4,700 vulnerability checks.
  • McAfee Database Activity Monitoring, which provides automatic, non-intrusive, and real-time protection for heterogeneous database environments on your network with a set of preconfigured security defenses, and also provides the ability to easily create custom security policies based on configurable, and very granular, controls. In addition, it has the capability to deliver virtual patching updates on a regular basis to protect from known vulnerabilities.
  • McAfee Virtual Patching for Databases (vPatch), which protects unpatched databases from known vulnerabilities and all database servers from zero-day attacks based on common threat vectors, without having to take the database offline to patch it. Additionally, vPatch has been accepted as a “compensating control” in compliance audits.

The McAfee Database Security solution is also tightly integrated with McAfee’s centralized security management platform, ePolicy Orchestrator (ePO), which consolidates enterprise-wide security visibility and control across a wide variety of heterogeneous systems, networks, data, and compliance solutions.

At McAfee, we do not believe in a silver bullet product approach. No security measure can protect against all attacks or threats. However, McAfee’s Security Connected framework enables integration of multiple products, services, and partnerships for centralized, efficient, and effective security and risk management. ▲

Categories: Interviews, NoCOUG

The golden rule of NOT tuning SQL

February 7, 2013

Dear NoCOUG members and friends,

The golden rule of not tuning SQL is “operate with as little information as possible.” Not only will this increase your chances of failure, but it will also make it difficult for others to help you.

Every exercise in problem solving – no matter how little – has exactly six stages: Problem, Observation, Implication, Solution, Execution, and Documentation. I use the acronym POISED to remember the six stages. If you strive to operate with as little information as possible, you increase the chances that you will not progress beyond the Observation stage.

I recently saw a request for SQL tuning help (http://www.freelists.org/post/oracle-l/Query-tuning-help). The poster had provided the EXPLAIN PLAN output. You probably know that EXPLAIN PLAN output is not of much use in SQL tuning. He was asked to add the GATHER_PLAN_STATISTICS hint and provide the output of DBMS_XPLAN.DISPLAY_CURSOR (http://www.freelists.org/post/oracle-l/Query-tuning-help,7). The rowsource execution statistics in the DISPLAY_CURSOR output showed that hundreds of thousands of rows were being retrieved and then discarded. There was enough information to deduce the right joining order (http://www.freelists.org/post/oracle-l/Query-tuning-help,9) but not enough to explain why the query optimizer was not finding the right order.
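
For readers who want to try this themselves, here is a minimal sketch of the technique (the table and predicate are hypothetical): the GATHER_PLAN_STATISTICS hint collects rowsource execution statistics, and DBMS_XPLAN.DISPLAY_CURSOR reports them so that estimated rows (E-Rows) can be compared with actual rows (A-Rows).

-- The query itself is hypothetical; the point is the hint and the
-- DISPLAY_CURSOR call that follows it.
SELECT /*+ GATHER_PLAN_STATISTICS */ *
FROM   orders o
WHERE  o.order_date >= DATE '2013-01-01';

-- NULL, NULL means "the last statement executed in this session";
-- 'ALLSTATS LAST' reports the rowsource execution statistics of that run.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));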

The best tool for tuning SQL is SQLTXPLAIN because it collects all the information that is required for tuning SQL. The author of this tool is Carlos Sierra. He will be conducting a two-day SQL tuning master class at the CarrAmerica conference center in Pleasanton on February 19 and 20. He will also speak at our winter conference at the Oracle conference center in Redwood Shores on February 21. I hope that you can attend at least one event.

Kindest regards,

Iggy Fernandez
NoCOUG Journal editor

Categories: Announcements, DBA, NoCOUG, Oracle, SQL

Attention Looney Tuners and SQL Sinners everywhere

December 6, 2012

Do you write or tune SQL statements in your job? Are you a Looney Tuner? A SQL Sinner? Do you yearn to learn the secrets of SQL performance? Are you looking for a guru to dispel the clouds of confusion? Who better than the guy who created the tools used by Oracle Support to tune troublesome SQL? You’re right, we’re talking about the one, the only Carlos Sierra, the author of SQLTXPLAIN, TRCA (aka “TKPROF On Steroids”), and SQLHC (SQL Health Check).

On February 19, Carlos Sierra will deplane at NoCOUG to conduct an intensive two-day SQL tuning class featuring all his awesome tools. Did I mention that these are the same tools that are used by Oracle Support to tune troublesome SQL? If you think that you might possibly be interested in attending this master class, please send a message to training@nocoug.org so that we can inform you once all the details are finalized and registration is opened.

Categories: Announcements, NoCOUG, Oracle, SQL