Home > DBA, Oracle, Physical Database Design > Lesson 1 of 40: Physical Database Design for Oracle Databases: Necessity and Definition

Lesson 1 of 40: Physical Database Design for Oracle Databases: Necessity and Definition


Now that my publisher and I have amicably parted ways, I am free to self-publish my work on physical database design. Some explanation is called for. I did not want to rewrite the excellent sections on physical database design that can be found in the free online guides provided by Oracle Corporation; instead I wanted to direct the reader to the online guides for supplemental information. This would have meant that my book would be far shorter and take far longer to write than the contract called for. Now that the publishing rights have reverted back to me, I plan on serializing my work here in the form of short lessons with links to the relevant sections of the online Oracle guides for supplemental information. Comments and criticism will always be welcome; it will help me improve the work which I hope to eventually publish in book form under the title Physical Database Design for Oracle Databases: Indexes, Clusters, Partitions, and Materialized Views.

In the first lesson you will learn the necessity of Physical Database Design and find out what is involved.

The year was … well just let’s say it was a long time ago. My friends Andrea, Mabel, and Sameer and I had been assigned to work on a big software development project. We were unfamiliar with the technologies that would be used—VMS, ACMS, TDMS, and Rdb—but training was promised. The very first thing the instructor said was “First you have to insert your definitions into the CDD,” and he walked to the chalkboard and wrote the commands that we needed for the purpose. Needless to say, we were quite flustered because we had no idea what those “definitions” might be or what a “CDD” was and how it fit into the big picture. We were being taught practice before the theory.

Those who are in love with practice without knowledge are like the sailor who gets into a ship without rudder or compass and who never can be certain whether he is going. Practice must always be founded on sound theory.”The Discourse On Painting by Leonardo da Vinci

You’ll appreciate that I don’t begin by saying “First you have to create your indexes” and giving you the commands that you need for the purpose. Instead I’ll first put physical database design into perspective, explain its purpose, and provide a definition.

Definitions of Logical and Physical Database Design

Database design can be divided into a logical phase (logical database design) and a physical phase (physical database design). Logical database design is the process of designing normalized database tables. (Lightstone et al. 2007) Consider the following example taken from a paper by Edgar Codd, the inventor of relational database theory:

Suppose the data bank contains information about parts and projects. For each part, the part number, part name, part description, quantity-on-hand, and quantity-on-order are recorded. For each project, the project number, project name, project description are recorded. Whenever a project makes use of a certain part, the quantity of that part committed to the given project is also recorded.

You could conceivably construct the following solution consisting of just one table:

CREATE TABLE part_project_commit
  (
    -- part information
    part#               NUMBER(6) NOT NULL,
    part_name           VARCHAR2(16) NOT NULL,
    part_description    VARCHAR2(64) NOT NULL,
    quantity_on_hand    NUMBER(18,6) NOT NULL,
    quantity_on_order   NUMBER(18,6) NOT NULL,
    -- project information
    project#            NUMBER(6) NOT NULL,
    project_name        VARCHAR2(16) NOT NULL,
    project_description VARCHAR2(64) NOT NULL,
    -- quantity of part committed to project
    quantity_committed  NUMBER(18,6) NOT NULL,
    CONSTRAINT part_project_commit_pk PRIMARY KEY (part#, project#)
  );

The above solution is obviously problematic; for example, multiple rows may have to be updated whenever the quantity_on_hand of a product changes. Through a process of data modeling and database normalization a better solution involving three tables can be constructed as follows:

CREATE TABLE part
  (
    part#             NUMBER(6) NOT NULL,
    part_name         VARCHAR2(16) NOT NULL,
    part_description  VARCHAR2(64) NOT NULL,
    quantity_on_hand  NUMBER(18,6) NOT NULL,
    quantity_on_order NUMBER(18,6) NOT NULL,
    CONSTRAINT part_pk PRIMARY KEY (part#)
  );

CREATE TABLE project
  (
    project#            NUMBER(6) NOT NULL,
    project_name        VARCHAR2(16) NOT NULL,
    project_description VARCHAR2(64) NOT NULL,
    CONSTRAINT project_pk PRIMARY KEY (project#)
  );

CREATE TABLE commit
  (
    part#              NUMBER(6) NOT NULL,
    project#           NUMBER(6) NOT NULL,
    quantity_committed NUMBER(18,6) NOT NULL,
    CONSTRAINT commit_pk PRIMARY KEY (part#, project#),
    CONSTRAINT commit_part_fk FOREIGN KEY (part#) REFERENCES part,
    CONSTRAINT commit_proj_fk FOREIGN KEY (project#) REFERENCES project
  );

The above CREATE TABLE commands can be used to construct tables in your Oracle database. This concludes the process of logical database design as we defined it. However, the work of database design is not complete; we still have to consider the question of performance. The default organization of an Oracle database table is an unordered heap; in other words, an Oracle database table starts out as an unorganized collection of records. The SQL queries that are retrieve data from these tables will need the help of indexes and other mechanisms to retrieve data efficiently. They are transformed by the query optimizer into sequences of database operations such as restriction, projection, inner join, outer join, semi-join, anti-join, aggregation, and sorting. These operations are illustrated in the following examples; some experience with SQL is assumed.

Our first example illustrates the restriction and projection operations. The restriction operation eliminates rows from the target of the query while the projection operation eliminates columns.

-- Illustration of restriction and projection
-- List part names for certain parts

SELECT prt.part#,
  prt.part_name
FROM part prt
WHERE part# BETWEEN 10000 AND 20000
ORDER BY prt.part#;

Our second example illustrates the inner join operation. It is used to associate rows from two tables.

-- Illustration of inner join
-- List each project and the parts it uses
-- Projects that use no parts are not listed

SELECT prj.project#,
  prj.project_name,
  cmt.part#,
  cmt.quantity_committed
FROM project prj
INNER JOIN commit cmt
ON (cmt.project# = prj.project#)
ORDER BY prj.project#,
  cmt.part#;

Our third example illustrates the left outer join operation. Like the inner join operation, it is also used to associate rows from two tables. However, rows from the left table that have no corresponding rows in the right table are also included in the result. The right outer join operation is similar to the left outer join operation; rows from the right table that have no corresponding rows in the left table are included in the result. The full outer join operation includes rows from both the left and right tables even if they have no corresponding rows in the right and left tables respectively.

-- Illustration of left outer join
-- List each part and the projects in which it is used
-- Parts that are not used in any project are also listed

SELECT prt.part#,
  prt.part_name,
  cmt.project#,
  cmt.quantity_committed
FROM part prt
LEFT OUTER JOIN commit cmt
ON (cmt.part# = prt.part#)
WHERE prt.part# BETWEEN 10000 AND 20000
ORDER BY prt.part#,
  cmt.project#;

Our fourth example illustrates a check for existence in a correlated subquery. The corresponding database operation is called a semi-join.

-- Illustration of semi-join
-- List parts that are used in at least one project

SELECT prt.part#,
  prt.part_name,
  prt.part_description,
  prt.quantity_on_hand,
  prt.quantity_on_order
FROM part prt
WHERE EXISTS
  (
  -- Correlated subquery
  SELECT * FROM COMMIT cmt WHERE cmt.part#=prt.part#
  )
ORDER BY prt.part#;

Our fifth example illustrates a check for non-existence in a correlated subquery. The corresponding database operation is called an anti-join.

-- Illustration of anti-join
-- List parts that are not used in any projects

SELECT prt.part#,
  prt.part_name,
  prt.part_description,
  prt.quantity_on_hand,
  prt.quantity_on_order
FROM part prt
WHERE NOT EXISTS
  (
  -- Correlated subquery
  SELECT * FROM COMMIT cmt WHERE cmt.part#=prt.part#
  )
ORDER BY prt.part#;

Our sixth example illustrates aggregation.

-- Illustration of aggregation
-- List the total committed quantity for each part

SELECT cmt.part#,
  SUM(cmt.quantity_committed) AS tot_quantity_committed
FROM commit cmt
GROUP BY cmt.part#
ORDER BY cmt.part#;

Obviously, SQL queries can be much more complicated than the above examples; queries that are thousands of lines long are not unheard of. Queries can include any number of tables and can build upon intermediate queries as in the following example which lists projects which use more than 50% of the committed usage of any part.

-- A complex query
-- List projects which use more than 50% of the committed usage of any part

SELECT cmt.project#,
  cmt.part#,
  cmt.quantity_committed,
  cmt.quantity_committed/tot.tot_quantity_committed * 100 AS pct_usage
FROM COMMIT cmt
INNER JOIN
  (
  -- Intermediate query
  -- List the total committed quantity for each part
  SELECT cmt.part#,
    SUM(cmt.quantity_committed) AS tot_quantity_committed
  FROM COMMIT cmt
  GROUP BY cmt.part#
  ) tot
ON (cmt.part# = tot.part#)
WHERE cmt.quantity_committed/tot.tot_quantity_committed * 100 > 50
ORDER BY cmt.project#,
  cmt.part#;

Three broad categories of performance mechanisms are available to improve the performance of database operations. Indexes help in restriction, projection, joins of all types, and sorting. Partitions and clusters can be used to co-locate data with similar characteristics in order to limit the amount of physical I/O. Finally, materialized views can be used to perform joins and aggregations ahead of time.

Physical database design is the process of designing indexes, partitions, clusters, and materialized views. (Lightstone et al. 2007) Its purpose is to improve the efficiency of database operations such as restriction, projection, joining, aggregation, and sorting.

This concludes the first lesson.

Summary

  • Database design can be divided into a logical phase (logical database design) and a physical phase (physical database design).
  • Logical database design is the process of designing normalized database tables. (Lightstone et al. 2007)
  • Physical database design is the process of designing indexes, partitions, clusters, and materialized views. (Lightstone et al. 2007)
  • The purpose of physical database design is to improve the efficiency of database operations such as restriction, projection, joining, aggregation, and sorting.

Further Reading

Exercises

  • Perform the database design exercise in SQL Developer Data Modeler User’s Guide Release 3.0.
  • Discuss whether physical database design is required for small databases that completely fit into memory.
  • Discuss whether database normalization affects query performance.
  • Create the Part, Project, and Commit tables using the CREATE TABLE statements provided in the lesson. Identify the indexes that were automatically created by Oracle even though we did not explicitly ask for them to be created. Justify why Oracle decides to create these indexes automatically.
  • Use the UNION clause instead of the OUTER JOIN clause to list each part and the projects in which it is used, including the parts which are not used in any projects. Use the INNER JOIN clause instead of a correlated subquery to list parts that are used in at least one project. Use the MINUS clause instead of a correlated subquery to list parts that are not used in any project. Use analytic functions instead of the GROUP BY clause to lists projects which account for more than 50% of the committed usage of any part. Discuss the performance impacts of rewriting queries and whether the physical database design is influenced by how queries are written.

Next Lesson

  1. September 13, 2011 at 5:13 am

    Excellent treatise on one of the most important but often the most rushed through steps in a system design.

  2. Iggy Fernandez
    September 13, 2011 at 8:28 am

    Thanks for the kind words, Arup.

  1. September 7, 2011 at 10:59 pm

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: