
Questioning Method R: An Interview with Cary Millsap

September 10, 2009

Cary Millsap is the co-creator of Method R, an antidote for Compulsive Tuning Disorder—the disease that afflicts so many of us. His motto is “Nullius in verba,” which means “on the word of no one.” In other words, check the facts for yourself.

What’s Method R and how does it cure Compulsive Tuning Disorder?

Method R is a commonsense approach to optimizing a system. The “R” stands for “response time.” Many people have used Method R extensively at Oracle sites throughout the 2000s with tremendous success. Jeff Holt and I first documented the method in the book Optimizing Oracle Performance (O’Reilly 2003). You can read Chapter 1 online for free here.

Method R prescribes four simple steps:

  1. Identify the task that’s the most important to you.
  2. Measure its response time (R). In detail.
  3. Optimize that response time in the most economically efficient way.
  4. Repeat until your system is economically optimal.
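The four steps above have a terminating condition built in, which is exactly what cures CTD. As a rough sketch (the function, names, and numbers here are mine, purely illustrative—not from the book), the method behaves like a loop that applies fixes only while a fix's benefit exceeds its cost:

```python
# A toy sketch of the Method R loop (all names and numbers are invented).
# Each candidate "fix" saves some seconds of response time at some cost;
# we apply fixes only while benefit exceeds cost, then stop. The stopping
# rule is the terminating condition that traditional tuning methods lack.

def method_r(response_time, candidate_fixes):
    """candidate_fixes: list of (seconds_saved, cost) pairs, most
    economical first. Returns the final response time."""
    for saved, cost in candidate_fixes:   # step 3: most economical first
        if saved <= cost:                 # step 4: terminating condition,
            break                         #         not worth fixing
        response_time -= saved
    return response_time

# Start at 100 s; the first two fixes pay for themselves, the third does not.
print(method_r(100.0, [(60.0, 5.0), (20.0, 10.0), (2.0, 8.0)]))  # 20.0
```

The point of the sketch is the `break`: either you fix something, or you prove it isn't worth fixing and move on.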

Compulsive Tuning Disorder, or CTD, is a term introduced by Gaja Vaidyanatha, Kirti Deshpande, and John Kostelac in their 2001 book, Oracle Performance Tuning 101. CTD is the compulsion to tinker forever with the performance of a system, inspired by the hope that there might be something in there that needs improving. It’s a disorder instead of a virtue because CTD causes people to waste time and money.

Jeff and I developed Method R for Oracle in response to several big problems with traditional “tuning” approaches, the CTD-inspiring absence of a terminating condition among them.

CTD just doesn’t make sense in a Method R context. With Method R, you know exactly whether or not a computer-executed business task has any time in it that’s been wasted. If there is, you either fix it or prove that fixing it isn’t worth the cost. If there isn’t any wasted time in there, you move on. That’s the key: either way you move on. You don’t linger in a non-productive state, because you know exactly where there’s opportunity for improvement and you know exactly where there’s not.

We love stories. Tell us a performance tuning story. Tell us two.

My favorite performance tuning story at the moment is a note I received yesterday from a customer who has recently bought our Method R Profiler product. He and his team had previously been using an open-source profiling tool, so I had expressed interest in how he was feeling so far about the purchase. Here is the response I received:

“Within one minute of installation the Profiler exceeded all expectations. We went live with a new warehouse management system this Friday because all of the tuning was done ahead of time. The Profiler pointed out all of the application inefficiencies which we turned over to the vendor. They were able to fix the ones with the largest impact on the online response times. That gave our management the confidence they needed to make the Go/No-Go decision. Until I got the Profiler we were looking at postponing it until next quarter. I pity the poor souls that have to optimize in mere mortal ways.”

I love this story because it touches on so many of the reasons we created Method R in the first place, like:

  • It hooks right in to the business in a way that makes performance easy for everyone in the company to understand.
  • It gives you positive evidence of success—not just some uncertain feeling that maybe you’re okay because you haven’t yet found any problems. This is why the CTD problem never enters into a Method R project.
  • It shows you unambiguously what’s really taking your time, so you don’t have to guess what effect your tuning will have on the business task you’re trying to help. Nullius in verba!
  • It gives you more power to help your application software vendors help you. It makes you faster and more effective, even when you buy your applications from somebody else.

Which is better, Method R or ASH/AWR?

That’s like asking, “Which is better, Google Maps or fuel injection?” Method R is a process to accomplish a specific goal. ASH/AWR is a functional capability within the Oracle Database software that allows someone to partly implement Method R.

A major confusion I’ve encountered in the field is that people equate Method R with Oracle trace data. The method itself has nothing specifically to do with Oracle or databases—or even computer software. It’s simply a method for optimizing a process—any process, like driving to work or shopping for groceries.

It so happens that implementing Method R within the Oracle Database context requires Oracle extended SQL trace data in some (I’d say many) cases. There are two principal reasons for this. First, ASH/AWR is wired to the concept of database time. That’s different from end-user response time. In some cases, it’s very different. In those cases, we need to see the amount of “SQL*Net message from client” time that a process consumes, and even the time that’s completely unaccounted for by database measurement probes.

Second, there are issues of data skew that you can detect in detailed trace data that are impossible to detect with ASH/AWR. For example, 100 calls consumed 100 seconds. How much time will we save if we can eliminate 90 calls? The answer is not necessarily 90 seconds; it could be anything between nearly 0 seconds and nearly 100 seconds. It depends completely upon which specific calls you eliminate. (Hint: Imagine that the 100 call durations were as follows: 99.01s, .01s, .01s, .01s, …, .01s.)
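The hint in the parenthetical can be worked through directly. This little calculation (using exactly the durations suggested in the text) shows why "eliminate 90 of 100 calls" tells you almost nothing about the time saved until you know which calls:

```python
# The skew example from the text: 100 calls totaling 100 seconds, where one
# call takes 99.01 s and the other 99 take 0.01 s each. How much you save by
# eliminating 90 calls depends entirely on WHICH 90 calls you eliminate.

durations = [99.01] + [0.01] * 99
assert round(sum(durations), 2) == 100.0   # total is 100 seconds

worst_case = sum(sorted(durations)[:90])                # drop the 90 cheapest
best_case = sum(sorted(durations, reverse=True)[:90])   # drop the 90 costliest

print(f"eliminate the cheapest 90 calls: save {worst_case:.2f} s")   # ~0.90 s
print(f"eliminate the costliest 90 calls: save {best_case:.2f} s")   # ~99.90 s
```

Aggregated data like ASH/AWR can tell you "100 calls, 100 seconds"; only the detailed trace can tell you which of the two outcomes you are looking at.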

ASH and AWR are good things. But an argument for ASH/AWR is not an argument against Method R. The closer Oracle Corporation comes to giving us the true end-user response-time measurements that Method R requires, the better for everyone.

We spend a lot of time tuning databases. Why does it have to be that way? Is Oracle difficult to tune? Are we building the databases incorrectly?

I began learning how to tune Oracle databases in 1989. At the time, it seemed nearly impossible to me that people could be productive by following these instructions that the “experts” at the time were telling me I had to follow. I would have felt better had I known that it wasn’t just my own ineptitude working against me. In recent years, I’ve been able to prove why the instructions we were taught were fundamentally ineffective against a lot of the everyday performance problems we all have to face.

So is Oracle difficult to tune? If you do it the way we’ve all been taught, then absolutely: it’s nearly impossible. In fact, the better you are as a student, the harder it gets. It’s because the method is flawed. Is it difficult to peel an apple? If the only tool you have is a steel ball bearing, then absolutely: it’s nearly impossible.

I think a lot of databases are built incorrectly because people don’t measure them properly as they’re building them. It’s a result of the same method-is-flawed problem I just mentioned. The things that database architects, application developers, operational administrators—everyone in the stack—have been taught to measure regarding performance are just ludicrous. The measurements you’re taught to use—the counters and the ratios—are not connected to the qualities that real people using the real application in real life will really measure later.

It’s difficult to do anything new—especially when it’s complicated—without immediate feedback. Imagine trying to learn how to putt in golf in an environment where you couldn’t learn where your ball went until three months after you hit it.

That’s what people do with their databases. Everyone in the project has lots and lots of tasks to do, but often it’s nowhere in the plan to measure end-user response time during the construction process. Then, when the system goes live three months later, the only performance metric that most people are going to care about hasn’t even been looked at. Worse yet, many systems (most systems) don’t include instrumentation that allows an analyst in a real-life production environment to even measure what the real response time is as it’s happening.

There’s no way you can expect to succeed in an environment like that.

Won’t upgrading the hardware save time and money in the long run?

It depends. Before you upgrade hardware, you really need to consider Amdahl’s Law:

Your upgrade will improve performance of only those tasks that use a lot of the component you improved.

A lot of people upgrade CPUs to 2× faster devices, or maybe, as I recently wrote on my blog, they upgrade their disk drives to 100× faster flash drives and then expect their tasks to be 2× or 100× faster, and they’re just not. It’s Amdahl’s Law: When you make your device n times faster, only the component of response time where that device was actually used will be n times faster (and that’s the optimistic case). So if your total response time is only 5% disk I/O to begin with, upgrading to flash drives will improve performance by no more than 5%.
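The 5% claim falls straight out of the arithmetic. Here is Amdahl's Law applied to a response-time profile, with illustrative numbers of my own choosing:

```python
# Amdahl's Law applied to response time (numbers are illustrative): if a
# task's response time is only 5% disk I/O, making the disk n times faster
# shrinks only that 5% slice.

def improved_response_time(r, fraction, n):
    """r: total response time; fraction: share of r spent on the upgraded
    device; n: the device's speedup factor."""
    return r * (1 - fraction) + r * fraction / n

r = 100.0  # seconds of response time, 5% of it disk I/O
print(improved_response_time(r, 0.05, 100))           # ~95.05 s with 100x disks
print(improved_response_time(r, 0.05, float("inf")))  # 95.0 s even at infinity
```

Even an infinitely fast disk cannot save more than the 5 seconds that disk I/O contributed in the first place.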

Hardware upgrades don’t always end well. In Optimizing Oracle Performance, Jeff and I describe a now-classic case in which a hardware upgrade actually slowed down the performance of an important business task. It’s like a UFO sighting. A lot of people have claimed that they’ve experienced it, but having the actual data to prove what really happened…that’s another thing entirely. In our book, we have all the data it takes to show exactly how this kind of catastrophe happens. You can read the whole story here.

Here’s another story for you. A couple of years ago, I was teaching a private course at a company. In the part of the course where I was supposed to have convinced everyone that the Method R way of doing things was worth learning about, I still had a lot of students with crossed arms and body language that was generally saying, “I ain’t buying it.”

So I suggested that we should work on an actual problem that the company had recently experienced. That way, I could show off the method in a context that was already familiar to them. They agreed.

They told me that they had recently executed a half-million-dollar hardware upgrade that had doubled a system’s number of CPUs from 20 to 40. But the system throughput was just as lousy on the 40-CPU box as it had been on the 20-CPU box. They wanted to know what could possibly cause this application not to scale up at all in spite of the much greater capacity.

They had actually printed the 7MB trace file for an important task that had taken about 100 seconds to run. The document was about two inches thick. They motioned in frustration at this stack of paper and explained that they had already run tkprof on the file, and that hadn’t helped their understanding at all.

The task behind the trace file was a batch job that comprised nearly the total workload on this upgraded machine. They had a 40-CPU machine with 300 concurrent copies of this batch job that they had traced running on it.

I ran our Profiler product on the trace data. The total run time for the Profiler itself was about 7 seconds, which was good given that all eyes were on me while it was running. The output was most excellent.

The #1 response time contributor: unaccounted-for time, almost 60%. I expected that before I even looked at the profile. That’s what you’ll get when you run 300 concurrent batch jobs on a 40-CPU machine. (I wrote about this phenomenon many years ago in a paper called “Batch queue management and the magic of ‘2’.”) CPU service for database calls was #2, at about 13%. I expected that too. Sleeps for “library cache” latch acquisitions were #3, at about 10%. There was more; this was enough.

So I looked at the task’s top time-consuming SQL, and it looked pretty good to me. Only five LIO calls per row manipulated per joined table, that kind of thing.

Their performance problem wasn’t bad SQL or a lack of horsepower, it was an inevitability of how the application was designed. Every one of these 300 identical processes was parsing like crazy inside of Pro*C loops. It looked like tens of thousands of prepare calls from these 300 processes were occurring every minute, because I saw hundreds of them per few-second interval in each of the 300 concurrent jobs.

The prepare calls were all blocking all over each other because they were all trying to prepare the same exact SQL query, all in different sessions, all at the same time. Of course, the one hash chain in the library cache was getting scanned over and over again, and those scans are protected by the chain’s “library cache” latch. On top of all the latch waiting was all the CPU service time consumed by the spinning for latches. And on top of that was the waiting in a CPU-preempted state for CPU service that the majority of the 300 concurrent jobs were spending the majority of their time doing.

This is an example of how a hardware upgrade cannot improve performance of an application whose bottleneck is software serialization. The remedy: stop making so many OCIStmtPrepare calls from the application. At least keep cursors open for reuse within a single session. Better yet, use a three-tier transaction monitor architecture that will allow several processes to share a single cursor. This application should have parsed each of its SQL statements once per instance lifetime, not thousands of times per session lifetime.
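The parse-once remedy can be illustrated with a toy model. This sketch (the class and counts are invented for illustration, not Oracle internals) contrasts re-preparing a statement on every loop iteration with preparing it once and reusing the cursor; the prepare count stands in for the latch-protected work that all 300 sessions were serializing on:

```python
# A toy model of the parse-per-loop anti-pattern from the story. Every
# prepare call must do work under a shared latch, so the prepare count is
# a stand-in for the serialization point that concurrent sessions fight over.

class SharedCache:
    """Stands in for the library cache; counts latch-protected prepares."""
    def __init__(self):
        self.prepares = 0

    def prepare(self, sql):
        self.prepares += 1   # serialized, latch-protected work
        return sql           # stand-in for an open cursor

def job_reparsing(cache, rows):
    for _ in range(rows):
        cache.prepare("SELECT ...")   # parse inside the loop: one per row
        # execute/fetch would go here

def job_parse_once(cache, rows):
    cursor = cache.prepare("SELECT ...")  # parse once, keep the cursor open
    for _ in range(rows):
        pass                              # execute/fetch reusing the cursor

bad, good = SharedCache(), SharedCache()
job_reparsing(bad, 10_000)
job_parse_once(good, 10_000)
print(bad.prepares, good.prepares)   # 10000 vs 1
```

Multiply the left-hand number by 300 concurrent sessions and you have the latch storm described above.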

What’s next for Method R?

Well, Method R is not just a method anymore, it’s a whole new company now. I formed Method R Corporation earlier this year, and I have some fantastic people with me. We can help you if you have needs pertaining in any way to performance of Oracle-based applications.

A major opportunity for us right now is that Method R has enormous untapped potential in preventive contexts. Today I think that people view Method R dominantly as a performance problem diagnostic method. That’s fair, because that’s what the book is about, and I think that’s how most people who know about Method R have come to know about it.

But the method works for preventing performance problems from ever occurring in the first place. Application developers drastically reduce their performance problem vulnerability when they regularly profile their code at each stage of promotion, from development through unit testing, performance readiness testing, and into production. Profiles—expressed plainly in seconds of response time—should be the standard language that architects, developers, database and system administrators, end users, and managers use to discuss application performance.

As the founder of a new business, I understand the natural desire that a business leader has to outsource any service that’s not part of his core business. It’s why my company outsources our HR and payroll functions to a PEO. It’s why we buy Gmail services from Google and accounting software services from QuickBooks Online. Oracle application performance is the same way. Why on earth would an airline or an entertainment company or a bank or a telco want to learn everything that Method R Corporation knows about Oracle performance?

What I think people want is the luxury of forgetting about software performance. At Method R Corporation, we can provide that luxury. We can track the performance of your applications, the way your users perceive them, through a service that you don’t have to think about all the time. We can let you know in advance when your performance is trending toward danger. We have written the software that makes it really easy for us to do that. Of course, when we help you find (or predict) a problem, we can help you fix it, or if you prefer, we can teach you how to fix it yourself. The goal is the same in either case: genuinely satisfying software performance for the users of your system. That’s what’s next for Method R.

Cars need regular maintenance. Ships need regular maintenance. Planes need regular maintenance. Don’t databases need regular maintenance?

They do, but the analogy is a little more twisty than you might think. Physical machines have parts that wear out. Software doesn’t wear out, but from time to time it can need a human hand to help it adapt to the changes that periodically intrude upon it.

Your database changes all the time. Every insert, update, and delete changes the world in which your database server must function. With Oracle software, you do need to perform some types of regular maintenance in response to those changes.

One type of regular maintenance that databases need is software patching. No software that is as complex as a database manager is perfect; there are defects inside. As your software manufacturer finds and fixes those defects, you’re going to benefit from some of the patches that result.

Another type of regular maintenance that some databases need is gathering of statistics that the optimizer uses to make its decisions about query plan assignments. That gathering process is not yet completely foolproof, and so it requires some human decision-making and periodic attention.

One topic of regular maintenance that I should mention specifically because there is so much discussion about it out there is index rebuilding. That’s an area where a lot of people think their database needs regular maintenance, but it really doesn’t. Richard Foote, Jonathan Lewis, and Tom Kyte have some excellent articles explaining why.

Over time, as Oracle adds more features to the database software, the requirement for more and more kinds of regular maintenance will diminish, and probably eventually disappear. An easy example from history is Oracle’s approach to undo storage. In version 6, I spent a lot of my time as a DBA rebuilding people’s rollback segments. That kind of thing is no longer necessary, because Oracle Corporation has evolved its undo management to eliminate the need for so much human intervention.

Help! My database is slow!

“Help! My database is slow!” There’s a secret behind this statement, and it’s monstrously important. Do you know what it is?

It’s this: When a user says a word like database or system, she’s not talking about a database or your system. She’s talking about a task.

Here’s the problem: To a DBA or analyst, a database is a complicated network of interrelated components. The most common metaphor for organizing those components into something comprehensible is the “performance dashboard,” which technical people try to rely on especially heavily when they hear the words database and slow in the same sentence. Dashboards show all sorts of statistics about your system, like CPU utilization, I/O latencies, cache hit ratios, latch miss ratios, and redo generation rates.

Analysts think about this stuff all the time. But users never do.

When Nancy in AP says, “My database is slow,” what she really means is, “When I click here, it takes too long before I get what I need.”

The first step for diagnosing Nancy’s problem is obvious to almost anybody who hasn’t benefited from years of classical Oracle training: Observe Nancy while she executes the task she’s complaining about. Classically trained Oracle analysts tend to consult their dashboards.

Now, maybe something on your dashboard will light up when Nancy does the slow thing. And maybe that light will actually reveal the cause of Nancy’s problem. But sometimes nothing lights up when she does the slow thing, and sometimes the wrong thing lights up when she does the slow thing. In Optimizing Oracle Performance, there’s a story on page 327 of one time when the wrong thing lit up.

Cases of wrong things (or no things) lighting up on dashboards happen more often than you think. I can’t quantify exactly how often, but I can tell you that nearly every dollar I’ve earned since 1995 is the result of people who went the dashboard route and failed. One message I have for those who will listen is that they should stop looking at dashboards.

So, what should you do instead? As I mentioned, you should look at Nancy. In the case where I met my real-life Nancy, we fixed her problem without ever touching a system. It was one of those, “Ah! Don’t do it that way—do it this way” problems. Of course, more often, there’s something legitimately wrong with how the system responds to Nancy’s click, so you’ll need more data to figure out what it is.

The data that you need is profile data: the kind of stuff C programmers get when they use “gcc -pg,” and what you get when you use Oracle’s “10046 Level 12” trace facility. Raw profile data is a detailed log of how code path has consumed time. You usually don’t look directly at raw profile data; you usually look at it through software called a profiler (like “gprof” and my company’s Profiler product), which filters and aggregates profile data in useful ways to help you make quick sense of what took so long.
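What a profiler does with that raw data is conceptually simple: aggregate many individually timed events into a profile sorted by contribution to response time. Here is a minimal sketch (the event list and its format are invented for illustration; real 10046 trace data is far richer):

```python
# A minimal sketch of profiler-style aggregation: fold raw timed events
# into a response-time profile, sorted by total contribution. The event
# names echo Oracle wait events, but the data here is invented.

from collections import defaultdict

raw_events = [  # (event name, elapsed seconds), as a trace would record them
    ("db file sequential read", 0.02), ("CPU service", 0.10),
    ("db file sequential read", 0.03), ("SQL*Net message from client", 1.20),
    ("CPU service", 0.05),
]

def profile(events):
    """Return (name, total seconds, share of response time), biggest first."""
    totals = defaultdict(float)
    for name, elapsed in events:
        totals[name] += elapsed
    r = sum(totals.values())
    return [(name, t, t / r)
            for name, t in sorted(totals.items(), key=lambda kv: -kv[1])]

for name, t, share in profile(raw_events):
    print(f"{name:<30} {t:6.2f}s {share:6.1%}")
```

Even on this toy data, the profile makes the diagnosis obvious at a glance: most of the response time is spent waiting on the client, not in the database.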

The thing that surprises most database professionals about profile data is that you really don’t need anything else to solve the vast majority of performance problems. The trick about profiling is that some applications make it difficult to generate good profile data. Oracle makes it easier with every database release, but an application’s ability to generate good profile data about itself—its performance instrumentation—is a key feature that determines how easy it’s going to be to make your “database” fast for all of your users.

Does the optimizer need a hint?

I like Tom Kyte’s idea that there are good Hints and there are bad Hints. Good Hints, like FIRST_ROWS and ALL_ROWS, give the Optimizer additional information about your intentions that can help it to make better decisions. Bad Hints, like USE_NL and USE_HASH, constrain the Optimizer from choosing an execution plan that might actually be ideal for you.

Bad Hints are like morphine for your database. They’re just right for a very specific purpose, but if your application needs them to run, it’s a problem.

Bad Hints are just right for experimenting with your SQL to see how different plans perform. Hints let you tell the Optimizer exactly what to do, so you can measure how it acts when it does it.

But habitual bad-Hint use is a bad thing, especially since version 9.2, because the Oracle Optimizer generally makes excellent decisions based upon the information you (or your DBA) give it. Having the Optimizer is a lot like having a really smart SQL performance expert in your office.

And here’s where I think a lot of people mess up. Imagine that a really smart SQL performance expert is saying that your query’s most efficient execution plan is something you find utterly ridiculous. What would you do? If it really were a smart person in your office, you might at least listen respectfully. But with Oracle, a lot of people just roll their eyes and slam a Hint onto their SQL to constrain the Optimizer from choosing the apparently dumb plan.

The problem is that the Optimizer probably made a smart decision based on the data you gave it. Maybe it chose a stupid plan because you accidentally told it that your 10,000,000-row table has only 20 rows in it. If you just tape your Optimizer’s mouth shut with an INDEX Hint, you may never find the 26 other queries that also use bad plans because of this bad assumption.

So when your Optimizer does something crazy, don’t reach for the Hints; find out why a good decision-maker has made a bad decision. The less you can rely on bad Hints, the less time and effort you’ll have to spend hand-tuning individual SQL statements, and the better your odds will be of having stable performance after your next database upgrade.

You can cure your addiction to bad Hints. Start by visiting the Asktom URL that I listed earlier. The full prescription is to read Jonathan Lewis’s outstanding book Cost-Based Oracle—Fundamentals. It covers a tremendous range of information that will help make the Oracle Optimizer your friend.

Cary Millsap is the founder and president of Method R Corporation, a company devoted to genuinely satisfying software performance. He is widely known in the Oracle community as a speaker, educator, consultant, and writer. He is the author (with Jeff Holt) of Optimizing Oracle Performance, for which he and Jeff were named Oracle Magazine’s 2004 Authors of the Year. He is also a contributor to Oracle Insights: Tales of the Oak Table. Cary is the former vice president of Oracle’s System Performance Group, and a co-founder of Hotsos. Cary is also an Oracle ACE Director and a founding partner of the Oak Table Network, an informal association of Oracle scientists who are well known throughout the Oracle community.

Interview conducted by Iggy Fernandez for the August 2008 issue of the NoCOUG Journal


Spotlight on Oracle: An Interview with Bert Scalzo

Bert Scalzo has worked with Oracle databases for well over two decades, starting with version 4. He has an extensive academic background, including a PhD in computer science. He has written or co-authored six books on Oracle: Oracle DBA Guide to Data Warehousing and Star Schemas, TOAD Handbook, TOAD Pocket Reference, Database Benchmarking: Practical Methods for Oracle & SQL Server, Advanced Oracle Utilities: The Definitive Reference, and Oracle on VMware: Expert Tips for Database Virtualization.

Why Oracle? Enterprise Edition with Partitioning now costs $59,000 per CPU. The annual support fee is 22% of the purchase price. MySQL has partitioning and it’s free.

I always answer this question using a hopefully clever wordplay of an old Saturday Night Live skit: “Oracle been very, very good to me.” I chose databases and Oracle way back, and mostly stick with what I think I know. I’ve presented papers at the MySQL conference—but until their SQL optimizer matures, I’ll stick with Oracle’s proven track record, maturity, and scalability. But I’m watching MySQL, SQL Server and even PostgreSQL—because things can always change.

My experience with RAC has been bumpy. Is it just me? Is RAC a bane or a boon?

RAC is a great technology. But RAC requires teaching us “old dogs” some new tricks. We cannot simply apply SMP-based Oracle rules of thumb to RAC and expect the same results. I’ve seen numerous RAC “proof of concept” tests fail for exactly this reason, with people then retreating to their comfort zone. But I’ve yet to see a successful RAC deployment need to revert. Done right, RAC rocks. Yes, it’s more complex and requires mastering new things. Just think of it as job security. I do, however, recommend Quest Software’s Spotlight for RAC, because it makes RAC architecture visualization and the optimization identification process more straightforward.

A company I know has been using Oracle 10g for a long time. Is it time for them to upgrade to Oracle Database 11g? Their database is very stable and performance is rock solid. They don’t have a support contract anymore and don’t seem to need one. The application was written in-house.

That’s a tough question. The technologist in me wants to say 11g—because I like to stay current. The pragmatic consultant in me says don’t fix what ain’t broke. But I never advise anyone to skimp on Oracle support—and no, I’m not an Oracle stockholder. MetaLink access alone can save many hours’ time and real dollars. While Internet searches are very powerful and can find useful stuff—critical wisdom and knowledge resides exclusively within MetaLink. It’s worth every penny. Furthermore, as long as Oracle offers new patch sets for your database, it’s worth being able to evaluate and deploy them.

In your latest book, Oracle on VMware, you make a case for “solving” performance problems with hardware upgrades. Is Oracle too hard to tune, then?

No—Oracle is a great database with extensive flexibility. Instead, it’s the cost of tuning (whether for database, operating system, and/or application) that has become somewhat cost ineffective. Let me give an example. A decade ago, tuning the SGA memory allocation and usage was critical—and it still is today. But with dirt-cheap memory, servers that can accommodate tons of memory, and Oracle automatic memory management—just buy more memory. It’s often far cheaper to add 16 GB RAM than pay two days’ expert database consultation fees. Yes, there are some scenarios where simply adding hardware might make a database bottleneck situation worse. But the odds and cost are now such that it’s a worthwhile first step—since you can always tune if that fails. I’m just preaching to try cheap and easy solutions first.

I make a good living adding hints to Oracle queries to improve performance. Does the optimizer really need a hint or is it something the developer or DBA is doing wrong in the first place?

That’s a loaded question. People generally fall into one of two camps: hints are okay and hints are the devil’s spawn. So picking one side or the other means 50% of the people will disagree with me, and trying to find a happy middle ground will mean 100% of the people think I’m nuts. Personally, I’m not a huge fan of hints—I’d rather rely on 11g’s outstanding plan management features (profiles and baselines). For me, even 10g’s stored outlines are often preferable. But I nonetheless often have to use hints to find the needle in the SQL optimization haystack for whatever reason. When I do, I rely on Quest Software’s SQL Optimizer. Okay, now I sound like a sales guy—I’ve mentioned two products from the company I work for. But we used to do a booth drawing where we displayed a SQL statement on a banner and asked people to guess how many rewrites could be done. No one’s guesses ever came close to the 39,000 actual rewrites possible. Any tool that helps to generate and filter through all those SQL coding possibilities and performance ramifications is worth a look. It even suggests structural changes to further extend that research. Do note that I did not mention my tool—Toad. So while I may have mentioned some Quest tools, I did not mention my favorite and my baby. I hope that this helps to retain some of my credibility!

Is 24×7 really possible? Personally, I’ve never seen a site that was truly 24×7, and I’m inclined to believe that it’s a myth. Is MAA the answer? If it is, it sure looks expensive. Is there a cheaper alternative?

This is an easy question—heck, no. Nothing in life is absolute. There are few universal truths. So 24×7 is not possible—not even with RAC, Data Guard, and redundant hardware. Remember, the Titanic was unsinkable—and look what publicizing that got them. But we technologists work for the business people—they are our customers. Bear in mind that we (technologists) are just overhead. The customer (who is always right) says they are a 24×7 business. So we have to find an SLA that can satisfy their demands/needs. Look at Internet hosting companies. They generally advertise something like 98.5% uptime. That’s what we too should strive to offer. Then based on their reply, we know what hardware, software, Oracle options, etc., we need to deploy to meet those requirements. Then monitor and improve as needed.
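It is worth putting an SLA figure like 98.5% in concrete terms. A quick back-of-the-envelope calculation (assuming a 365-day year) shows how much annual downtime various uptime percentages actually permit:

```python
# Translating an uptime SLA into allowed downtime per year (365-day year
# assumed). Even an advertised figure like 98.5% permits a surprising
# amount of downtime.

def downtime_hours_per_year(uptime_pct):
    """Hours of downtime per year permitted by an uptime percentage."""
    return (1 - uptime_pct / 100) * 365 * 24

for pct in (98.5, 99.9, 99.999):
    print(f"{pct}% uptime allows {downtime_hours_per_year(pct):8.2f} h/year of downtime")
```

At 98.5%, a hosting company can be down for more than five days a year and still meet its advertised number—which is why the SLA conversation with the business matters more than the "24×7" slogan.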

My manager keeps nagging me to get certified. Certification requires the purchase of at least one instructor-led training course. For the price of a five-day training course, I can buy dozens and dozens of good books. I tried to convince my manager to give me a week off to spend at the beach, reading a good book like Effective Oracle by Design by Tom Kyte, but he wouldn’t bite. My manager doesn’t take my opinion seriously but he’ll listen to you. Is he right about this or am I?

If you look at my resume, I don’t really have any current Oracle certifications. So my answer may seem like sour grapes—but I’m not a fan of certifications. When I got my vision corrected (i.e., Intra-Lasik), I looked for someone who had done it a lot and that pro athletes and other doctors went to. Thus my personal selection criteria were competence and acknowledged experience—not cost or academic credentials. I believe the same in our industry. I have a PhD, but that means nothing to people who hire me. It’s what I’ve done and what I know that counts—not the paper on the wall. But meet your manager halfway, agree to attend the next expert training session for certification held in Hawaii or on a cruise ship. Don’t laugh, such events do exist. As for Tom Kyte, you are spot on—his books and blogs are pure gold.

Thanks for answering all my cheeky questions today. I’m always interested in buying a good Oracle book. Do you have some personal favorites that you can recommend?

There are tons of people writing Oracle books these days—and four whom I've worked with personally and respect (i.e., from whom I learned a lot). For anything PL/SQL related, Steven Feuerstein is the man. For general DBA texts, Mike Ault is a safe bet. For anything RAC specific, Murali Vallath is second to none. And for anything performance or tuning related, I look to Guy Harrison. In fact, Guy has a new book coming out next year that will become the best single reference book for any Oracle professional. I also like Cary Millsap—especially for his Method R trace-file tuning approach. But I've not had the privilege of working with Cary; I've just attended his sessions and classes and have had numerous lunches with him. There are lots of other great Oracle authors—so don't restrict yourself to my short list.

Interview conducted by Iggy Fernandez for the February 2009 issue of the NoCOUG Journal (Click here for the PDF version)


Fresh Perspectives: An Interview with Karen Morton

August 16, 2009 3 comments

For over 20 years, Karen Morton has worked in information technology, starting out as a mainframe programmer and then growing and changing with the times to do development, DBA, and data architect work. She has been using Oracle since the early 1990s and began teaching others how to use Oracle over a decade ago. She has authored several Oracle training course books, including leading the development of the Hotsos and Method R SQL Optimization curriculum. Karen is an Oracle ACE and a member of the OakTable Network. She blogs at

Why Oracle? Enterprise Edition with Partitioning now costs $59,000 per CPU. The annual support fee is 22% of the purchase price. MySQL has partitioning and the price is right: it’s free.

Oracle has led the industry for the past two decades and I don't see that changing. My world revolves around Oracle, so I can't really speak to the differences with MySQL as far as functionality, reliability, scalability, and performance. Yes, Oracle is expensive. Is it worth the cost to your business? Only your business can decide that. I hear the old adage "you get what you pay for" floating through my mind as I write this. Honestly, I think that using the cost of something as a measure of its ability to satisfy your business needs (i.e., assuming higher cost equates to better) is less valid today than it has ever been. Open source products have made big gains in the market, and they do that for a reason: they meet the demands of the businesses that implement them, and they have an attractive TCO (total cost of ownership).

While I think Oracle is the winner for the large enterprise environment, I think it is only prudent that each business evaluate its needs and its cost limitations and choose what fits it best. I also think that if open source vendors continue to improve, Oracle may one day have to seriously consider pricing-model changes in order to maintain its market share. As the gap between free or very low-cost open source databases and Oracle closes, cost may win more often than not.
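To put the figures from the question in perspective, here is a rough cost-of-ownership sketch (Python purely for illustration; the per-CPU price and 22% support rate are the numbers quoted above, while the CPU count and term are hypothetical):

```python
# Rough n-year license cost for the figures quoted in the question:
# $59,000 per CPU up front, plus annual support at 22% of the purchase price.

def license_tco(cpus: int, years: int,
                per_cpu: float = 59_000.0, support_rate: float = 0.22) -> float:
    """Purchase price plus cumulative support fees over `years`."""
    purchase = cpus * per_cpu
    return purchase + purchase * support_rate * years

# Hypothetical four-CPU server over a five-year term:
print(f"${license_tco(4, 5):,.0f}")  # -> $495,600
```

Even a modest four-CPU deployment approaches half a million dollars over five years, before hardware or staffing, which is exactly the kind of arithmetic the TCO comparison turns on.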

A company I know has been using Oracle 9i for a long time. Is it time for them to upgrade to Oracle Database 11g? What are the risks? What are the rewards? Their database is very stable and performance is rock solid. They don’t have a support contract anymore, don’t seem to need one, and can’t afford to pay for one. The application was written in-house.

Well, I think you answered the question for me by saying they don't have a support contract, don't seem to need one, and couldn't afford one even if they did. This sounds like a case of "if it ain't broke, don't fix it." They have stable performance with what they've got and no apparent need for any features 11g might provide. Until they can justify a need, they're doing the best they can for their business by staying with 9i. The risks of upgrading, particularly the possibility of execution-plan instability causing performance problems, aren't worth it when they are happy and rock solid where they are.

We spend a lot of time tuning databases. Why does it have to be that way? Is Oracle difficult to tune? Are we building the databases or applications wrong?

The whole idea that tuning is difficult is a very pervasive belief. Many people think it is more magic than science, and believe that if you don't know how to tweak some 200-plus instance parameters in just the right way, you're doomed to failure (which translates to having to hire an expensive consultant to tune things for you). In a 2003 Oracle white paper entitled "The Self-Managing Database: Automatic Performance Diagnosis," authors Graham Wood and Kyle Hailey explain that the advisors introduced in 10g were developed largely because tuning is, for the most part, a matter of following a logical, methodical approach. And since computers are pretty good at doing just that, the goal of the advisors was to simplify the process of diagnosing problems and make it seem less difficult.

So we now have access to a set of advisors that can help diagnose problems more quickly and easily than ever before. However, I think that when problems do occur, the best defense is a good offense. In other words, we (the DBAs and developers who manage the database and write the applications) must have the knowledge and skills to diagnose problems quickly and efficiently when they occur. Gaining that knowledge and skill will also help ensure that we don't make many of the big performance mistakes to start with. I really believe that knowledge is power and that tuning doesn't have to be as hard as it is often made out to be.

In his latest book, Oracle on VMware, Dr. Bert Scalzo makes a case for “solving” performance problems with hardware upgrades. What is your opinion about this approach?

If you have performance problems that have been proven, through careful analysis, to have their root cause in substandard hardware, then the solution may well be to upgrade. But the investment made to ensure personnel are adequately trained to properly diagnose and repair performance problems can benefit your company far more in the long run than the apparent quick fix of throwing hardware at the problem.

I think that when personnel are properly trained and knowledgeable, they will not only be able to quickly, efficiently, and cost effectively find and fix problems when they arise, but they will not make the mistakes that cause the problems in the first place.

I believe the truth is that if hardware isn’t your real problem, you’ll eventually have no other choice than to address the actual root cause, because hardware upgrades can only be a Band-aid on the wound for so long. I’ve seen too many people try this approach only to pay hundreds of thousands of dollars for hardware they don’t really even need and still have performance problems. When they finally do address the real root cause and fix it, they often find that they could have bypassed the entire expenditure on hardware and saved more time and money if they’d just done it right in the first place.

I make a very good living adding hints to Oracle queries to improve performance. Does the optimizer really need a hint or is it something the developer or DBA is doing wrong in the first place?

I think hints are phenomenal testing tools and are sometimes the only viable avenue for getting a query to perform as you desire. But I also think that if you understand the Oracle optimizer and why it makes choices like it does, then you can often find a way to rewrite queries to get the results you want without a hint.

So no, I don't think the optimizer needs a hint as a rule. The bottom line is that you have to do what is necessary to get the performance you need. My first goal would be to do everything I could with my SQL to give the optimizer the best possible chance of getting it right without a hint. If I still can't get the execution plan I want, then I'll use a hint. However, I'll document why I believe the hint is needed, so that its use will be understood by those who follow me.
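As a concrete illustration of that documentation practice, here is the shape of a hinted query with the rationale recorded alongside it. The table, index, and reasoning are hypothetical examples, not taken from the interview:

```sql
-- Hypothetical example: force an index the optimizer is not choosing.
SELECT /*+ INDEX(o orders_status_ix) */
       o.order_id, o.status
FROM   orders o
WHERE  o.status = 'OPEN';

-- Document WHY the hint is needed, so those who follow understand it:
-- a stale histogram on STATUS makes 'OPEN' look non-selective, so the
-- optimizer prefers a full scan. Revisit (and ideally remove the hint)
-- after statistics are refreshed.
```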

My manager keeps nagging me to get certified. Certification requires the purchase of at least one instructor-led training course. For the price of a five-day training course, I can buy dozens and dozens of good books. I tried to persuade my manager to give me a week off to spend at the beach, reading a good book like Effective Oracle by Design by Tom Kyte, but he wouldn't bite and keeps nagging me about certification. My manager doesn't take my opinion seriously but he'll listen to you. Is he right about this or am I?

Personally, I think certifications can only effectively answer one question: can a person study and memorize well enough to pass the required tests? There are people out there who attend some kind of certification boot camp for a couple of weeks, take all the tests, and pass. There are others who buy exam study guides and take sample tests until they have memorized most everything they'll need to know to pass. And there are people with years of experience who pass the tests without even picking up a book, or with only a small amount of preparation.

I really don’t think certification is a measure of experience or ability. I wonder which employee a manager really wants managing their database: one with a certification but no proven experience or one without a certification but years of experience and a great track record.

If a manager is willing to pay for your certification, get it. If having a certification is the difference between you being chosen for a job over someone else equally qualified, then get it. But I think it’s a mistake to use certification as a measuring stick for experience or an indicator of someone’s ability to handle real-world, tough situations. Certification may get you in the door, but it’s how you apply the knowledge you have and your ability to quickly diagnose and repair problems that will keep you there.

Will there be any jobs left for me in five years? Should I move to Bangalore? Are the Indian shops any good? How should I adapt? One of my previous employers suffered wave after wave of layoffs. I personally know many IT professionals who have switched careers: a PeopleSoft engineer became a police officer, a project manager became an insurance agent. Should I plan on a career change? I’ve always wanted to be a beekeeper.

My experience has been that outsourcing may provide a low-cost way to quickly build an application, but it is not a solution for optimally performing, easily maintainable applications over the long haul. IT staff will still be needed here in the database trenches, where they will have to provide ongoing, real-time support for applications. Since lower cost and speed of delivery are often the only criteria placed upon foreign outsourced vendors, that leaves lots of room for in-house staff to be needed to monitor, fix, and improve what arrives from those outsourced shops.

What that means for us is that there may be fewer jobs available, in the short term at least. So it becomes imperative that people are on top of their game as far as their knowledge and skill sets are concerned. I know of one company that outsourced its entire development staff and sent the work overseas. It had more performance problems than it could count and very disgruntled users, and ended up spending more time and money in the end. Within one year, it had hired back about 50% of the original staff to try to fix the work delivered by the overseas vendor. Within two years, it had hired back 95% of the original staff (in terms of the number of developers). The company found that if it had just stayed fully staffed to start with, it would actually have saved money!

I don’t think anyone can make a blanket statement about the quality, or lack thereof, of outsourcing. But, I believe that well-trained in-house personnel can get the job done better and with less need for re-work than people who are thousands of miles away with no personal stake in the company that hires them.

Personally, I think if you love what you do and enjoy being good at it, then it’s not time for a job change. It’s time to work harder, learn more, and prove your value to your employer. But, if you’re not doing something that makes you glad to get up every day and go to work, then that’s the best reason to change jobs that I know of. Perhaps for those who aren’t in IT because they love it, when times are tight and lay-offs occur, it may be time for them to re-evaluate where their heart is and move on to something else.

And, by the way, a friend of mine was a beekeeper and that’s one job I can easily say that I’d never want to have! I’m terrified of (not to mention allergic to) the little creatures!

It costs us almost $30,000 per year to produce and distribute the Journal and a little more than that to organize four conferences. We have about 500 members and a little less than 200 attendees at each conference. Our membership levels have been stagnant for a long time. Has Google made us obsolete? I begged my previous company to buy a corporate membership but, except for my manager, not a single person from that company ever showed up at our conferences. Should we close shop? What could we do better to attract more members?

I think that user groups play a very important part in the Oracle community. With economic conditions being what they are now, many companies want to keep spending on continuing education to a minimum. User groups like NoCOUG provide that local lower-cost option and not only help educate but provide great networking opportunities as well. I certainly don’t think they are becoming obsolete. I think they’re more important than ever. The issue is making sure people understand the value they get from having a good user group in their area.

First, there are cost-cutting strategies that help an organization remain viable without as much financial backing. I've seen some user groups across the country reduce quarterly training-day events to just once or twice a year. I've seen general meetings reduced from once a month to once a quarter. I've seen printed newsletters and periodicals converted to online formats. I've seen short online training events replace on-site training events. Google hasn't made user groups obsolete, but it has shown us the power of quick online access to information.

Second, it’s my belief that people are always willing to pay for things they believe provide meaningful value. Even when you can get some information online, the quality of that information isn’t guaranteed. One of the great things about user groups is that they provide a place to get good information from reputable sources. Even when the user group doesn’t sponsor an in-person event with a speaker of note, the group can post links to qualified information on the Web and save members from having to weed out a ton of questionable content and get them straight to the good stuff.

Third, meeting with people who share common experiences is, in my opinion, one of the best reasons to belong to a user group. It gives you an opportunity to learn from others who have already faced the problems you're currently mired in and to leverage their experience in finding solutions. And, as we've already discussed, many of us are facing layoffs and outsourcing trends that make our jobs less secure than we once thought. Having a community of colleagues who can help you search out new job opportunities is critical, and user-group connections can often provide the lead to your next job when you need it.

In the end it's all about giving people what they want and need in the most convenient and low-cost way possible. I certainly hope this is the time when user groups rise and shine rather than decline. There are few better opportunities to get so much for so little than a great local user group like NoCOUG.

Thanks for answering all my cheeky questions today. I’m always interested in buying good Oracle books. Do you have some personal favorites that you can recommend?

Cost-Based Oracle Fundamentals by Jonathan Lewis.

Troubleshooting Oracle Performance by Christian Antognini.

The Art of SQL by Stephane Faroult.

Optimizing Oracle Performance by Cary Millsap and Jeff Holt.

Anything and everything from Tom Kyte.

I’ve got a huge library of books that I’ve really found to be helpful, but these are the first few that come to mind.

Interview conducted by Iggy Fernandez for the May 2009 issue of the NoCOUG Journal (Click here for the PDF version)

Note: Here’s the full quote from Dr. Scalzo’s book: “Person hours cost so much more now than computer hardware even with inexpensive offshore outsourcing. It is now considered a sound business decision these days to throw cheap hardware at problems. It is at least, if not more, cost effective than having the staff [sic] tuned and optimized for the same net effect. Besides, a failed tuning and optimization effort leaves you exactly where you started. At least the hardware upgrade approach results in a faster/better server experiencing the same problem that may still have future value to the business once the fundamental problem is eventually corrected. And, if nothing else, the hardware can be depreciated, whereas the time spent tuning is always just a cost taken off the bottom line. So, with such cheap hardware, it might be a wiser business bet to throw hardware at some solutions sooner than was done in the past. One might go so far as to make an economic principle claim that the opportunity cost of tuning is foregoing cheap upgrades that might fix the issue and also possess intrinsic value. Stated this way, it is a safe bet that is where the business people would vote to spend.”

