Archive for the ‘Interviews’ Category

Be Very Afraid: An interview with the CTO of database security at McAfee

Security should be on everybody’s mind after the revelations of Chrome’s password insanity and the claim that cloud computing companies may lose as much as $35 billion because of the intelligence-gathering activities of our government. According to Slavik Markovich, the CTO of database security at McAfee, stupidity is the usual reason for security holes, and our operating assumption should be that all our online actions are being recorded and monitored by a host of phantom listeners. They know that I published this blog post and they know that you have read it. They know everything about you and me.

Down in the street little eddies of wind were whirling dust and torn paper into spirals, and though the sun was shining and the sky a harsh blue, there seemed to be no color in anything except the posters that were plastered everywhere. The black-mustachio’d face gazed down from every commanding corner. There was one on the house front immediately opposite. BIG BROTHER IS WATCHING YOU, the caption said, while the dark eyes looked deep into Winston’s own. … Behind Winston’s back the voice from the telescreen was still babbling away about pig iron and the overfulfillment of the Ninth Three-Year Plan. The telescreen received and transmitted simultaneously. Any sound that Winston made, above the level of a very low whisper, would be picked up by it; moreover, so long as he remained within the field of vision which the metal plaque commanded, he could be seen as well as heard. There was of course no way of knowing whether you were being watched at any given moment. How often, or on what system, the Thought Police plugged in on any individual wire was guesswork. It was even conceivable that they watched everybody all the time. But at any rate they could plug in your wire whenever they wanted to. You had to live—did live, from habit that became instinct—in the assumption that every sound you made was overheard, and except in darkness, every movement scrutinized.

—1984 by George Orwell

As published in the 105th issue of the NoCOUG Journal (February 2013)

Slavik Markovich is vice president and chief technology officer for Database Security at McAfee and has over 20 years of experience in infrastructure, security, and software development. Slavik co-founded Sentrigo, a developer of leading database security technology that was acquired by McAfee in April 2011. Prior to co-founding Sentrigo, Slavik served as VP R&D and chief architect at db@net, a leading IT architecture consultancy. Slavik has contributed to open-source projects, is a regular speaker at industry conferences, and is the creator of several open-source projects like FuzzOr (an Oracle fuzzer) and YAOPC (Yet Another Oracle Password Cracker). Slavik also regularly blogs about database security at www.slaviks-blog.com.

Is my financial and medical information safe from the bad guys? After watching Die Hard 4, I’m not so sure, because it seems that bad guys can access, change, or erase anybody’s information with a few keystrokes.

Although life is not a movie, and the situation is not quite as bad as Die Hard 4, it is not that good either. You can read about breaches with varying degrees of severity every week. While the “bad guys” require a bit more than a few keystrokes to access/change information, they have very sophisticated tools at their service. World-spanning global botnets, automated hacking tools, a flourishing underground market, and a strong financial incentive all motivate the “bad guys” to continue breaking into systems.

On the flip side, there have been many significant changes and improvements to the applicable regulations associated with the protection of PHI and ePHI healthcare information. In addition, the enhanced enforcement of HIPAA and the newer HITECH regulations has increased the visibility of—and, arguably, the attention paid to—compliance among the affected organizations. SOX, GLBA, and other financial regulations are intended to address the integrity and authenticity of financial records. So, the organizations keeping your records are forced to think about security.

I would also add that it isn’t always “the bad guys” who cause data compromise—sometimes it’s caused accidentally, by human or system error. To summarize, if you are being targeted, I’d say that there is a pretty good chance that the hackers will succeed in compromising your details. On the other hand, your liability is limited, at least on the financial front.

Why is information security so poor in general? Is it because administrators and users—me included—are clueless about information security, or is it because the operating systems, databases, networks, languages, and protocols are inherently vulnerable, which makes our task much harder than it really ought to be?

Indeed, there is a big awareness issue when it comes to security. Users, developers, and administrators generally lack deep understanding of security and, as everybody knows, security is only as strong as your weakest link. The “bad guy” just needs one successful try on a single attack vector, while the security protections need to cover all bases, all the time. It’s an asymmetric game where currently the “bad guys” have the advantage.

When specifically talking about “database security,” the reality is that the overall risk posture for these systems, and the often highly sensitive and/or business-critical information they contain, is most often grossly underestimated by the respective organizations. A comparison can be made to what the famous 1930s bank robber Willie Sutton was often quoted as saying, when asked by a reporter why he robbed banks: “Because that’s where the money is.” The “bad guys” often target these databases, and the valuable data assets they contain, because they know that’s where they can get the biggest bang for their buck (i.e., the highest return for their exploit efforts).

Also, the risk to them of being caught and subsequently penalized is very often quite low, while the associated payoff (return) is quite high. So from an ROI perspective, their motivating rationale is abundantly clear.

Finally, if you were indeed “clueless” about security, you probably wouldn’t be asking these types of targeted questions.

The analogy is that certain cars are the favorites of car thieves because they are so easy to break into. Why are salted password hashes not the default? Why are buffer overflows permitted? Why was it so easy for China to divert all Internet traffic through its servers for 20 minutes in April 2010? Why is Windows so prone to viruses? Is it a conspiracy?

My motto is “always choose stupidity over conspiracy.” It goes back to the issue of lack of awareness. Developers who are not regularly trained on security will introduce security issues such as buffer overflows, or passwords stored in clear text or merely encrypted instead of hashed with a salt. Some protocols were not designed with security in mind, which makes them susceptible to manipulation. Some targets are definitely softer than others.
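
A minimal sketch of the pattern Markovich is describing, using nothing beyond the Python standard library (this is my own illustration, not McAfee code): store a per-user random salt and a slow salted hash, never the password itself and never a reversible encryption of it.

    # Hypothetical sketch: salted, slow password hashing with standard-library tools.
    import hashlib
    import hmac
    import os

    ITERATIONS = 200_000  # illustrative work factor

    def hash_password(password):
        salt = os.urandom(16)  # unique random salt for every password
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        return salt, digest    # store both values; the password itself is never stored

    def verify_password(password, salt, digest):
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        return hmac.compare_digest(candidate, digest)  # constant-time comparison

    salt, digest = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, digest))  # True
    print(verify_password("password123", salt, digest))                   # False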

At an absolute minimum, measures should be taken to harden the respective systems, as per the individual vendors’ guidelines and instructions. Unnecessary system services and processes should be disabled to reduce the attack surface, appropriate access control mechanisms should be properly configured, critical system patching should be done on a regular basis, etc.

But, unfortunately, these minimal security measures are often insufficient to address the rapidly expanding threat landscape. System visibility, in as near real time as possible, is required. Automated user process monitoring, vulnerability assessment, event correlation, and accompanying security policy notifications and alerting for these systems need to be provided.

Is the cloud safe? Is SaaS safe?

I do not believe that the cloud or the SaaS model is inherently more or less safe—it is just a different kind of safe. Depending on an organization’s risk appetite, appropriate safeguards and controls can be put in place to make implementations of private and public cloud-based services correspondingly “safe.” Technological controls, as well as organizational and administrative controls, need to be tailored for these types of deployments.

It’s also critical that the database security model be extensible and scalable to accommodate virtual and cloud-based environments.

Do we need better laws or should we trust the “enlightened self-interest” of industry? Enlightened self-interest—the mantra of Fed chairman Alan Greenspan—didn’t prevent the financial collapse. Will it prevent the digital equivalent of Pearl Harbor?

“Enlightened self-interest,” by itself, is usually insufficient. At least it has been proven to be up to now. On the other hand, over-regulation would not be a good alternative, either. There has to be a happy medium—where government and private industry work together to promote a more secure environment for commercial transactions to occur, and where consumers’ privacy is also protected. But, unfortunately, we’re not there yet.

If not laws, how about some standards? Why aren’t there templates for hardened operating systems, databases, and networks? Or are there?

There are numerous standards for applying security controls to these systems, including those from the Center for Internet Security (CIS), which publishes “hardening” benchmarks for a variety of different systems and devices, as well as the NIST 800 Series Special Publications, which offer a very large set of documents addressing applicable policies, procedures, and guidelines for information security. In addition, most of the more significant IT product vendors provide specific hardening guidelines and instructions for their various products.

The problem is how to consistently measure and make sure that your systems do not deviate from the gold standard you set. Unfortunately, systems tend to deteriorate with use—parameters are changed, new credentials and permissions are introduced, and so on. An organization without a consistent, proven way to scan its systems is going to have issues no matter how closely it follows the standards. A recent scan we did with a large enterprise discovered over 15,000 weak passwords in their databases. In theory, they followed very strict federal policies.
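
A scan of the kind described above can be as simple as replaying a dictionary of common passwords through the same hashing scheme the database uses. The sketch below is a generic, hypothetical illustration; the hashing scheme, data layout, and password list are assumptions, not details from the interview or from any McAfee product.

    # Hypothetical weak-password scan: hash each dictionary word with the account's
    # salt and compare it with the stored digest. Scheme and layout are assumptions.
    import hashlib

    COMMON_PASSWORDS = ["password", "welcome1", "oracle", "123456", "changeme"]

    def stored_digest(password, salt):
        # Stand-in for whatever scheme the audited database actually uses.
        return hashlib.sha256(salt + password.encode()).hexdigest()

    def find_weak_accounts(accounts):
        """accounts: iterable of (username, salt_bytes, digest_hex) tuples."""
        weak = []
        for user, salt, digest in accounts:
            for guess in COMMON_PASSWORDS:
                if stored_digest(guess, salt) == digest:
                    weak.append((user, guess))
                    break
        return weak

    # One deliberately weak demo account.
    demo = [("app_user", b"\x9a\x01", stored_digest("welcome1", b"\x9a\x01"))]
    print(find_weak_accounts(demo))  # [('app_user', 'welcome1')]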

Who will guard the guards themselves? As an administrator, I have unlimited access to sensitive information. How can my employer protect itself from me?

There’s a fundamental tenet in information security called “principle of least privilege,” which basically says that a user should be given the necessary authorization to access the information they need to perform their tasks/job—but no more than that level of privileged access. In addition, there’s another concept called “separation (or “segregation”) of duties,” which states that there should be more than one person required to complete a particular task, in order to help prevent potential error or fraud.

In the context of databases, this translates to not allowing users and administrators to have more access than is required for them to do their jobs—and for DBAs, that the DB administrative tasks will be monitored in real time and supervised by a different team, usually the information security team. A security framework that enforces these database access control policies is critical, because the inconvenient fact is, many compromises of DBs involve privileged access by trusted insiders.

While there is a much higher probability that someone who is not a DBA would try to breach the database, the DBA is in a much better position to succeed should he or she really want to do that.

If risk is the arithmetical product of the probability of an incident happening and the potential damage that incident could cause, then due to the latter factor, DBAs as well as other highly skilled insiders with access privileges pose a significant risk.
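
As a back-of-the-envelope illustration of that product, with numbers invented purely for illustration:

    # Risk = probability x impact, using made-up figures to show why a rare but
    # high-impact insider incident can outweigh frequent low-impact attempts.
    def risk(probability, impact):
        return probability * impact

    outside_attacker = risk(probability=0.20, impact=100_000)    # frequent, limited reach
    rogue_dba        = risk(probability=0.01, impact=5_000_000)  # rare, near-unlimited access

    print(outside_attacker, rogue_dba)  # 20000.0 50000.0 -> the insider still dominates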

In 2007, Computerworld and other sources reported that a senior DBA at a subsidiary of Fidelity National Information Services Inc. sold 8.5 million records, including bank account and credit card details, to a data broker. An external hacker would find it very difficult to achieve this kind of scale without insider cooperation.

It is important, for security as much as for regulatory compliance reasons, to monitor and audit DBA activity. In fact, this should be done for all users who access the database. DBAs are the first to understand this. If you work in a bank vault, you know there are CCTV cameras on you. You want those cameras on you. DBAs are in a similar situation, and they understand this requirement completely.

What DBAs should not accept are solutions that hinder or interfere with the DBA’s daily tasks—DBAs are primarily concerned with running databases efficiently. Any solution that jeopardizes this primary objective is counter-productive and doomed to fail anyway, because DBAs and other staff will find ways to circumvent it.

At the risk of getting lynched by Journal readers, I have to ask your opinion about certification. Information Technology is the only profession whose practitioners are not subject to licensing and certification requirements. Can we really call ourselves “professionals” if we are not subject to any rules? Doesn’t the cost-benefit analysis favor licensing and certification? Even plumbers and manicurists in the state of California are subject to licensing and certification requirements but not IT professionals. Do you advocate security certification?

Well—while there’s certainly value in conducting user security awareness training and in promoting and achieving professional security certification, there are some issues. Like who would the accrediting body be? Who exactly needs to be certified? Will there be different levels of certification? Will each OS, DB, network device, application, etc., require its own distinct cert? It can quickly get very complicated.

But a shorter answer could be yes—I advocate security certifications.

In the novel 1984, George Orwell imagined that a device called a “telescreen” would allow “Big Brother” to listen to everything you said. The reality in 2013 is much worse since so much is digital, including my every message, phone call, and commercial transaction, and the cell phone is everybody’s personal electronic monitoring bracelet. What steps should we take to protect ourselves in this brave new digital world?

One possible answer might depend on how much security an individual is willing to trade for a potential reduction of features and functionality. For example, when “location services” are enabled on your phone, a variety of enhanced proximity-based services are then available, like several kinds of mapping services, driving directions and conditions, identification of nearby retail outlets, restaurants, gas stations, etc.

In addition, you can also locate your phone if it gets lost, wipe it of its contents, and/or have emergency services find you to provide help. But you also potentially get location-based advertisements, and there’s the specter of the device and application vendors (browser and service providers, too) aggregating and mining your various voice/data transmission location(s), for their own commercial purposes. The ongoing “privacy vs. commerce” battles involved in the “Do Not Track” discussions are good examples of these often-conflicting forces.

My personal assumption is that anything I publish on any network (text message, Facebook, Twitter, etc.) is public, no matter what settings it is published with. If I want to keep something private, I encrypt it. But, I’m willing to make privacy sacrifices in the name of convenience. I do use GPS; I do use Facebook and LinkedIn, etc.

Thank you for spending so much time with us today. Would you like to tell Journal readers a little about today’s McAfee? What are your current products? What is in the pipeline?

Well, I’m glad you asked. The McAfee Database Security solution comprises a core set of three products that serve to scan, monitor, and secure databases:

  • McAfee Vulnerability Manager for Databases, which automatically discovers databases on the network, detects sensitive information in them, determines if the latest patches have been applied, and performs more than 4,700 vulnerability checks.
  • McAfee Database Activity Monitoring, which provides automatic, non-intrusive, and real-time protection for heterogeneous database environments on your network with a set of preconfigured security defenses, and also provides the ability to easily create custom security policies based on configurable, and very granular, controls. In addition, it has the capability to deliver virtual patching updates on a regular basis to protect from known vulnerabilities.
  • McAfee Virtual Patching for Databases (vPatch), which protects unpatched databases from known vulnerabilities and all database servers from zero-day attacks based on common threat vectors, without having to take the database offline to patch it. Additionally, vPatch has been accepted as a “compensating control” in compliance audits.

The McAfee Database Security solution is also tightly integrated with McAfee’s centralized security management platform, ePolicy Orchestrator (ePO), which consolidates enterprise-wide security visibility and control across a wide variety of heterogeneous systems, networks, data, and compliance solutions.

At McAfee, we do not believe in a silver bullet product approach. No security measure can protect against all attacks or threats. However, McAfee’s Security Connected framework enables integration of multiple products, services, and partnerships for centralized, efficient, and effective security and risk management. ▲

As published in the 105th issue of the NoCOUG Journal (February 2013)

Categories: DBA, Interviews, Oracle, Security

How Not to Interview a Database Administrator (The Google Way)

July 16, 2012

As suggested by the following story, Google would have preferred to hire my teenage daughter as the manager of their database team instead of me. I was on a long drive with my family so—to pass the time—I asked them to solve the problem that the Google interviewer had asked me to solve:

“Four men are on one side of a rickety bridge on a dark night. The bridge is only strong enough to support two men at a time. It is also necessary for the men crossing the bridge to carry a lantern to guide their way, and the four men have only one lantern between them. Andy can cross the bridge in 1 minute, Ben in 2, Charlie in 5, and Dan in 10 minutes. How quickly can all four men be together at the other side?”

My daughter’s first solution was identical to mine.

Andy and Ben cross the bridge first. This takes two minutes.
Andy returns with the lantern. This takes one minute.
Andy and Charlie cross the bridge next. This takes five minutes.
Andy returns with the lantern. This takes one minute.
Andy and Dan cross the bridge last. This takes ten minutes.

The total time for the above solution is 19 minutes. However, I had googled the answer after returning from my interview (at Google) and knew that the four men could cross in 17 minutes, so I asked my daughter to try again. She “solved” the problem on her second attempt, which suggests that Google would have preferred to hire her as a manager of database administration instead of me. Click here to see the “solution.”

I quoted the words “solved” and “solution” in the above paragraph because we still need a rigorous proof that the above “solution” is in fact the optimal solution; that is, is there a solution that takes less than 17 minutes? Neither does the above “solution” provide any insight into the general case. For example, the above “solution” is not optimal if Ben takes 4 minutes to cross the bridge instead of 2 minutes. The above “solution” needs 23 minutes if Ben takes 4 minutes to cross the bridge, but it can be done in 21 minutes. And what if there are more than four people who need to cross? I am willing to bet that my Google interviewer would not have been able to prove the optimality of the above “solution” or solve the general case. If you’re interested, a comprehensive mathematical treatment of the above case as well as the general case can be found in this mathematical paper by Prof. Günter Rote of the Free University of Berlin.
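
For readers who would rather not take Google’s word for it, the 17-minute and 21-minute figures are easy to check by brute force. The sketch below is my own illustration (not taken from the article or from the paper); it relies on the observation, proved in the paper linked above, that an optimal schedule always sends two people forward and brings one person back with the lantern.

    # Brute-force check of the bridge-crossing puzzle (my own sketch).
    from functools import lru_cache
    from itertools import combinations

    def min_crossing_time(times):
        """Minimum total crossing time; assumes the crossing times are distinct."""
        everyone = frozenset(times)

        @lru_cache(maxsize=None)
        def best(near):
            # near: crossing times of the people (and the lantern) still on the
            # starting side.
            if len(near) <= 2:
                return max(near)                      # the last one or two cross
            total = float("inf")
            for pair in combinations(near, 2):        # two people cross forward
                remaining = near - frozenset(pair)
                for back in everyone - remaining:     # one person returns the lantern
                    total = min(total, max(pair) + back + best(remaining | {back}))
            return total

        return best(everyone)

    print(min_crossing_time((1, 2, 5, 10)))   # 17
    print(min_crossing_time((1, 4, 5, 10)))   # 21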

The Google interview technique is not the best technique for finding the best database administrators (or those with the right aptitude). Please feel free to comment. Is it just a case of sour grapes on my part?

Categories: Career Advice, DBA, Interviews

Show Me the Way—with the innovator behind Statspack and AWR

As published in the 102nd issue of the NoCOUG Journal (February 2012)

Show Me the Way

with Graham Wood

Graham Wood has been working with Oracle Database for 25 years. He is currently a product manager for the Oracle RDBMS based in Redwood Shores, Calif. He has architected and tuned some of the largest Oracle databases, and has presented around the world on Oracle performance–related topics.

I have it on very good authority (Tom Kyte in the current issue of Oracle Magazine) that you are the genius and innovator behind Statspack and Automatic Workload Repository. I am in awe. Tell me the story behind that.

Wow, starting with a memory test! When Oracle V6 was introduced it contained the first V$ views, such as V$SYSSTAT and V$FILESTAT. These structures were created to allow database development to understand which parts of the code were being executed, and how often, during the running of the OLTP benchmarks that had started to appear at that time. The database shipped with two scripts that were used to produce a report from the V$ views during a benchmark run. These were bstat.sql, which captured the current contents of the V$ views at the start of the benchmark into a set of tables, and estat.sql, which captured the contents at the end of the benchmark into another set of tables, produced a report from the two sets of tables, and then dropped them. I was working in a small specialist performance group in Oracle UK at the time and it occurred to us, being database guys, that it might be useful for production systems to do regular captures of the V$ views and to keep this data around for rather longer as a repository of performance data. We wrote some scripts and started to distribute them inside Oracle, and they also found their way out to several customers. This was the original “Stats Package,” as we called it. As new releases of the database came out, I upgraded the scripts, probably most notably with the inclusion of V$SQL in Oracle V7 in the Stats7 package. In 1996 I moved to Oracle HQ in Redwood Shores to work in the Server Technologies Performance Group, and one of the goals that I set myself was to get the scripts shipped with the product so that all customers could use them. They finally made it into the database distribution in Oracle 8i as Statspack after being updated and enhanced by Connie Green. And the rest, as they say, is history, with almost all big Oracle sites using Statspack to keep a history of performance data.
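
The bstat/estat idea that Statspack later automated boils down to snapshotting cumulative counters and reporting the difference between two snapshots. Here is a generic, hypothetical sketch of that snapshot-and-delta pattern; the counter names and values are invented for illustration.

    # Hypothetical snapshot-and-delta sketch: capture cumulative counters twice,
    # then report the change over the interval.
    import copy

    def take_snapshot(counters):
        """counters: dict of cumulative statistics (think V$SYSSTAT values)."""
        return copy.deepcopy(counters)

    def report_delta(begin, end):
        # Print the change in each statistic between the two snapshots.
        for name in sorted(end):
            print(f"{name:<20} {end[name] - begin.get(name, 0):>12}")

    begin = take_snapshot({"user calls": 1_000_000, "physical reads": 50_000})
    # ... the workload runs here ...
    end = take_snapshot({"user calls": 1_060_000, "physical reads": 51_200})
    report_delta(begin, end)   # prints the deltas: 1200 physical reads, 60000 user calls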

When we started development of Oracle 10g, one of the main focus areas for the release was to be manageability, and a key part of that was to simplify performance analysis and to make recommendations for performance improvement. The most important part of this for me was to be able to automate performance analysis for a database instance and to identify the key areas where improvements could be made. Basically, unless the analysis is correct, there is no point in trying to make recommendations. In order to do this we needed to have a common currency for reporting across the components of the database and for quantifying the performance of a system. This led to the introduction of the concept of DB Time, the time spent in the database by user sessions, which allowed us to do quantitative comparisons between different components and also to quantify the impact on the system of an issue—for example that a single SQL statement represents 27% of all of the user time spent in the database. One of the main objectives of this was to make DBAs more effective by directing them to areas where they were likely to be able to make the greatest improvements in performance, rather than them spending time and effort on making changes that produced little benefit. To do all of this needed much more infrastructure than there was in Statspack and in Oracle 10g, and a lot of effort went into ensuring that we had adequate data available to do analysis of a performance problem the first time that it occurred. This resulted in an automatically managed repository of data (AWR), which contained not only data from normal V$ views containing cumulative statistics but also metric data and sampled activity data in the Active Session History. The combination of all of these data sources has taken performance analysis to a different level.

Tom Kyte’s favorite performance story is about a database that was always slow on rainy Mondays. What’s your favorite performance story from your own experiences?

One company that I worked with early on in my Oracle career asked me to help them improve the performance of a large batch report which was produced every night and went out to six people around the organization. It was causing problems for all of the rest of their batch operations by consuming a large amount of resources. The first improvement was to run the report once and print six copies rather than run the same report six times! Then I spoke to the folks who received the report and found out that three of them immediately tossed it in the trash (this was before the days of recycling), and the other three never looked beyond the first four summary pages as they now had an online system that allowed them to look at the details. We ended up changing the report to just produce the summary, and the overnight batch load on the system dropped by about 95% from the start point. It was definitely a case of it always being faster to not do something than to do it.

The most common problem that I see is that of flawed analysis: fixating on a particular statistic or event, which means that you never get to the root cause of the problem.

What are the typical issues you see when you are asked to look at a performance problem? Indexes? Statistics?

Well by the time I get called in to look at a performance problem these days there have probably already been quite a few people looking at it before, so all of the obvious things have already been tried. So, to be honest, the most common problem that I see is that of flawed analysis: fixating on a particular statistic or event, which means that you never get to the root cause of the problem and you end up trying to deal with a long list of symptoms. Much better to take a top-down approach and make sure you have the real cause before trying to fix things. If you have a really bad headache you could try and find a better aspirin or lie down in a darkened room, but you might be better to just stop banging your head against the wall. Having said that, I do still see a lot of problematic SQL, and drilling down to the root cause has become so much easier with the introduction of SQL Monitor. It is one of my top features of Oracle 11g, both for DBAs and developers, as it makes it so easy to find out exactly where in the plan the high resource usage and bad cardinality estimates are coming from, without even having to look at the details of the SQL itself. And, of course, I still see applications that have poor connection management and perform unnecessary parsing, even though we have been telling folks how to do it right for a couple of decades now.

I’ve heard a rumor that attendees of the Real World Performance events are being told that “tune” is a four-letter word. Is that some sort of insider joke? What does it mean?

I think that you have me confused with Cary Millsap! Cary differentiates between “tuning” and “optimizing.” The four-letter word that we talk about in the Real World Performance Day is “hack.” We define hacking as making changes without having diagnosed the root cause of the problem, without having scoped the problem or solution, and without being able to detail the expectation, in terms of what changes can be expected in the database performance statistics, of applying the “fix.” Most commonly these days the supporting argument for applying a hack is “well, I found a website that said if I set _go_faster in the init.ora I will run at least three times faster.” While Google can obviously be a good source of information, you have to remember that not everything that you read on the Internet is true. There really is no good alternative to doing proper performance analysis (although the availability of DB Time and ADDM make it easier) and proper testing, in your environment and with your data.

The title of software professional comes with a requirement to deliver quality product, not just hope that hardware will bail you out

In Oracle on VMware, Dr. Bert Scalzo makes a case for “solving” performance problems with hardware upgrades. What’s your opinion about this approach? [Footnote]

Ah, the “hardware is the new software” approach, as my colleague Andrew Holdsworth calls it. Software was called software because it was the part of the system that was “soft” and could easily be changed. These days we often see customers who will do anything they can to avoid changing the application, no matter how bad it is. Hardware upgrades can only ever “ameliorate” a subset of performance problems. If the system is CPU bound, then adding more CPU cycles may make things better, but the benefits that you get will be, at best, the 2x every 18 months of Moore’s Law. But most systems with performance problems these days are not CPU bound, and even when they are, there is also a real possibility that adding more CPU will actually further reduce the performance of the system by increasing contention on shared structures. The performance benefits of fixing the software can be orders of magnitude greater and, if done well, make it so that the system is better able to scale with hardware upgrades. The cheap hardware theory primarily applies to CPU, although larger, cheaper memory can also help but often requires that the box is changed anyway. Storage system upgrades are rarely cheap. Although $/GB has been falling rapidly, $/GB/s and $/IOP/s have not, and reducing I/O performance problems will always involve increasing either one or the other of these throughput metrics. I would guess that most of the readers of your magazine would think of themselves as software professionals. To me that title comes with a requirement to deliver quality product, not just hope that hardware will bail you out.

Saying No to NoSQL

Just when I thought I’d finished learning SQL, the NoSQL guys come along and tell me that SQL databases cannot deliver the levels of performance, reliability, and scalability that I will need in the future. Say it isn’t so, Graham.

Well we hear much pontificating about the benefits of NoSQL, but so far I haven’t seen any audited industry-standard benchmark results as proof points. I have seen many claims from NoSQL evangelists that traditional RDBMSs cannot meet their requirements, only to find on further analysis that they tried a single open-source RDBMS, ran into some problems, and generalized from there. It is also interesting in the light of your previous question about using cheap hardware to try and resolve performance problems, that NoSQL solutions are developer intensive, as much of the functionality that would be provided by a SQL RDBMS has to be hand-crafted for each solution. But I’m sure over time we will see winners appear from the current plethora of NoSQL products.

What about Big Data? Can’t SQL databases handle big data, then?

To me the case for Big Data comes down to two key areas: unstructured data and high-volume, low-value data such as web logs. This data could be stored in an RDBMS, but more typically what we are seeing customers doing is using Big Data techniques to extract information from these types of data sources and then storing this data in their RDBMS. This is the type of environment that Oracle’s recently announced Big Data Appliance is designed to help with.

The NoSQL salesmen insist that I need “sharding” instead of partitioning. Did they get that right?

Partitioning in the database has the huge benefit of being transparent to your application and your application developer. Using sharding requires that you move the management of the shards into your own application code. Do you want to develop your own code to perform queries across all of your shards and to do two-phase commits when you need to do a transaction that would affect multiple shards? And is such custom code development really cheap?
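
To make the “develop your own code” point concrete, here is a hypothetical sketch of application-side shard management; the Shard class, the routing scheme, and the data are all invented for illustration and are not tied to any particular product.

    # Hypothetical scatter-gather sketch: with sharding, the routing, fan-out, and
    # merging all live in application code. The tiny in-memory Shard stands in for
    # a real database connection.
    class Shard:
        def __init__(self, orders):
            self.orders = orders                     # list of (customer_id, amount)

        def find_orders(self, customer_id):
            return [o for o in self.orders if o[0] == customer_id]

        def sum_amounts(self):
            return sum(amount for _, amount in self.orders)

    def shard_for(customer_id, shards):
        return shards[hash(customer_id) % len(shards)]    # routing logic lives in the app

    def orders_for_customer(customer_id, shards):
        # A point lookup can at least be routed to a single shard.
        return shard_for(customer_id, shards).find_orders(customer_id)

    def total_revenue(shards):
        # A global aggregate must be fanned out to every shard and merged by hand;
        # a transaction spanning shards would need a hand-rolled two-phase commit.
        return sum(shard.sum_amounts() for shard in shards)

    shards = [Shard([]), Shard([]), Shard([])]
    for customer, amount in [("alice", 40), ("bob", 25), ("alice", 10)]:
        shard_for(customer, shards).orders.append((customer, amount))  # app-side placement
    print(orders_for_customer("alice", shards))   # [('alice', 40), ('alice', 10)]
    print(total_revenue(shards))                  # 75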

Professor Michael Stonebraker claimed in the 100th issue of the NoCOUG Journal that traditional SQL databases should be “sent to the home for tired software.” Has innovation really stopped at 400 Oracle Parkway? Has Larry sailed off into the sunset?

There have been many technologies that have claimed that they will replace SQL RDBMS over the last 30 years, including object databases and XML. SQL databases are still alive and well and contain the mission-critical data that is the lifeblood of businesses. Having a standard language, SQL, and a sound basis in relational theory means that SQL databases have stood the test of time in an industry where hype and fashion are rampant. In terms of 400 Oracle Parkway (where most of database development is housed), there are still many new features being built into the Oracle database that will increase the benefit that customers get from using the product. But you will have to wait for the product announcements to hear about those. And, of course, as the next America’s Cup is in San Francisco, Larry is still very much around and involved.

The Whole Truth About Exadata

Is Exadata a case of solving performance problems with hardware upgrades? Put another way: is the performance improvement from Exadata exactly what one might expect from the bigger sticker price, no more and no less?

Well the stock answer is that it is an engineered system that is designed to be capable of very high throughput. The software allows us to utilize the hardware much more effectively. There are customers who have upgraded to Exadata and seen the hardware upgrade benefits, typically 5–10x performance improvement, which is enough to get them into ads in The Economist and airports around the world. But the customers who have fully exploited the capabilities of Exadata have seen orders of magnitude more benefit. In our Day of Real World Performance presentations we load, validate, transform, collect optimizer statistics, and run queries on 1TB of raw data in less than 20 minutes. That sort of performance can transform what IT can deliver to the business and has far greater value than the sticker price.

Is Exadata as good for OLTP workloads as it is for OLAP? (You can be frank with me because what’s said in these pages stays on these pages!)

Well Exadata is certainly a very capable OLTP box. It has fast CPUs and can perform huge numbers of very fast I/Os with large numbers of IOPS by utilizing the flash cache in the storage cells. And OLTP performance is all about CPU horsepower and large numbers of IOPS. But I think it is fair to say that there is less “secret sauce” in Exadata as an OLTP platform than there is for data warehousing.

Show Me the Way

Thank you for answering my cheeky questions today. Someday, I hope to know as much about Oracle Database performance as you. Can you show me the way? Your book, perhaps?

Well I think that the key to being a good performance analyst is making sure that you spend time upfront correctly scoping the problem and then avoid jumping to conclusions while doing a top-down analysis of the data. When you are looking for solutions, make sure that the solution that you are implementing matches the scope of the problem that you started with, as opposed to a mismatched scope. The classic example of scope mismatch is making a database-level change, like changing an init.ora parameter, to solve a problem that is scoped to a single SQL statement. Much better to use techniques like SQL Profiles or SQL Baselines that will only affect the single SQL. Using that approach will get you a long way. As far as my book, I guess I still need to write it; it will be a cheeky book!

Footnote: Here’s the full quote from Dr. Scalzo’s book: “Person hours cost so much more now than computer hardware even with inexpensive offshore outsourcing. It is now considered a sound business decision these days to throw cheap hardware at problems. It is at least, if not more, cost effective than having the staff [sic] tuned and optimized for the same net effect. Besides, a failed tuning and optimization effort leaves you exactly where you started. At least the hardware upgrade approach results in a faster/better server experiencing the same problem that may still have future value to the business once the fundamental problem is eventually corrected. And, if nothing else, the hardware can be depreciated, whereas the time spent tuning is always just a cost taken off the bottom line. So, with such cheap hardware, it might be a wiser business bet to throw hardware at some solutions sooner than was done in the past. One might go so far as to make an economic principle claim that the opportunity cost of tuning is foregoing cheap upgrades that might fix the issue and also possess intrinsic value. Stated this way, it is a safe bet that is where the business people would vote to spend.”

Download the 102nd issue of the NoCOUG Journal

Categories: DBA, Interviews, NoCOUG, NoSQL, SQL

A Whole New World of MySQL—with Baron Schwartz of Percona

February 16, 2012

As published in the 101st issue of the NoCOUG Journal (February 2012)

Whole New World

with Baron Schwartz

“A whole new world

A new fantastic point of view

No one to tell us no

Or where to go

Or say we’re only dreaming.”

—Oscar-winning song from the movie Aladdin

Baron Schwartz is the chief performance architect at Percona. He is the lead author of High Performance MySQL and creator of Maatkit and other open-source tools for MySQL database administration. He is an Oracle ACE and has spoken internationally on performance and scalability analysis and modeling for MySQL and PostgreSQL, and on system performance in general.

When did you first hear of MySQL, and why did you choose to get involved? Which database technologies did you use before joining the MySQL camp? What have been your contributions to MySQL since then? What are your current projects?

I became acquainted with MySQL in 1999, when I was getting my undergraduate degree at the University of Virginia. I didn’t know a lot about relational database technology at the time, and my experience was limited to a single very academic class taught by a graduate student whom I could barely understand. I finished that course with an understanding of sigmas and other funny words, but with absolutely no concept of how to really use a database server. The first time I used MySQL in a serious way was when I joined an outdoors club at the university. It was painfully obvious to me that clipboards and pieces of paper were never going to be able to meet the demand, and in fact it was all we could do to organize the events with about 30 people attending. I realized that if I built an online application for organizing the club, we could scale to several hundred members without much trouble. The club is still going strong about a decade later.
After graduating from university, I joined a company that used Microsoft SQL Server exclusively. There, I was fortunate to work with a very talented database administrator, who taught me how database servers work in the real world. I stayed there for three years, and when he left to join another company, I followed him. That company used MySQL, and the day that I walked in the door it was clear that the growing pains were severe. Together with several other people, we got past those hurdles, and the company is still running on MySQL today. Along the way, I began blogging, traveling to attend conferences, and meeting a lot of other people in the MySQL ecosystem. Something about relational database technology fascinates me—and I’m still not quite sure what that is, but I know I love working with open source and I love working with databases. The combination of those two loves has made my career very satisfying and rewarding.

To be clear, I think Microsoft SQL Server is also a fantastic product. In fact, it is superior in many ways to MySQL. I miss the instrumentation and the great tools. Many of the things that I have done in my career have been targeted at solving some of those shortcomings in MySQL.

This really began in 2006, when I started writing what eventually became Maatkit. That suite of administrative tools was pretty universally acknowledged as essential for MySQL users, and although I discontinued that project last year, all of the code lives on in Percona Toolkit, along with another open-source project that I started. I am a command-line geek first and foremost, but today I recognize that people also need graphical tools. That’s why my newest project is a suite of web-based tools, which you can find at http://tools.Percona.com.

Have Sun and Oracle been good stewards of MySQL? Has the pace of development slowed or sped up under their stewardship? Can we expect Oracle to contribute its technology gems such as [pick any feature not currently available in MySQL such as parallel query and hash joins] to MySQL?

I think that Sun and Oracle have been fantastic for MySQL. Oracle, in particular, has really figured out how to get the product out the door. The MySQL 5.5 release is the best in MySQL’s history, in my opinion. There’s a little less visibility into how development is progressing than there used to be, but they release milestone previews of the upcoming version at intervals, and what I have seen so far for the next version of MySQL is very promising.

I’m not sure how far Oracle is going to take MySQL. There was a lot of speculation that Oracle would simply kill MySQL or that they would let it stagnate. History has not borne out those fears, and I never thought they were well founded in the first place. It is such an emotional issue for some people that I tend not to participate in those conversations, and instead I simply thank Oracle for the good work that they’re doing.

There was a lot of speculation that Oracle would simply kill MySQL or that they would let it stagnate. History has not borne out those fears.

What are the best use cases for MySQL in the enterprise?

It’s risky to paint enterprise applications with a broad brush, but in general I would say that large applications can be designed successfully in several different ways. Sometimes the application itself is complex, but the database is not expected to do a lot of heavy lifting. For example, many Java applications do a lot of computation outside the database. On the other hand, many applications do centralize the logic into the database, especially when there is a plurality of applications accessing the same data store. These two approaches really represent APIs to the data; in one case the API is built in application code, but in the other case it’s built in stored procedures. When I used Microsoft SQL Server, we took the latter approach, and it’s anyone’s guess how many millions of lines of stored procedures we had in our databases. That is not an approach that I would advocate using for MySQL. Instead, the access to the database should go through an API that is designed and implemented separately from the database. Web services are the obvious example.
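To make the two styles concrete, here is a hedged sketch of the “API in the database” approach in MySQL syntax; the orders table, its columns, and the procedure name are hypothetical, and this is an illustration rather than anything Schwartz prescribes.

-- The caller never touches the tables directly; the procedure is the API.
DELIMITER //
CREATE PROCEDURE get_customer_orders(IN p_customer_id INT)
BEGIN
  SELECT o.order_id, o.order_date, o.total
  FROM   orders AS o
  WHERE  o.customer_id = p_customer_id
  ORDER  BY o.order_date DESC;
END //
DELIMITER ;

In the approach Schwartz recommends for MySQL, the same SELECT would instead sit behind an application-level or web-service API that is designed and implemented separately from the database.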

Why are there not TPC benchmarks for MySQL? Is MySQL unsuitable at extreme scales?

In fact, there are some TPC benchmarks for MySQL. We routinely use TPC benchmarks, or benchmarks that are similar to TPC, to measure and stress test the server. However, many of the TPC benchmarks are really focused on use cases outside of MySQL’s sweet spot. For example, TPC-H is not a workload that you would ever run successfully on MySQL—at least, not on current versions of MySQL. MySQL is really more of an OLTP system. There are some third-party solutions, such as Infobright, that can work very well for analytic workloads. But those are not vanilla MySQL—they are a customized version of the server.

MySQL is suitable for use at extreme scale, but to achieve that, you have to choose a suitable application and database architecture. When MySQL is used on very large-scale applications, it simply must be sharded and scaled out across many machines. I don’t think that this is an ideal architecture in general, and many small-, medium-, or even medium-to-large-scale applications can get away without doing this. But at extreme scale, there’s no other viable architecture, and this is not a problem that’s unique to MySQL.

When MySQL is used on very large-scale applications, it simply must be sharded and scaled out across many machines.
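To make the sharding approach described above concrete, here is a minimal sketch of deterministic shard routing; the shard_map table, the CRC32-based routing rule, and the four-shard count are assumptions for illustration, not a recommended design.

-- A hypothetical shard map maintained by the application tier.
CREATE TABLE shard_map (
  shard_id INT PRIMARY KEY,
  dsn      VARCHAR(255) NOT NULL  -- connection string of that shard's MySQL server
);

-- Deterministic routing: all rows for a given user live on exactly one shard.
-- Here the rule is CRC32 of the user id modulo the number of shards (4).
SELECT dsn
FROM   shard_map
WHERE  shard_id = CRC32('42') % 4;

The application connects to the returned DSN and runs that user’s queries there; cross-shard queries have to be assembled in the application, which is exactly the architectural cost Schwartz alludes to.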

The MySQL forks like MariaDB, Drizzle, and Percona Server confuse me. Why does Oracle encourage them? Are they a good thing?

You’re not alone in your confusion. I do believe that these forks are a good thing, because they serve people’s needs. However, I wouldn’t say that Oracle encourages them. Oracle is playing an open-source game with an open-source database server, and those are the rules of the game. If you don’t satisfy users, they might take the code and do what they want with it. The three major forks of MySQL represent three different sets of user and developer needs.

I would say that Drizzle and MariaDB are more focused on what the developers want to achieve with the product, and you might say that they even represent an agenda. Drizzle, for example, represents a desire to rip all of the old messy code out of the server and build a clean, elegant server from scratch. MariaDB, on the other hand, represents the original MySQL developer’s vision of a database server that is very community oriented, and the desire to improve the server by leaps and bounds.

Percona Server is a little bit different. We try to stay as close to Oracle’s version of MySQL as possible, and we’re focused on solving users’ needs, not scratching our own itches. You can consider Percona Server to be a fork that we modify only as needed to solve our customers’ critical problems, which are not yet addressed in current versions of the official MySQL releases. As a bonus, we distribute this to the public, not just our paying customers. Much of our customer base is extremely conservative and risk averse. As a result, we focus on small, high-value, low-risk improvements to the server. Many of the features and improvements we’ve implemented are reimplemented by Oracle as time passes, which is a nice validation of their usefulness.

In terms of real-world deployments, the official MySQL from Oracle is by far the most popular. This should not be surprising. I am biased, but from what I see, Percona Server has the overwhelming majority of the fork “market share,” if you want to call it that.

You can consider Percona Server to be a fork that we modify only as needed to solve our customers’ critical problems, which are not yet addressed in current versions of the official MySQL releases.

When Sun acquired MySQL, a PostgreSQL developer, Greg Sabino Mullane, wrote: “MySQL is an open-source PRODUCT. Postgres is an open-source PROJECT. Only two letters of difference between ‘product’ and ‘project,’ but a very important distinction. MySQL is run by a for-profit corporation (MySQL AB), which owns the code, employs practically all the developers, dictates what direction the software goes in, and has the right to change (indeed, has changed) the licensing terms for use of the software (and documentation). By contrast, Postgres is not owned by any one company, is controlled by the community, has no licensing issues, and has its main developers spread across a wide spectrum of companies, both public and private.” What licensing issues is Mullane referring to?

Open-source licensing is either a fascinating or annoying topic, depending on your point of view. PostgreSQL uses a very permissive license, which essentially boils down to “do what you want, but don’t sue us.” That serves the project very well. MySQL, on the other hand, uses the GPL, which is not only a pretty restrictive license but also a manifesto for a philosophical point of view. The reasons why many for-profit corporations choose to use the GPL often boil down to “we don’t want anyone else to own our intellectual property, because then our investors have nothing to sell.” As a result, companies that want to play in the open-source space, but still make a lot of money someday, often dual license the product, and that is what MySQL did during the days when they were venture-capital funded. Those days are over, and now it’s time for the product itself to generate a steady stream of money for its owner, but the license remains. From a revenue-generation point of view, I think it makes perfect sense. My only wish is that the GPL were purely a license, without the preaching.

PostgreSQL uses a very permissive license, which essentially boils down to “do what you want, but don’t sue us.”

The just-released PostgreSQL 9.1 has an impressive list of features such as serializable snapshot isolation. Is PostgreSQL leaving MySQL behind in the technology race?

PostgreSQL has always had a richer set of features than MySQL, but many myths about the real differences between the products persist. And frankly, there are zealots on both sides. PostgreSQL is an amazing project and an amazing database server as well. But that doesn’t mean that it’s perfect. Recent releases have added many of the missing “killer features” that have made MySQL so popular for many years. One of the things that I think is going to make a huge difference in the upcoming release is the addition of index-only scans.

PostgreSQL has always had a richer set of features than MySQL.
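For readers who have not seen the feature, here is a hedged PostgreSQL sketch of the index-only scan mentioned above (it shipped in PostgreSQL 9.2); the orders table and the index name are hypothetical.

-- A "covering" index: the query below references only indexed columns, so the
-- planner can answer it from the index alone and may report an Index Only Scan.
CREATE INDEX orders_cust_date_idx ON orders (customer_id, order_date);

EXPLAIN
SELECT customer_id, order_date
FROM   orders
WHERE  customer_id = 42;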

What do you think of the NoSQL movement? Is it a threat to MySQL?

I think we have all learned a lot from it in the last few years, but I think that some of the strongest proponents have actually misunderstood what the real problems are. There was a lot of discussion that went something like “SQL is not scalable,” when in fact that’s not true at all. Current implementations of SQL databases indeed have some scalability limitations. That is an implementation problem, not a problem with the model. The baby got thrown out with the bathwater. I believe that some of the emerging products, such as Clustrix, are going to open a lot of people’s eyes to what is possible with relational technology. That said, many of the approaches taken by nonrelational databases are worth careful consideration, too. In the end, though, I don’t think that we need 50 or 100 alternative ways to access a database. We might need more than one, but I would like some of the non-SQL databases to standardize a bit.

Current implementations of SQL databases indeed have some scalability limitations. That is an implementation problem, not a problem with the model.

I’m an Oracle professional who needs to learn MySQL fast. Where should I start? What books should I read? Where can I get help?

I would start by making friends. See if there are any user groups or meet-ups near you. Who you know is much more important than what you know. In particular, the annual MySQL users conference has always been the watering hole where the MySQL community, partners, and commercial ecosystem have come together. O’Reilly used to run the conference, but this year Percona is doing it instead. You can find out more at http://www.percona.com/live/.

In terms of books, I would recommend two books to an experienced Oracle professional. The first is simply titled MySQL, by Paul Dubois. This is probably the most comprehensive and accessible overall reference to MySQL. The second is High Performance MySQL, which I co-authored with Peter Zaitsev and Vadim Tkachenko, Percona’s founders. There’s no other book like this for learning how to really make MySQL work well for you. I’m finishing up the third edition right now. Finally, I would strongly recommend the MySQL manual and documentation. The documentation team at MySQL does an amazing job.

In addition to Oracle, several other service providers are good resources for consulting, support contracts, development, remote database administration, and the like. Percona is one, of course. There are also Pythian, SkySQL, and a variety of smaller providers.

NoCOUG membership levels have been stagnant for years. Are we a dinosaur in a connected world where all information is available at our fingertips? What does a user group like NoCOUG need to do in order to stay relevant?

I think that the unique value that a user group such as yours can offer is to bring people together face-to-face. My company is entirely distributed, with more than 60 people working from their homes worldwide. This has its benefits, but there is no substitute for meeting each other and working together. Direct personal interactions still matter, and technology cannot replace them. As an example, although I’m delighted that you interviewed me for this magazine, I would be even more pleased to speak to all of you in person someday.

I think that the unique value that a user group such as [NoCOUG] can offer is to bring people together face-to-face.

Download the 101st issue of the NoCOUG Journal

Professor Stonebraker’s strong opinions on SQL, NoSQL, NewSQL, and Oracle

December 23, 2011 Leave a comment

As published in the 100th issue of the NoCOUG Journal (November 2011)

Michael Stonebraker has been a pioneer of database research and technology for more than a quarter of a century. He was the main architect of the INGRES relational DBMS, the object-relational DBMS POSTGRES, and the federated data system Mariposa. All three prototypes were developed at the University of California at Berkeley, where Stonebraker was a professor of computer science for 25 years. Stonebraker moved to MIT in 2001, where he focused on database scalability and opposed the old idea that one size fits all. He was instrumental in building Aurora (an early stream-processing engine), C-Store (one of the first column stores), H-Store (a shared-nothing row-store for OLTP), and SciDB (a DBMS for scientists). He epitomizes the philosophy of the American philosopher Emerson, who said: “A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines…speak what you think today in words as hard as cannon-balls, and tomorrow speak what tomorrow thinks in hard words again, though it contradict every thing you said to-day.” (http://books.google.com/books?id=RI09AAAAcAAJ&pg=PA30)

INGRES AND POSTGRES—THE BACKSTORY

The Ingres RDBMS was the first open-source software product, wasn’t it? There wasn’t a GNU General Public License at the time, so it was used to create commercial products. Why was Ingres distributed so freely? Which commercial database management systems owe their beginnings to the Ingres project?

Essentially all of the early RDBMS implementations borrowed from either Ingres or System R. Berkeley/CS has a tradition of open-source projects (Unix 4BSD, Ingres, Postgres, etc.).

The embedded query language used by the Ingres RDBMS was QUEL, not SQL. Like SQL, QUEL was based on relational calculus, but—unlike SQL—QUEL failed to win acceptance in the marketplace. Why did QUEL fail to win acceptance in the marketplace?

QUEL is an obviously better language than SQL. See a long paper by Chris Date in 1985 for all of the reasons why. The only reason SQL won in the marketplace was because IBM released DB2 in 1984 without changing the System R query language. At the time, it had sufficient “throw-weight” to ensure that SQL won. If IBM hadn’t released DB2, Ingres Corp. and Oracle Corp. would have traded futures.
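For readers who have never seen QUEL, here is the same hypothetical query written in both languages; the employee table and the salary threshold are made up for illustration.

-- QUEL (Ingres):
--   range of e is employee
--   retrieve (e.name) where e.salary > 50000
-- The SQL equivalent:
SELECT e.name
FROM   employee e
WHERE  e.salary > 50000;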

Postgres and PostgreSQL succeeded Ingres. Why was a replacement for Ingres necessary?

RDBMSs (at the time) were good at business data processing but not at geographic data, medical data, etc. Postgres was designed to extend database technology into other areas. All of the RDBMS vendors have implemented the Postgres extensibility ideas.

PostgreSQL continues to innovate. The recent 9.1 release offers synchronous replication, K-Nearest Neighbor indexing, and foreign data wrappers, among other goodies. Will PostgreSQL succeed where Ingres failed?

I don’t have any visibility into Postgres futures or prospects, since I am not involved.

QUEL is an obviously better language than SQL. … If IBM hadn’t released DB2, Ingres Corp. and Oracle Corp. would have traded futures.

STRUCTURED QUERY LANGUAGE

According to the inventors of SQL, Donald Chamberlin and Raymond Boyce, SQL was intended for the use of accountants, engineers, architects, and urban planners who, “while they are not computer specialists, would be willing to learn to interact with a computer in a reasonably high-level, non-procedural query language.” (http://www.joakimdalby.dk/HTM/sequel.pdf) Why didn’t things work out the way Chamberlin and Boyce predicted?

SQL is a language for programmers. That was well known by 1985. Vendors implemented other forms-based notations for non-programmers.

Chris Date quotes you as having said that “SQL is intergalactic data-speak.” (http://archive.computerhistory.org/resources/access/text/Oral_History/102658166.05.01.acc.pdf#page=43) What did you mean?

SQL is intergalactic data speak—i.e., it is the standard way for programmers to talk to databases.

Dr. Edgar Codd said in 1972: “Requesting data by its properties is far more natural than devising a particular algorithm or sequence of operations for its retrieval. Thus a calculus-oriented language provides a good target language for a more user-oriented source language.” (http://www.eecs.berkeley.edu/~christos/classics/Codd72a.pdf) With the benefit of hindsight, should we have rejected user-oriented calculus-oriented languages in favor of programmer-oriented algebra-oriented languages with full support for complex operations such as relational division, outer join, semi join, anti join, and star join?

Mere mortals cannot understand division. That doomed the relational algebra. It is interesting to note that science users seem to want algebraic query languages rather than calculus ones. Hence, SciDB supports both.

Mere mortals cannot understand [relational] division.
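Stonebraker’s point is easier to appreciate once you see what relational division looks like when spelled out in SQL. A hedged sketch, using hypothetical supplier_parts(s_id, p_id) and parts(p_id) tables: find the suppliers that supply every part.

-- Division via double NOT EXISTS: suppliers for which there is no part
-- that they do not supply.
SELECT DISTINCT sp.s_id
FROM   supplier_parts sp
WHERE  NOT EXISTS (
         SELECT 1
         FROM   parts p
         WHERE  NOT EXISTS (
                  SELECT 1
                  FROM   supplier_parts sp2
                  WHERE  sp2.s_id = sp.s_id
                    AND  sp2.p_id = p.p_id));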

NO TO STRUCTURED QUERY LANGUAGE?

NoSQL is confusing to many in the relational camp. Is NoSQL a rejection of SQL or of relational database management systems, or both? Or is it just confused?

NoSQL is a collection of 50 or 75 vendors with various objectives. For some of them, the goal is to go fast by rejecting SQL and ACID. I feel these folks are misguided, since SQL is not the performance problem in current RDBMSs. In fact, there is a NewSQL movement that contains very high-performance ACID/SQL implementations.

Other members of the NoSQL movement are focused on document management or semi-structured data—application areas where RDBMSs are known not to work very well. These folks seem to be filling a market not well served by RDBMSs.

You’ve been championing NewSQL as an answer to NoSQL. What exactly is NewSQL?

Current RDBMSs are too slow to meet some of the demanding current-day applications. This causes some users to look for other alternatives. NewSQL preserves SQL and ACID, and gets much better performance with a different architecture than that used by the traditional RDBMS vendors.

Oracle Database did not enforce referential integrity constraints until Version 7. Back then, Berkeley/CS Professor Larry Rowe suggested that the best way for the CODASYL systems to compete against the relational systems was to point out that they did not [yet] support referential integrity. (http://findarticles.com/p/articles/mi_m0SMG/is_n1_v9/ai_7328281/) Can the new entrants in the DBMS marketplace prevail against the established players without enforcing integrity constraints and business rules?

I have seen several applications where the current RDBMS vendors are more than an order of magnitude too slow to meet the user’s needs. In this world, the traditional vendors are nonstarters, and users are looking for something that meets their needs.

The older players in the DBMS marketplace are encumbered by enterprise-grade capabilities that hamper performance. (http://www.think88.com/Examples/Think88_SybaseIQ_wp.pdf) Are enterprise-grade capabilities and performance mutually exclusive?

Everybody should read a great book by Clayton Christensen called The Innovator’s Dilemma. The established vendors are hampered (in my opinion) primarily by legacy code and an unwillingness to delete or change features in their products. As such, they are 30-year-old technology that is no longer good at anything. The products from the current vendors deserve to be sent to the Home for Tired Software. How to morph from obsolete products to new ones without losing one’s customer base is a challenge—which is the topic of the book above.

In this world, the traditional vendors are nonstarters, and users are looking for something that meets their [performance] needs. … The products from the current vendors deserve to be sent to the Home for Tired Software.

THE CUTTING EDGE

Why do you believe that it is time for a complete rewrite of relational database management systems?

In every market I can think of, the traditional vendors can be beaten by one to two orders of magnitude by something else. In OLTP, it is NewSQL; in data warehouses, it is column stores; in complex analytics, it is array stores; in document management, it is NoSQL. I see a world where there are (perhaps) a half-dozen differently architected DBMSs that are purpose built. In other words, I see the death of one-size-fits-all.

Your latest projects, Vertica and VoltDB, claim to leave legacy database management systems in the dust, yet neither of them has published TPC benchmarks. How relevant are TPC benchmarks today?

It is well understood that the standard benchmarks have been constructed largely by the traditional RDBMS vendors to highlight their products. Also, it is clear that they can make their products go an order of magnitude faster on standard benchmarks than is possible on similar workloads.

I encourage customers to demand benchmark numbers on their real applications.

A massively parallel, shared-nothing database management system scales very well if the data can be sharded and if each node has all the data it needs. However, if the physical database design does not meet the needs of the application, then broadcasting of data over the network will result in diminished scalability. How can this problem be prevented? (Question from Dave Abercrombie, Principal Database Architect, Convio)

Physical database design will continue to be a big challenge, for the reasons you mention. It is not clear how to get high performance from an application that does not shard well without giving something else up. This will allow application architects to earn the big bucks for the foreseeable future.

It is well understood that the standard benchmarks have been constructed largely by the traditional RDBMS vendors to highlight their products. … I encourage customers to demand benchmark numbers on their real applications.

“GO WEST, YOUNG [WOMAN], GO WEST AND GROW UP WITH THE COUNTRY”

You’ve had a ringside seat during the relational era and have spent a lot of time in the ring yourself. What would you have changed if you could go back and start all over again?

I would have made Oracle do serious quality control and not confuse future tense and present tense with regard to product features.

George Orwell imagined that by the year 1984, a device called a “telescreen” would watch you day and night and hear everything you said. (http://books.google.com/books?id=w-rb62wiFAwC&pg=PA7) Substitute “database” for “telescreen” and “government,” “advertisers,” or “criminals” for “thought police,” and Orwell’s vision is not far from today’s reality. Big Data is watching us every minute of the day. Every movement is tracked and recorded by cell towers; every thought is tracked and recorded by search engines; every financial transaction is tracked and recorded by the financial industry; and every text message, email message, and phone conversation is tracked and recorded by service providers. Are databases more evil than good?

A good example is the imminent arrival of sensors in your car, put there by your insurance carrier in exchange for lower rates. Of course, the sensor tracks your every movement, and your privacy is compromised. I expect most customers to voluntarily relinquish their privacy in exchange for lower rates. Cell phones and credit cards are similar; we give up privacy in exchange for some sort of service. I expect that our privacy will be further compromised in the future.

As long as we feel this way as a society, privacy will be nonexistent.

What advice do you have for the young IT professional or computer science graduate just starting out today? Which way is west?

The Internet made text search a mainstream task. Ad placement and web mass personalization are doing likewise for machine learning. Databases are getting bigger faster than hardware is getting cheaper. Hence, I expect DBMS technology will continue to enjoy a place in the sun.

I would have made Oracle do serious quality control and not confuse future tense and present tense with regard to product features.

Download the 100th issue of the NoCOUG Journal

Categories: DBA, Interviews, NoCOUG, NoSQL, Oracle, SQL

Unconventional Wisdom: Do we probably need RAC, Exadata, Oracle Database 12c, MySQL, certification, and Oracle user groups?

September 24, 2011 6 comments

Years ago, Mogens Norgaard, the co-founder of the Oak Table network, wrote a provocative paper titled “You Probably Don’t Need RAC.” Here are the opening sentences of his paper:

If you’ve been holidaying in Siberia or similar places for about a year, you have probably not talked to an Oracle Sales rep yet about RAC. But you will no doubt find that there’s a voice mail waiting for you when you turn your mobile phone on again after returning home from the vacation.

RAC is being pushed very hard by Oracle. You will get high availability, incredible scalability, a much improved personal life, the ability to partition workloads, buy cheap Linux servers and what have you.

It sounds pretty good. How can anyone say no to that kind of offer?

The closing sentences are as interesting as the opening sentences, especially the very last one.

If you have a system that needs to be up and running a few seconds after a crash, you probably need RAC.

If you cannot buy a big enough system to deliver the CPU power and/or memory you crave, you probably need RAC.

If you need to cover your behind politically in your organisation, you can choose to buy clusters, Oracle, RAC and what have you, and then you can safely say: “We’ve bought the most expensive equipment known to man. It cannot possibly be our fault if something goes wrong or the system goes down”.

Otherwise, you probably don’t need RAC. Alternatives will usually be cheaper, easier to manage and quite sufficient.

Now please prove me wrong.

The paper was written in the early days of RAC. The technology has matured and improved since the date of the paper and, therefore, a number of the technical details in the paper are no longer valid. However, the underlying message of the paper is that you need to make an informed decision, justify the increased complexity and cost, and consider the alternatives.

Years ago you said that we probably don’t need RAC. Have you recanted yet? Do we probably need RAC?

I still think very, very few shops actually need RAC. Fantastic technology—just like, say, head-up display (HUD) for cars—but few really need it. RAC still has all the hallmarks of something people will want to buy: It increases complexity immensely, it’s expensive, it requires specialists that are increasingly hard to find, there are always excellent alternatives—and it’s pretty much perpetually unstable. For all those good reasons, more and more customers are using it. Either because manly types like to increase chaos, or because I’ve been telling people not to use it since around the year 2000. Whenever I recommend or don’t recommend something, most customers go out and do exactly the opposite, so in that sense I have a great deal of influence in the market.

Whenever I recommend or don’t recommend something, most customers go out and do exactly the opposite, so in that sense I have a great deal of influence in the market.

Do we probably need Exadata? Is Big Iron the ultimate answer to the great question of life, the universe, and everything?

In some ways, Exadata is the new RAC. It’s a lot about hardware, uptime, performance, amazing technology—and price. It’s also approaching the “Peak of Inflated Expectations” as seen in Gartner’s hype cycle, and it will soon set its course downwards toward the “Trough of Disillusionment.” Just like with RAC, I simply love the technology—a lot of good guys that I like and respect are on it, but few really need it. One of the things I love about it is that there isn’t any SAN involved, since I believe SANs are responsible for a lot of the instability we see in IT systems today. I tend to think about Exadata as a big mainframe that could potentially do away with hundreds of smaller servers and disk systems, which appeals hugely to me. On the other hand, the pricing and complexity makes it something akin to RAC—that’s my current thinking.

One of the things I love about [Exadata] is that there isn’t any SAN involved, since I believe SANs are responsible for a lot of the instability we see in IT systems today.

Do we probably need Oracle Database 12c (or whatever the next version of Oracle Database will be named)?

Since Oracle 7.3, that fantastic database has had pretty much everything normal customers need. It has become more and more fantastic; it has amazing features that are light years ahead of competitors. And yet fewer and fewer are using the database as it should be used (they’re using it as a data dump, as Tom Kyte said many years ago). So the irony is that as the database approaches a state of nirvana (stability, scalability, predictability, diagnosability, and so forth), fewer and fewer are using it as it should be used (in my view), and more and more are just dumping data into it and fetching it. [Norgaard’s First Law]

Since Oracle 7.3, that fantastic database has had pretty much everything normal customers need.

Do we probably need MySQL? Or do we get what we pay for?

As customers (and especially new, fresh-faced programmers) want to use new things instead of things that work and perform, it becomes more and more logical to use MySQL or other databases instead of the best one of them all: Oracle. Since MySQL succeeded in becoming popular among students and their professors, it is immensely popular among them when they leave school (the professors stay, of course, since they don’t know enough about databases to actually get a real job working with them outside academia). So MySQL will be used a lot. And it’s an OK database, especially if we’re talking the InnoDB engine.

As customers (and especially new, fresh-faced programmers) want to use new things instead of things that work and perform, it becomes more and more logical to use MySQL or other databases instead of the best one of them all: Oracle.

Do we probably need certification? Or do we learn best by making terrible mistakes on expensive production systems?

I hate certifications. They prove nothing, and they become a very bad replacement for real education, training, and knowledge. Among Windows and Cisco folks, certification is immensely popular, but you can now feed all the farm animals in Denmark (and we’ve got quite a few, especially a lot of pigs) with certified Microsoft and Cisco people. It’s taken by students (what?!? instead of real education, they train them in something that concrete? I find it really stupid), among the unemployed (we have a lot of programs for those folks here), and what have you. They’re worthless, and a lot of people think certification will help them find a job, thereby providing false hopes and security. YPDNC.

I hate certifications. They prove nothing, and they become a very bad replacement for real education, training, and knowledge.

Do we probably need ITIL? Should we resist those who try to control and hinder us?

When you begin doing “best practices” stuff like ITIL, you’ve lost. You’re pouring cement down the org chart in your shop, and God bless you for that—it helps the rest of us compete. “Best practices” means copying and imitating others that have shops that are unlike yours. Standardizing and automating activity in brain-based shops always seemed strange to me. The results—surprise!—are totally predictable: jobs become immensely boring, response times become horrible, queues are everywhere, and nothing new can happen unless a boss high up really demands it. It’s Eastern Europe—now with computers. Oh, and it’s hype; it’s modern right now but will be replaced by the next silly thing (like LEAN—what a fantastically stupid idea, too). Maybe we’ll have LEAN ITIL one day? Or Balanced Score Card–adjusted ITIL? Or Total Quality Management of LEAN ITIL?

When you begin doing “best practices” stuff like ITIL, you’ve lost. You’re pouring cement down the org chart in your shop, and God bless you for that—it helps the rest of us compete.

The funny thing is that Taylor’s ideas (called “scientific management”) were never proved, and he was actually fired from Bethlehem Steel after his idiotic idea of having a Very Big Hungarian lift 16 tons in one day (hence all the songs about 16 tons), because he cheated with the results and didn’t get anything done that worked. Not one piece of his work has ever been proved to actually work. His “opponent” was Mayo (around the 1920s), with his experiments into altering the work environment (hence the constant org changes and office redos that everybody thinks must be good for something)—and his work has never been proved either. And he cheated too, by the way, which he later had to admit. So all this management stuff is bollocks, and ITIL is one of its latest fads. I say: Out with it. Let’s have our lives and dignities back, please.

NoCOUG membership and attendance have been declining for years. Do we probably need NoCOUG anymore? We’ll celebrate our 25th anniversary in November. Should we have a big party and close up the shop? Or should we keep marching for another 25 years?

No. Oracle User Groups are dead as such. Just like user groups for mainframe operators or typesetters. You can make the downward sloping angle less steep by doing all sorts of things, but it’s the same with all Oracle user groups around the world. I think I have a “technical fix” or at least something crazy and funny that can prolong NoCOUG’s life artificially: move onto the Net aggressively and do it with video everywhere. Let it be possible to leave video responses to technical questions (why doesn’t Facebook have that?); let it be possible to upload video or audio or text replies to debates and other things via a smartphone app. Let there be places where the members can drink different beers at the same time and chat about it (and show the beer on the screen), etc., etc. In other words: Abandon the real world before all the other user groups do it—and perhaps that way you can swallow the other user groups around you and gradually have World Dominance.

Oracle User Groups are dead as such. Just like user groups for mainframe operators or typesetters. You can make the downward sloping angle less steep by doing all sorts of things, but it’s the same with all Oracle user groups around the world.

It costs a fortune to produce and print the NoCOUG Journal. Do we probably need the NoCOUG Journal anymore?

I have subscribed to arguably the world’s best magazine, The Economist, since 1983. Recently they came out with an app, and now I don’t open the printed edition any more (I still receive it for some reason). It’s so much cooler to have the magazine with me everywhere I go, and I can sit in the bathroom and get half of the articles in there read. It’s the way. Magazines should not be available anymore in print. Nor should they (in my view) be available on a silly website that people have to go to using a PC, a browser, and all sorts of other old-days technology. The smartphone is the computer now. Move the magazine there aggressively, and in the process, why not create a template that other user groups could take advantage of? Or the Mother of All Usergroup Apps (MOAUA) that will allow one user group after another to plug in, so people can read all the good stuff all over the world?

Magazines should not be available anymore in print. Nor should they (in my view) be available on a silly website that people have to go to using a PC, a browser, and all sorts of other old-days technology.

I’m writing a book on physical database design techniques like indexing, clustering, partitioning, and materialization. Do we probably need YABB (Yet Another Big Book)?

No, certainly not. Drop the project immediately, unless you can use it as an excuse to get away from the family now and then. Or, if you must get all this knowledge you have out of your system, make an app that people can have on their phone and actually USE in real-life situations. Abandon books immediately, especially the physical ones.

Interview conducted by Iggy Fernandez for the August 2011 issue of the NoCOUG Journal.

BACK TO POST Footnote: Norgaard’s First Law is “Whenever something reaches a state of perfection, i.e., where it becomes stable and very productive (in other words, saves time and money and effort), it will be replaced by something more chaotic.” (Q2 2006 issue of the ODTUG Technical Journal).

Categories: DBA, Interviews, NoCOUG, Oracle