As suggested by the following story, Google would have preferred to hire my teenage daughter as the manager of their database team instead of me. I was on a long drive with my family so—to pass the time—I asked them to solve the problem that the Google interviewer had asked me to solve:
“Four men are on one side of a rickety bridge on a dark night. The bridge is only strong enough to support two men at a time. It is also necessary for the men crossing the bridge to carry a lantern to guide their way, and the four men have only one lantern between them. Andy can cross the bridge in 1 minute, Ben in 2, Charlie in 5, and Dan in ten minutes. How quickly can all four men be together at the other side?”
My daughter’s first solution was identical to mine.
Andy and Ben cross the bridge first. This takes two minutes.
Andy returns with the lantern. This takes one minute.
Andy and Charlie cross the bridge next. This takes five minutes.
Andy returns with the lantern. This takes one minute.
Andy and Dan cross the bridge last. This takes ten minutes.
The total time for the above solution is 19 minutes. However, I had googled the answer after returning from my interview (at Google) and knew that the four men could cross in 17 minutes, so I asked my daughter to try again. She “solved” the problem on her second attempt, which suggests that Google would have preferred to hire her as a manager of database administration instead of me. The “solution”: Andy and Ben cross (2 minutes), Andy returns (1 minute), Charlie and Dan cross (10 minutes), Ben returns (2 minutes), and Andy and Ben cross again (2 minutes), for a total of 17 minutes.
I quoted the words “solved” and “solution” in the above paragraph because we still need a rigorous proof that the above “solution” is in fact optimal; that is, is there a solution that takes fewer than 17 minutes? Nor does the above “solution” provide any insight into the general case. For example, it is not optimal if Ben takes 4 minutes to cross the bridge instead of 2 minutes: it then needs 23 minutes, but the crossing can be done in 21 minutes if Andy escorts each of the others across, as in our first attempt. And what if there are more than four people who need to cross? I am willing to bet that my Google interviewer would not have been able to prove the optimality of the above “solution” or solve the general case. If you’re interested, a comprehensive mathematical treatment of the above case as well as the general case can be found in this mathematical paper by Prof. Gunter Rote of the Free University of Berlin.
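For small instances, the figures above can be checked by exhaustive search. The Python sketch below is an illustration only: it is not taken from Prof. Rote's paper, and the function names (min_crossing_time, escort_time, shuttle_time) are made up for this example. It treats each arrangement of people and lantern as a state and finds the cheapest sequence of crossings with a shortest-path search, so the result is optimal by construction for the inputs tried. It assumes the usual reading of the puzzle, in which a pair walks at the slower person's pace.

```python
import heapq
from itertools import combinations


def min_crossing_time(times):
    """Shortest-path search over states (people on near side, lantern side)."""
    everyone = frozenset(range(len(times)))
    start = (everyone, True)  # all on the near side, lantern with them
    best = {start: 0}
    heap = [(0, 0, start)]  # (elapsed minutes, tie-breaker, state)
    counter = 1
    while heap:
        elapsed, _, (near, lantern_near) = heapq.heappop(heap)
        if not near:
            return elapsed  # everyone has reached the far side
        if elapsed > best[(near, lantern_near)]:
            continue  # stale heap entry
        side = near if lantern_near else everyone - near
        for size in (1, 2):  # one or two people carry the lantern across
            for group in combinations(sorted(side), size):
                cost = max(times[i] for i in group)  # pair moves at slower pace
                moved = frozenset(group)
                new_near = near - moved if lantern_near else near | moved
                state = (new_near, not lantern_near)
                if elapsed + cost < best.get(state, float("inf")):
                    best[state] = elapsed + cost
                    heapq.heappush(heap, (elapsed + cost, counter, state))
                    counter += 1
    return None


def escort_time(times):
    """Fastest person escorts each of the others (needs at least two people)."""
    fastest, *rest = sorted(times)
    return sum(rest) + fastest * (len(rest) - 1)


def shuttle_time(times):
    """Two fastest shuttle the lantern; two slowest cross together (four people)."""
    a, b, c, d = sorted(times)
    return b + a + d + b + b


print(min_crossing_time([1, 2, 5, 10]))  # 17
print(min_crossing_time([1, 4, 5, 10]))  # 21
print(escort_time([1, 4, 5, 10]), shuttle_time([1, 4, 5, 10]))  # 21 23
```

The search confirms that 17 minutes cannot be beaten for the original puzzle and that 21 minutes is the best possible when Ben takes 4 minutes. It does not scale to large groups, since the number of states doubles with each additional person; the general case needs the kind of analysis found in the paper.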
The Google interview technique is not the best technique for finding the best database administrators (or those with the right aptitude). Please feel free to comment. Is it just a case of sour grapes on my part?
Yesterday, a colleague asked what she could do to improve her Oracle career. Having recently missed one of the best chances I’ve ever had by not knowing enough about parallel query execution plans, my answer was “never stop studying.” I’m ashamed to admit that I haven’t been practicing what I preach, but I am not going to let that continue for another day.
When I was younger, I was a voracious reader. I would spend all my spare change on buying or photocopying books. I need to start thinking young again. As the founder of Ford Motor Company, Henry Ford, said on page 89 of My Philosophy of Industry and Moving Forward: “Anyone who stops learning is old—whether this happens at twenty or at eighty. Anyone who keeps on learning not only remains young but becomes constantly more valuable—regardless of physical capacity.”
For a limited time, the good folk at Red Gate are giving away Tom Kyte’s latest book Expert Oracle Database Architecture: Oracle Database 9i, 10g, and 11g Programming Techniques and Solutions for free, so I’ll start there. You don’t even have to give them your email address in return. What’s better than Tom Kyte’s best book and what’s better than free? I’m sure there’s a lot in there that I never knew or had forgotten or has changed or improved since I last encountered it in an earlier version of Oracle. Perhaps I’ll be ready the next time Lady Luck comes knocking :-)
I pledge to read Expert Oracle Database Architecture: Oracle Database 9i, 10g, and 11g Programming Techniques and Solutions by Tom Kyte in the next four to eight weeks.
Do the best thing you can do for your Oracle career. Pick a good Oracle book and make a pledge to read it in a time frame that works for you.
As published in the 101st issue of the NoCOUG Journal (February 2012)
Whole New World
with Baron Schwartz
“A whole new world
A new fantastic point of view
No one to tell us no
Or where to go
Or say we’re only dreaming.”
—Oscar-winning song from the movie Aladdin
Baron Schwartz is the chief performance architect at Percona. He is the lead author of High Performance MySQL and creator of Maatkit and other open-source tools for MySQL database administration. He is an Oracle ACE and has spoken internationally on performance and scalability analysis and modeling for MySQL and PostgreSQL, and on system performance in general.
When did you first hear of MySQL, and why did you choose to get involved? Which database technologies did you use before joining the MySQL camp? What have been your contributions to MySQL since then? What are your current projects?
I became acquainted with MySQL in 1999, when I was getting my undergraduate degree at the University of Virginia. I didn’t know a lot about relational database technology at the time, and my experience was limited to a single very academic class taught by a graduate student whom I could barely understand. I finished that course with an understanding of sigmas and other funny words, but with absolutely no concept of how to really use a database server. The first time I used MySQL in a serious way was when I joined an outdoors club at the university. It was painfully obvious to me that clipboards and pieces of paper were never going to be able to meet the demand, and in fact it was all we could do to organize the events with about 30 people attending. I realized that if I built an online application for organizing the club, we could scale to several hundred members without much trouble. The club is still going strong about a decade later.
After graduating from university, I joined a company that used Microsoft SQL Server exclusively. There, I was fortunate to work with a very talented database administrator, who taught me how database servers work in the real world. I stayed there for three years, and when he left to join another company, I followed him. That company used MySQL, and the day that I walked in the door it was clear that the growing pains were severe. Together with several other people, we got past those hurdles, and the company is still running on MySQL today. Along the way, I began blogging, traveling to attend conferences, and meeting a lot of other people in the MySQL ecosystem. Something about relational database technology fascinates me—and I’m still not quite sure what that is, but I know I love working with open source and I love working with databases. The combination of those two loves has made my career very satisfying and rewarding.
To be clear, I think Microsoft SQL Server is also a fantastic product. In fact, it is superior in many ways to MySQL. I miss the instrumentation and the great tools. Many of the things that I have done in my career have been targeted at solving some of those shortcomings in MySQL.
This really began in 2006, when I started writing what eventually became Maatkit. That suite of administrative tools was pretty universally acknowledged as essential for MySQL users, and although I discontinued that project last year, all of the code lives on in Percona Toolkit, along with another open-source project that I started. I am a command-line geek first and foremost, but today I recognize that people also need graphical tools. That’s why my newest project is a suite of web-based tools, which you can find at http://tools.Percona.com.
Have Sun and Oracle been good stewards of MySQL? Has the pace of development slowed or sped up under their stewardship? Can we expect Oracle to contribute its technology gems such as [pick any feature not currently available in MySQL such as parallel query and hash joins] to MySQL?
I think that Sun and Oracle have been fantastic for MySQL. Oracle, in particular, has really figured out how to get the product out the door. The MySQL 5.5 release is the best in MySQL’s history, in my opinion. There’s a little less visibility into how development is progressing than there used to be, but they release milestone previews of the upcoming version at intervals, and what I have seen so far for the next version of MySQL is very promising.
I’m not sure how far Oracle is going to take MySQL. There was a lot of speculation that Oracle would simply kill MySQL or that they would let it stagnate. History has not borne out those fears, and I never thought they were well founded in the first place. It is such an emotional issue for some people that I tend not to participate in those conversations, and instead I simply thank Oracle for the good work that they’re doing.
What are the best use cases for MySQL in the enterprise?
It’s risky to paint enterprise applications with a broad brush, but in general I would say that large applications can be designed successfully in several different ways. Sometimes the application itself is complex, but the database is not expected to do a lot of heavy lifting. For example, many Java applications do a lot of computation outside the database. On the other hand, many applications do centralize the logic into the database, especially when there is a plurality of applications accessing the same data store. These two approaches really represent APIs to the data; in one case the API is built in application code, but in the other case it’s built in stored procedures. When I used Microsoft SQL Server, we took the latter approach, and it’s anyone’s guess how many millions of lines of stored procedures we had in our databases. That is not an approach that I would advocate using for MySQL. Instead, the access to the database should go through an API that is designed and implemented separately from the database. Web services are the obvious example.
Why are there not TPC benchmarks for MySQL? Is MySQL unsuitable at extreme scales?
In fact, there are some TPC benchmarks for MySQL. We routinely use TPC benchmarks, or benchmarks that are similar to TPC, to measure and stress test the server. However, many of the TPC benchmarks are really focused on use cases outside of MySQL’s sweet spot. For example, TPC-H is not a workload that you would ever run successfully on MySQL—at least, not on current versions of MySQL. MySQL is really more of an OLTP system. There are some third-party solutions, such as Infobright, that can work very well for analytic workloads. But those are not vanilla MySQL—they are a customized version of the server.
MySQL is suitable for use at extreme scale, but to achieve that, you have to choose a suitable application and database architecture. When MySQL is used on very large-scale applications, it simply must be sharded and scaled out across many machines. I don’t think that this is an ideal architecture in general, and many small-, medium-, or even medium-to-large-scale applications can get away without doing this. But at extreme scale, there’s no other viable architecture, and this is not a problem that’s unique to MySQL.
The MySQL forks like MariaDB, Drizzle, and Percona Server confuse me. Why does Oracle encourage them? Are they a good thing?
You’re not alone in your confusion. I do believe that these forks are a good thing, because they serve people’s needs. However, I wouldn’t say that Oracle encourages them. Oracle is playing an open-source game with an open-source database server, and those are the rules of the game. If you don’t satisfy users, they might take the code and do what they want with it. The three major forks of MySQL represent three different sets of user and developer needs.
I would say that Drizzle and MariaDB are more focused on what the developers want to achieve with the product, and you might say that they even represent an agenda. Drizzle, for example, represents a desire to rip all of the old messy code out of the server and build a clean, elegant server from scratch. MariaDB, on the other hand, represents the original MySQL developer’s vision of a database server that is very community oriented, and the desire to improve the server by leaps and bounds.
Percona Server is a little bit different. We try to stay as close to Oracle’s version of MySQL as possible, and we’re focused on solving users’ needs, not scratching our own itches. You can consider Percona Server to be a fork that we modify only as needed to solve our customers’ critical problems, which are not yet addressed in current versions of the official MySQL releases. As a bonus, we distribute this to the public, not just our paying customers. Much of our customer base is extremely conservative and risk averse. As a result, we focus on small, high-value, low-risk improvements to the server. Many of the features and improvements we’ve implemented are reimplemented by Oracle as time passes, which is a nice validation of their usefulness.
In terms of real-world deployments, the official MySQL from Oracle is by far the most popular. This should not be surprising. I am biased, but from what I see, Percona Server has the overwhelming majority of the fork “market share,” if you want to call it that.
When Sun acquired MySQL, a PostgreSQL developer, Greg Sabino Mullane, wrote: “MySQL is an open-source PRODUCT. Postgres is an open-source PROJECT. Only two letters of difference between ‘product’ and ‘project,’ but a very important distinction. MySQL is run by a for-profit corporation (MySQL AB), which owns the code, employs practically all the developers, dictates what direction the software goes in, and has the right to change (indeed, has changed) the licensing terms for use of the software (and documentation). By contrast, Postgres is not owned by any one company, is controlled by the community, has no licensing issues, and has its main developers spread across a wide spectrum of companies, both public and private.” What licensing issues is Mullane referring to?
Open-source licensing is either a fascinating or annoying topic, depending on your point of view. PostgreSQL uses a very permissive license, which essentially boils down to “do what you want, but don’t sue us.” That serves the project very well. MySQL, on the other hand, uses the GPL, which is not only a pretty restrictive license but also a manifesto for a philosophical point of view. The reasons why many for-profit corporations choose to use the GPL often boil down to “we don’t want anyone else to own our intellectual property, because then our investors have nothing to sell.” As a result, companies that want to play in the open-source space, but still make a lot of money someday, often dual license the product, and that is what MySQL did during the days when they were venture-capital funded. Those days are over, and now it’s time for the product itself to generate a steady stream of money for its owner, but the license remains. From a revenue-generation point of view, I think it makes perfect sense. My only wish is that the GPL were purely a license, without the preaching.
The just-released PostgreSQL 9.1 has an impressive list of features such as serializable snapshot isolation. Is PostgreSQL leaving MySQL behind in the technology race?
PostgreSQL has always had a richer set of features than MySQL, but many myths about the real differences between the products persist. And frankly, there are zealots on both sides. PostgreSQL is an amazing project and an amazing database server as well. But that doesn’t mean that it’s perfect. Recent releases have added many of the missing “killer features” that have made MySQL so popular for many years. One of the things that I think is going to make a huge difference in the upcoming release is the addition of index-only scans.
What do you think of the NoSQL movement? Is it a threat to MySQL?
I think we have all learned a lot from it in the last few years, but I think that some of the strongest proponents have actually misunderstood what the real problems are. There was a lot of discussion that went something like “SQL is not scalable,” when in fact that’s not true at all. Current implementations of SQL databases indeed have some scalability limitations. That is an implementation problem, not a problem with the model. The baby got thrown out with the bathwater. I believe that some of the emerging products, such as Clustrix, are going to open a lot of people’s eyes to what is possible with relational technology. That said, many of the approaches taken by nonrelational databases are worth careful consideration, too. In the end, though, I don’t think that we need 50 or 100 alternative ways to access a database. We might need more than one, but I would like some of the non-SQL databases to standardize a bit.
I’m an Oracle professional who needs to learn MySQL fast. Where should I start? What books should I read? Where can I get help?
I would start by making friends. See if there are any user groups or meet-ups near you. Who you know is much more important than what you know. In particular, the annual MySQL users conference has always been the watering hole where the MySQL community, partners, and commercial ecosystem have come together. O’Reilly used to run the conference, but this year Percona is doing it instead. You can find out more at http://www.percona.com/live/.
In terms of books, I would recommend two books to an experienced Oracle professional. The first is simply titled MySQL, by Paul Dubois. This is probably the most comprehensive and accessible overall reference to MySQL. The second is High Performance MySQL, which I co-authored with Peter Zaitsev and Vadim Tkachenko, Percona’s founders. There’s no other book like this for learning how to really make MySQL work well for you. I’m finishing up the third edition right now. Finally, I would strongly recommend the MySQL manual and documentation. The documentation team at MySQL does an amazing job.
In addition to Oracle, several other service providers are good resources for consulting, support contracts, development, remote database administration, and the like. Percona is one, of course. There are also Pythian, SkySQL, and a variety of smaller providers.
NoCOUG membership levels have been stagnant for years. Are we a dinosaur in a connected world where all information is available at our fingertips? What does a user group like NoCOUG need to do in order to stay relevant?
I think that the unique value that a user group such as yours can offer is to bring people together face-to-face. My company is entirely distributed, with more than 60 people working from their homes worldwide. This has its benefits, but there is no substitute for meeting each other and working together. Direct personal interactions still matter, and technology cannot replace them. As an example, although I’m delighted that you interviewed me for this magazine, I would be even more pleased to speak to all of you in person someday.