Bad Information Lives Forever on the Internet


Bad information can live forever on the internet. A long time ago, a Wikipedia article claimed that the development of Informix was influenced by University Ingres. Other websites copied the article, and the claim lived on even after it was removed from Wikipedia. I quoted one of these websites and repeated the claim in my post “Is Oracle Database a Legacy Technology? (Part 1)”.

A reader of my post questioned the claim and I was forced to research it. That’s when I discovered the old version of the Wikipedia article. Here is the relevant text:

“Unlike System R, the Ingres source code was freely available (on tape) for a modest fee. By 1980 some 1,000 copies had been distributed, and a number of companies were using the code for their own product lines. Informix was one of the earliest users, and one of the few that was formed by people completely external to the Ingres project. They released the first version of their Ingres-based product in 1984.” (http://en.wikipedia.org/w/index.php?title=Ingres_(database)&oldid=906585)

One of the sources cited by the author of that article was Funding a Revolution: Government Support for Computing Research (National Academies Press, 1999), which says:

“Ingres technology diffused into the commercial sector through three major channels: code, people, and publications. Unlike the technical details of the IBM project, Ingres source code was publicly available, and about 1,000 copies were distributed around the world so that computer scientists and programmers could experiment with the system and adjust it to their own needs. Michael Stonebraker founded Ingres Corporation (purchased by Computer Associates in 1994) to commercialize the Berkeley code directly. Robert Epstein, the chief programmer at Ingres in the 1970s, went on to co-found Britton-Lee Incorporated and then Sybase. Both Britton-Lee and Sybase used ideas and experience from the original Ingres, and government agencies were early customers of both companies. Computer Associates released a commercial version of the Ingres code in the 1980s. Continued movement of Ingres researchers throughout the database community spread the technology even farther. Jerry Held and Carol Youseffi moved from UC-Berkeley to Tandem Computers Incorporated, where they built a relational system, the predecessor to NonStop SQL. Until joining Kleiner, Perkins, Caufield & Byers in 1998, Held was senior vice-president of engineering at Oracle, where he headed that company’s database efforts. Paula Hawthorn moved from Ingres to Britton-Lee (as did Michael Ubell) and eventually became a co-founder of Illustra Information Technologies Incorporated, now part of Informix. Stonebraker himself worked with Ingres Corporation, Illustra, and Informix. Other Ingres alumni went to AT&T, Hewlett-Packard Company (HP), IBM, and Oracle, bringing with them the lessons learned from Ingres.” (http://www.nap.edu/openbook.php?record_id=6323&page=165)

The claim that Ingres influenced products such as Sybase and Tandem NonStop SQL therefore hinges on the widespread availability of the Ingres code and the migration of Ingres alumni to other companies. The Wikipedia article on Ingres noted that Relational Database Systems (the original name of the company that created the Informix database management system) was not founded by Ingres alumni but nevertheless claimed that “Informix was one of the earliest users” of the Ingres code. The Wikipedia article on Informix made similar claims at the time:

“Roger Sippl and Laura King formed Informix under the name Relational Database Systems Inc, to make and sell an ISAM-based database product known as Marathon, which went on the market in 1981. They then turned their attention to the relational database world, starting with the publically-available Ingres source code. At the time Ingres had a number of serious limitations, using page-level locking, relying on the underlying operating system to provide all security, and limiting names to only 18 characters. In addition Ingres used its own query language QUEL, at a time when the market was clearly moving to SQL. Nevertheless Ingres was well tested and free. Informix included only the most basic changes to the Ingres system, most notably an adaptation of QUEL to their own Informer language.” (http://en.wikipedia.org/w/index.php?title=IBM_Informix&oldid=890504)

The unsubstantiated claims about the origins of Informix were eventually removed from both Wikipedia articles, but by then they had taken on a life of their own on other websites. A post in an Informix forum lent some credence to these claims by suggesting that the Informer language used in early versions of Informix was based on QUEL, the SQL alternative used by University Ingres:

“As far as ingres code goes, I was surprised when I read that too, but then I got to thinking. The INFORMER query language that was part of Informix 3.3 (and Informix was the product name back then – the company was RDS – Relational Database Systems.) and it was definitely QUEL-based, QUEL being the language used by Ingres back then. Informix SQL, or ISQL, made the switch to SQL along with moving to a client/server model (in Informix 3.3, the application accessed the data files directly.) So it could very well be that the INFORMER piece was built using open source ingres code. It would have been the easy way to go.” (http://www.dbmonster.com/Uwe/Forum.aspx/informix/2138/Informix#5kfa805r3h0oiceea2nmkimk92hfh0rd984axcom)

But then I found an interview with Roger Sippl, the founder of Relational Database Systems, in which he recounts how he built a report writer for a microcomputer manufacturer named Cromemco:

“So I had spent two years building a report writer for a primitive database system that Cromemco had and designing a complete multiuser relational database with a forms package, a report writer and a C programmatic interface that I called CRIS – the Cromemco Relational Information System. […] I had written three documents totaling probably 200 pages of designs for this Cromemco Relational Information System that was going to have an even more sophisticated report writer that dealt with even a more sophisticated database and the database was going to be a multitable, multiuser relational database using B-trees for indexing and record locking for ________ control. It was going to be a true minicomputer/mainframe developer database management system. With a forms package, query language report writer and C programmatic interface.” (http://archive.computerhistory.org/resources/access/text/Oral_History/102658001.05.01.acc.pdf)

To implement his new designs, Sippl then founded Relational Database Systems using a loan from an ex-girlfriend.

“So we built the report writer and the access method that was going to be our B-tree access method. We sold that as a separate product called C-ISAM so that was the first indexed file system for C programmers for UNIX and that was the underpinnings of our relational database management system. We put a database data dictionary on top of those ISAM files so that the report writer could allow you to use field names and table names. And then we put the forms package on top of that. And then the query language that was used in the report writer and the ad hoc query language informer was a truly relational query language and report writer. It was based more on the relational algebra than the relational calculus. SQL is more based on relational calculus. But the calculus is harder to implement. It takes a lot more memory to run. Back then, in ’80 to ’82, people were still debating whether a relational database could be run on anything smaller than the world’s largest mainframe. There were stories that IBM in Santa Theresa was trying to ship a product called DB2 but it was bringing the whole machine to its knees whenever it tried to run queries, so they couldn’t ship it. So the relational model was getting a bad rap as being a pipe dream, something that was infinitely flexible but infinitely resource intensive and would never be of use to anybody but the CIA. I made the pitch that if you based it on the simpler relational model – the algebraic model – it didn’t need quite so many resources. It wasn’t quite as automatic and not procedural but it was really powerful anyway. And I convinced enough people of that that we sold lots of C-ISAMs and lots of report writers and when we had the full Informix database management system we sold lots of that.” (http://archive.computerhistory.org/resources/access/text/Oral_History/102658001.05.01.acc.pdf)

This was a very coherent story, but did it reach the level of proof? I searched some more and found a copy of an Informer Query Language manual. Here is the text from the cover page:

INFORMIX
Relational Database Management System
USER’S MANUAL

INFORMER Query Language

Copyright (c) 1981, 1982, 1983, 1984
Relational Database Systems, Inc.
July 1, 1984

INFORMIX Version 3.20

True enough; Informer used relational algebra operations such as UNION, INTERSECT, and MINUS, not relational calculus. Here is the description of the ASSIGN command from page 29 of the manual:

III.I. The ASSIGN Command

The ASSIGN command and its UNION, MINUS and INTERSECT clauses allow the user to perform basic set operations. You can create a union, intersection or difference of two temporary files. First create the temporary files with READ, and then use one of the following:

> ASSIGN c = a UNION b

or

> ASSIGN d = a INTERSECT b

or

> ASSIGN e = a MINUS b ;

Here is the Informer procedure that you would have had to use to find out which customers had never placed an order. Note that column names were required to be unique, so it was never necessary to qualify a column name with the table name.

read into c cname;
read into o oname;
assign t = c minus o;
print t;
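
For readers who have never seen Informer, the procedure above can be modeled with ordinary set difference. The Python sketch below is only an illustration: the sample names and variable names are hypothetical, and it assumes that the temporary files created by READ behave like duplicate-free sets, which the manual excerpt does not state.

```python
# Hypothetical sample data standing in for the cname and oname columns.
customer_names = ["Acme", "Baker", "Cobb", "Dunn"]
order_names = ["Acme", "Cobb", "Acme"]

# read into c cname;  read into o oname;
# Modeled as sets, on the assumption that temporary files behave like sets.
c = set(customer_names)
o = set(order_names)

# assign t = c minus o;
t = c - o

# print t;
for name in sorted(t):
    print(name)
```

The whole query is a single algebraic operation on two intermediate results, with no quantifier or correlated condition involved.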

Here is what you would need to do if using QUEL:

retrieve (customers.cname)
where any (orders.oname by customers.cname where orders.oname = customers.cname) = 0

And here’s the corresponding SQL formulation:

select customers.cname
from customers
where not exists (select * from orders where orders.oname = customers.cname);

Note the similarity of the QUEL and SQL formulations. That’s because both QUEL and SQL were based on relational calculus, not relational algebra. The set operators INTERSECT and EXCEPT (MINUS) were only added to the SQL standard in 1992 (SQL-92). Here’s a SQL formulation that uses EXCEPT:

select cname
from customers
except
select oname
from orders;
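
To check that the NOT EXISTS and EXCEPT formulations agree, both can be run against a small in-memory SQLite database from Python. This is a minimal sketch under stated assumptions: the table and column names come from the queries above, but the sample rows are invented, and SQLite merely stands in for whatever database you have at hand.

```python
import sqlite3

# In-memory database with hypothetical sample rows; the table and column
# names follow the queries above, but the data is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("create table customers (cname text)")
conn.execute("create table orders (oname text)")
conn.executemany("insert into customers values (?)",
                 [("Acme",), ("Baker",), ("Cobb",), ("Dunn",)])
conn.executemany("insert into orders values (?)",
                 [("Acme",), ("Cobb",), ("Acme",)])

# Relational-calculus style: NOT EXISTS with a correlated subquery.
not_exists_sql = """
    select customers.cname
    from customers
    where not exists (select * from orders where orders.oname = customers.cname)
"""

# Relational-algebra style: EXCEPT, the standard spelling of MINUS.
except_sql = """
    select cname
    from customers
    except
    select oname
    from orders
"""

# Sort in Python so the comparison does not depend on row order.
not_exists_names = sorted(row[0] for row in conn.execute(not_exists_sql))
except_names = sorted(row[0] for row in conn.execute(except_sql))

print(not_exists_names)
print(except_names)
conn.close()
```

The two queries return the same names here because the sample customer names are unique and contain no NULLs. In general EXCEPT returns distinct rows, whereas the NOT EXISTS query keeps any duplicate customer rows.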

I think this is conclusive proof that the development of Informix was not influenced by University Ingres!

P.S. For extra credit, what is the name of the commercial version of the Ingres code created by Computer Associates in the 1980s? Note that Computer Associates only acquired the ASK Group in 1994.

  1. June 9, 2010 at 6:48 pm

    Excellent piece of research, Iggy. Thanks heaps.

    I do recall from a faint memory of my early days at Oracle that Informix turned to SQL-only at more or less the same time Oracle release 5 got established in the marketplace and we started to hear rumours of V6: around 1987-88.

    It was by then very clear that SQL was the language that would win the fight for RDBMS and Informix did the right thing in changing to it.

    Ingres might have survived a lot better had it embraced SQL early on – and developed a truly recoverable data management engine – instead of listening to that insufferable Stonebraker.

    Before that language switch, Informix used some proprietary language of some sort. But I never heard of a link between such language and QUEL.
