Observations on the ODMG-93 Proposal
for an Object-Oriented Database Language

         Won Kim

       UniSQL, Inc.
 9390 Research Blvd.
 Austin, Texas  78759

1. What It Is and What It Is Not

Although it has many problems, ODMG-93 is an important and  positive 
contribution as a starting point in industry-wide efforts to define 
a standard object-oriented database language.

The ODMG-93 specification may be summarized simply as follows:

-  Despite its hasty claim, it is NOT a "standard". Rather, it 
is a work-in-progress proposal for an object-oriented database language 
and language bindings to it for C++ and Smalltalk. ODMG (Object-Oriented 
Database Management Group) is not a formal "standards" body. It 
is a committee formed by five vendors of first-generation object-oriented 
database systems (OODB). (For several years, some of these vendors 
have offered products that are not much more than persistent storage 
managers for object-oriented programming languages, but the misleading 
label "Object-Oriented Database System" has been stuck on such 
products  in the market.) And now the misleading label "standard" 
seems to be attached to the ODMG-93 work-in-progress proposals.

-  It espouses a database architecture that consists of a database 
management system that supports an object-oriented database language, 
and language binding layers on top of it for specific object-oriented 
programming languages. In particular, the specification consists of 
proposals for C++ language binding and Smalltalk language binding 
to a database language' and the database language is specified in 
proposals for a data definition language, a data manipulation language, 
and a query language. Of these proposals, the C++ language binding 
is the most mature, as these vendors have a lot of expertise in providing 
persistent storage for C++. However, in my view, the query language 
(at least as it is currently presented) is woefully deficient; and 
the data definition language and data manipulation language have problems, 
but are a reasonable basis for further work.

-  In essence, the database language proposal is an Object SQL. It 
uses the familiar SELECT FROM WHERE clauses of SQL. It includes the 
ORDER BY and GROUP BY clauses; the aggregation functions MIN, MAX, 
COUNT, SUM, and AVG; the set UNION, INTERSECTION, and DIFFERENCE queries, 
the existential quantifier predicate EXISTS, etc. It even has the 
SQL-2 facility called "derived table" to allow the SELECT clause 
of a query to contain another query. Besides these, the language includes 
facilities for defining and manipulating compound data structures 
(i.e., sets, bags, lists, arrays, and structs) that SQL does not support.

- The major problems and deficiencies in the ODMG-93 database language 
are due to the fact that the database language does not subsume the 
facilities in SQL (despite the fact that the data model on which the 
database language is based, the Core Object Model espoused by  the 
Object Management Group (OMG), fully subsumes the relational model).  
The database language is missing some important elements of SQL, including 
views,  dynamic schema changes, and access authorization.  Further, 
the current  language includes many little features that will add 
up to a "meta-data  management" nightmare, such as naming each 
individual object, maintaining  multiple overlapping subsets of all 
objects in a type, etc.

The goal of ODMG, with respect to the database language (but not the 
language bindings for C++ and Smalltalk), is largely identical to 
that of the X3H2 Database Standards Committee (i.e., the SQL-3 Committee); 
namely, the development of a database language for an object-oriented 
database as a post-relational database. SQL-3 is envisioned in essence 
as an Object SQL which extends SQL-2 with facilities for defining, 
manipulating, and querying an object database as a superset of a relational 
database. The SQL-3 committee includes all major relational database 
vendors, and some OODB vendors. 

Although members of ODMG are supposedly to implement the ODMG-93 
proposal within 18 months, at this point it is not clear how many 
of the members will actually do so, and how much of the proposal they 
will implement. The technical challenge of implementing the full ODMG 
language, especially the automatic query optimizer and query processor, 
is one that is unlikely to be met in 18 months. ODMG had five original 
members: Object Design, Inc., O2 Technology, Versant Object Technology, 
Objectivity, Inc., and Ontos. However, Versant Object Technology has 
opted for a joint development and marketing of an Object SQL product 
based on UniSQL's SQL/X object-oriented SQL. Objectivity, Inc. has 
announced a plan to deliver an object-oriented SQL product by early 
1994 which extends a pure SQL language processor they licensed from 
an SQL vendor. (Objectivity claims that they will offer both the SQL 
interface product and an ODMG-based query language product.) Ontos 
has had their own (limited) Object SQL for some time. O2 Technology 
claims that it already supports ODMG-93. (Although it is not a member 
of ODMG, UniSQL also supports most of the features found in ODMG-93, 
albeit in somewhat different syntax.)

The ODMG-93 database language specification, from an academic perspective, 
represents a much more concrete progress than the current confused 
state of SQL-3. In fact, despite some problems, its treatment of compound 
data (sets, bags, arrays, structs) -- including the definition and 
creation of a compound data, and retrieval of the elements in a compound 
data -- is a significant accomplishment. The ODMG-93 database language 
also accounts for nested data (e.g., parts explosion), methods, and 
inheritance in the language to some extent.  However, the ODMG-93 
database language has many major problems and deficiencies, and is 
premature to be considered a "standard" for "anything".

The biggest problem with the ODMG-93 database language is that it 
fails to recognize that the OMG Core Object Model on which it is based 
to is a superset of the relational model of data, and as such the 
ODMG database language should be a superset of ANSI SQL. For example, 
a type may have attributes, relationships, and methods; whereas a 
table in a relational database may have only attributes -- in other 
words, a type subsumes a table. Further, the domain of an attribute 
under the OMG Core Object Model may be a primitive type (e.g., integer, 
float, char, date, time, money) or a compound type (e.g., set, bag, 
list, array, struct, enumerated type); whereas the domain of an attribute 
of a table in a relational database may only be a primitive type -- 
in other words, the set of types supported under the Core Object Model 
subsumes the set of types supported in relational databases. The failure 
to recognize this simple fact has led to the current language which 
is similar to SQL but not compatible with SQL.

Given that ODMG does not include any relational database vendors as 
members, and that relational database vendors are working to develop 
the SQL-3 object-oriented SQL, ODMG needs to address the major technical 
problems in the database language parts of ODMG-93 and work with the 
ANSI X3H2 SQL-3 Committee to arrive at a single standard for a database 
language. The C++ and Smalltalk bindings, as specified in ODMG-93, 
should still be able to work with the database language.

2. ODMG-93 Database Language

2.1  What a Database Language Is

Before we proceed, it is important to briefly review what a database 
language is. A complete database language, such as SQL, consists of 
three sublanguages: data definition language (DDL), query and data 
manipulation language (DML), and data control language (DCL). 

The DDL is used for specifying the structure and integrity conditions 
on the database schema. In a relational database, the DDL is used 
to define tables, attributes in a table, the domain of an attribute, 
and constraints on an attribute or a table. In an OODB, the DDL should 
be used to define types, attributes in a type, the domain of an attribute, 
and constraints on an attribute or a type. The DDL for an OODB, however, 
must be richer than the DDL for a relational database. This is because 
an OODB is supposed to admit additional types of information which 
relational databases do not; namely, methods, compound data, nested 
data (e.g., parts explosion), and inheritance and IS-A relationships 
among types.

The DML is used for creating (inserting), updating, and deleting data 
that populates the database schema. The query language is used for 
nonprocedurally retrieving a subset of the database that satisfies 
user-specified search conditions. In a relational database, the DML 
is used to populate the database based on the database schema defined 
using the DDL. Once the database has been populated, the query language 
and DML are used to retrieve data from the database and to update 
and delete the contents of the database. In an OODB, the DML and query 
language must serve the same purposes. Because the DDL of an OODB 
allows additional types of information to be represented (and therefore 
stored in the database), the DML and query language must necessarily 
be more powerful than SQL in order to access and manipulate the additional 
information stored in the database.

The data control language is used for managing transactions (i.e., 
commit work, rollback work, restart), for controlling the access of 
the database by multiple users (i.e., grant or revoke access authorization), 
and for managing database resources (create index, drop index), for 
enforcing database integrity based on user-specified conditions (create 
trigger), etc.

Of the sublanguages, the query language is the most involved and difficult 
to design and implement. Efficient evaluation of a query is a major 
determinant of the performance of a database system. Unfortunately, 
this is the part of the ODMG-93 database language that is given the 
most skimpy treatment.

2.2  Technical Contributions of ODMG-93 Database Language

In my view, the most important contribution of the ODMG-93 database 
language is the specification of compound data types in the ODL, OML, 
and OQL. The ODMG-93 database language provides a broad set of facilities 
for defining and creating each type of compound data, facilities for 
accessing elements of a compound data (get first, get last, get i-th 
element), and facilities for manipulating more than one compound data 
(e.g., flatten a list, concatenate lists, compute the union of bags, 
compute the intersection of bags, etc.). SQL has no such facilities. 
I note, however, that compound data is not one of the primary object-oriented 
concepts, namely, encapsulation and inheritance. In my view, the ODMG-93 
database language proposal on compound data management addresses the 
technical challenges in making programming languages persistent and 
also one of the longstanding deficiencies in relational database systems.

A second important contribution of the ODMG-93 database language is 
the introduction of strong typing in the OQL. The user must indicate 
the type of the result of a query or the database language processor 
will infer the type of the query result. A strongly typed language 
makes it possible for invalid operations to be detected at compile 
time, rather than at run-time.

(The ODMG-93 database language proposes a simplified syntax for creating 
new objects (rather than using the SQL INSERT statement) and accessing 
named objects (rather than using the SELECT FROM WHERE query). In 
my view, however, this is not a significant contribution, since relational 
database systems also offer call-level interfaces to create and access 
records in much simpler ways than using the SQL INSERT or SELECT statements.)

2.3  Technical Problems of ODMG-93 Database Language

The ODMG-93 database language has the following three types of technical 
problems that need to be addressed.

The OQL in its current form is missing almost all of the SQL extensions 
that are required to account for the additional types of information 
that the ODL and OML allow to be defined and stored in the database. 
In particular, the OQL has virtually no mention of queries that involve 
methods or nested data; and it does not define queries that involve 
a type hierarchy. (The France-based O2 Technology, who authored the 
OQL part of the ODMG-93 "specification", claim that they already 
support all these features. However, the current OQL "specification" 
merely includes one simple example that shows the use of a method 
in a query, and one simple example that shows a path expression -- 
and no additional semantic or syntactic specifications. A specification 
for each such feature is much more involved than can be described 
in such a skimpy manner. Below, I provide a brief review of the motivation 
and issues for these features.)

- It proposes a different syntax and terminology from SQL even for 
those facilities whose semantics are identical or compatible with 
SQL. For example, it uses different keywords for the set difference 
operation (it calls it EXCEPT), GROUP BY HAVING (it calls it GROUP 
BY WITH), ORDER BY (it calls it SORT BY), etc.

- It has not adequately accounted for the semantic consequences of 
object-oriented concepts on a query language. In particular, the semantics 
of queries involving methods and nested data are woefully deficient, 
and queries involving an inheritance hierarchy of types are not even 
included.

- It is simply missing some major features found in SQL and relational 
database systems: views, access authorization, triggers, and dynamic 
schema changes. It also has no SQL-like statements for creating new 
objects, updating a set of objects based on query search conditions, 
etc.

The first two problems above arise because the OQL of ODMG-93 is an 
object-oriented SQL; and because the object-oriented data model on 
which the ODMG-93 database language is based fully subsumes the relational 
model, the OQL should simply subsume SQL. In other words, the OQL 
should simply be designed such that if the users will not use any 
object-oriented extensions to SQL, it should naturally degenerate 
into the standard SQL. The second problem above, the seemingly gratuitous 
syntax and terminology change, may be 
eliminated by simply accepting the SQL standard. In the remainder 
of this section, I will discuss the major problems and omissions in 
the ODMG-93 database language in some detail.

Deficiencies of Query and Data Manipulation Language

The OML does not support inserts, updates, and deletes that are based 
on query specifications; that is, OML offers no means of inserting, 
updating, or deleting more than one objects based on their satisfying 
arbitrary search conditions. Standard SQL supports such facilities.

An attribute that holds a value is a special case of a method; that 
is, an attribute has a method for reading the value and a method for 
updating the value. An attribute is an essential element in a search 
condition (predicate of the form "attribute_name  comparison_operator  
value_expression; e.g., Salary > 50000). It should be possible to 
allow a method anywhere in a search condition where an attribute name 
may be used (e.g., RetirementBenefits > 30000, where RetirementBenefits 
is a method). But it is very difficult to support queries that involve 
arbitrary user-supplied methods. There are such issues as whether 
methods will reside on the client or server in a client/server environment, 
and whether even unsafe methods (methods with unpredictable side effects) 
should be allowed in queries. Further, arbitrary methods are not amenable 
to automatic query optimization. These considerations impose practical 
limitations on allowing queries that involve methods. The OQL provides 
no such considerations.

The notion of a "path query" has been well-developed by database 
researchers during the past decade. A path query is a query written 
against nested data, by specifying search conditions against nested 
data. A path query contains, instead of just an attribute name, a 
sequence of attribute names (called a path expression). For example, 
a Person type may have an attribute named Hobby; the domain of Hobby 
may be an Activity type; and the Activity type may have an attribute 
named Number_of_Participants. Then it should be possible to issue 
a single query that says "find all persons whose hobby includes 
an activity that involve four or more participants." The WHERE clause 
of the query may contain a predicate
 "Person.Hobby.Number_of_Participants > 4".
The OQL specification, except for one short paragraph in which one 
simple path query is illustrated, does not define the semantics of 
a path query. A path expression contains a sequence of attributes. 
When any of the attributes in a path expression has a compound data 
as its domain, the meaning of the path expression becomes complicated. 
Further, a path expression should allow methods as well as attributes. 
A path expression also makes it difficult for automatic query optimization 
and processing. The OQL provides no such considerations.

The OQL allows a query on only a single type, but does not allow an 
inheritance-hierarchy query (i.e. a query that is targeted to an entire 
type hierarchy). An inheritance hierarchy of types implies an IS-A 
relationship; that is, an entity represented by a subtype "is 
a kind of" an entity represented by a supertype. For example, an Employee 
type "is a kind of" a Person type, where Employee is a subtype 
of Person (and conversely, Person is a supertype of Employee). Sometimes 
it makes sense to enquire about Person objects; and other times it 
is useful to be able to enquire about all types of Person (i.e., Person 
objects and Employee objects collectively). 

Deficiencies of Data Definition Language

The ODL has an important omission, namely view support. Relational 
database systems all support views as the external schema over the 
conceptual schema of an integrated database, and are used as a unit 
of access authorization. A user may define a view over a table by, 
for example, omitting certain attributes, and authorize another user 
to access the database only through the view. The utility of views 
remains the same for object-oriented databases; it is not true at 
all that somehow object-oriented databases obviate the need for views. 
In fact, without views, access authorization can only be supported 
partially. I note that the semantics of views for object-oriented 
databases are much more involved than their counterpart for relational 
databases. This is because a view is almost like a table in a relational 
database. If this semi-equivalence of a view and a table is to be 
transferred to an object-oriented database, a view should be almost 
like a type; which means that views may form an inheritance hierarchy, 
a view may be used as the domain of an attribute, and a view may have 
methods as well as attributes.

The ODL also is missing facilities for making dynamic changes to the 
database schema, beyond just adding a type as a new subtype of some 
existing types. Even relational database systems allow a table to 
be dynamically added or dropped, and an attribute to be added or dropped. 
Since an object-oriented data model admits additional information, 
its DDL should include facilities for making such schema changes as 
adding a method or attribute to an existing type, dropping a method 
or attribute from a type, adding a new supertype to an existing type, 
dropping a supertype from a type, etc.

Deficiencies of Data Control Language

The ODMG-93 database language has (almost) no data control language; 
in particular, it has no provision for access authorization (granting 
and revoking authorizations), and no provision for triggers (for automating 
the enforcement of data integrity with user-specified actions). Again, 
the facility for granting authorization should include consideration 
of the object-oriented data model; for example, authorization to run 
methods,  and authorization to access not just one type, but a type 
hierarchy. 

The ODMG-93 database language proposes a nested transaction model 
(a transaction recursively consisting of other transactions) for transaction 
management. A transaction is simply a sequence of reads and updates 
against a database (any database -- relational or object-oriented). 
The purpose of bundling a sequence of reads and updates into a single 
transaction is simply to define the sequence as an atomic database 
access. Either all the reads and updates in a transaction finish (i.e., 
the effects get recorded permanently in the database), or none finish. 
In this way, the database is not corrupted with data as a result of 
a partial work. A nested transaction model is sometimes desirable, 
since it naturally models a situation in which the work of a transaction 
may be split into multiple transactions (subtransactions) which may 
be executed in parallel. However, this has nothing whatever to do 
with object-orientation (i.e., encapsulation, methods, inheritance); 
that is, object-orientation does not require that transactions be 
nested. In my view, ODMG should adopt the standard non-nested transaction 
model as default, and suggest a nested transaction model as an option.

Potential Meta-Data Management Nightmare

The ODMG-93 database language has various little facilities that will 
add up to a "meta-data management" nightmare. For example, it 
proposes to allow a user to specify multiple names for a single object; 
to specify the lifetime of a single object; to create an index on 
a subset of an extent of a type (i.e., a subset of all instances of 
a type -- equivalent to a subset of all records of a table in a relational 
database); to create multiple such subsets of an extent of a type, 
such that some objects may belong to more than one subset; to name 
a query statement and use it in other queries, etc.

I note that it is of course desirable, conceptually, to allow the 
user to manage the database at the object level, that is, to use a 
single object as the smallest unit of database access and control. 
For example, in a document management application, it is certainly 
desirable to be able to name each document object, attach an authorization 
on the document (although ODMG-93 does not include any consideration 
of authorization), to fetch the document by its name, etc. However, 
it must also be possible to manage the database at the type level, 
that is, to use a type (and its extent) as a unit of database access 
and control -- relational database systems allow this, but do not 
allow each tuple (record or row) of a table as a unit of database 
access and control. Although it is desirable to have a single object 
as a unit of database access and control, it may sometimes lead to 
significant additional difficulties with meta data management. Further, 
it can lead to a performance degradation; a larger system catalog 
to maintain the status of each named object, a larger catalog to keep 
track of access authorizations on individual objects, a more complex 
logic for authorization checking, a more complex logic to screen redundant 
objects when processing queries against multiple subsets of the extent 
of the same type, etc. In summary, it should be recognized that an 
object-level database access and control is merely the next logical 
step from the traditional type-level access and control, rather than 
a replacement of the type-level access and control.

(Although I do not regard this as a "major" problem, the ODMG-93 
database language requires the user to explicitly name the extent 
of a type. This seems to merely add to the meta-data management problem. 
It is not clear at all why the extent of a type cannot be implicit. 
In a relational database, the extent of a table is implicitly all 
the records that are inserted into the table -- the user need not 
declare the extent of a table explicitly and separately.)

C++ and Smalltalk Bindings

Both the C++ and Smalltalk bindings force all objects to be persistent; 
and this is regarded as totally unacceptable to some users and prospective 
users of C++ and Smalltalk. It will be necessary to allow C++ or Smalltalk 
applications to deal with both persistent objects and nonpersistent 
objects. To do this, it will be necessary to allow the users to declare 
persistent classes and persistent attributes, to differentiate them 
from nonpersistent classes and nonpersistent attributes, respectively. 
Further, the issue of storing persistent objects that reference nonpersistent 
objects needs to be addressed.

3. Future Course for ODMG

In order to understand the relative merit of ODMG and ODMG-93 and 
the future course of action that ODMG should take, we must bear in 
mind the fact that relational database vendors are planning to add 
object management facilities to their relational database products, 
that is, extend SQL to object-oriented SQL and that they are all working 
within the ANSI X3H2 (SQL-3) Standards 
Committee. Given that OODB vendors that are members of ODMG have 
elected to adopt SQL as the basis of their object-oriented database 
language, it is clear that ODMG should really join forces with ANSI 
X3H2 to arrive at a single standard. To this end, ODMG may do the 
following.

1. ODMG should start developing levels of conformance and certification 
process if they hope to have their proposals become a standard. The 
current proposal is "all or nothing"; either a vendor must implement 
all aspects of all of the proposals or they do not conform at all.

2. ODMG should quickly proceed to implement a certification process 
for the C++ and Smalltalk language bindings, as these are the most 
mature parts of its current proposal.

3. ODMG should influence SQL-3 with the two primary technical contributions 
of the ODMG-93 database language, namely management of compound data 
and strong typing of query results. 

4. ODMG should make a revised proposal that will fully subsume SQL, 
while preserving its two primary technical contributions. In particular, 
it should simply accept and extend the major missing features from 
SQL, namely views, dynamic schema changes, and access authorization; 
and drop gratuitous differences in syntax and terminology from SQL.

In my view, the first step ODMG may take is to reorganize its data 
model and database language into three tiers. The first tier should 
consist only of those data modeling concepts and language constructs 
that are identical to their counterparts in relational databases. 
The second tier may consist of major new concepts and constructs, 
such as the concepts of relationships between two types, inheritance 
hierarchy, methods, and compound data types. The third tier may consist 
of  the "little" things that can lead to meta-data management 
problems. In practice, the first two tiers should be combined into 
a single tier; but the artificial separate presentations may make 
it clear that an object-oriented model subsumes the relational model. 
In any case, the first two tiers should be the baseline for compliance 
certification for vendors. The third tier should strictly be an option 
for compliance.

Finally, ODMG proposes to expand ODMG-93 with proposals for additional 
database features. Before they proceed, they should first seek to 
broaden its membership to add more experiences in discussions involving 
some of the really thorny issues ODMG claims to have on its agenda. 
The small number of members whose primary expertise is in providing 
persistent storage for C++ should not be so presumptuous as to believe 
that they will define standards for such far-reaching and thorny issues 
as versioning, work-group transactions, multimedia data management, 
etc. These issues are very important to a far wider area of computing 
industry than just persistent storage for C++ or Smalltalk.