Abstract Object Store Models
As we have argued in the section devoted to syntax, semantics and
pragmatics of query languages, we have to define data structures stored in
an object database in an abstract way, but with algorithmic precision. Our
definition should be sufficiently universal to cover all features that can
occur in object-oriented databases and XML repositories.
However, current object models tend to be very complex, with many
non-intuitive notions. Moreover, each object-oriented standard, programming language
or database management system introduces its own object model, with very
specific concepts that sound similar, but frequently have totally incompatible
meanings. This concerns such popular concepts as class, inheritance, interface,
type, and others. Nowadays XML technologies also contribute to this complexity.
Although basically the XML model is not object-oriented, it introduces complex
hierarchical objects, with many notions of its own, especially in RDF, in XML
Schema, in RDF Schema and in the family of technologies called Web Services.
The ODMG standard for object-oriented databases involves many notions such as
objects, literals, types, sub-types, interfaces, classes, inheritance, methods,
overriding, polymorphism, collections (various kinds), structures,
relationships, operations, exceptions, and others. The SQL-99 standard is even
more complex, because it involves similar concepts and additionally mixes them
with nested relations having a lot of peculiarities, abstract data types
(ADTs) and other features.
Unfortunately for current technologies (and fortunately for SBA), most of
this complexity is caused by secondary features and by the lack of attempts
to generalize and simplify the ideas and to avoid redundant notions. For
instance, one can introduce both classes and ADTs, but conceptually the
notions are the same. Similarly, the ODMG standard introduces both sets and
bags as collections, but sets could be removed as a particular case of bags. If
we assume the object relativism principle, then there is no need to distinguish
objects and attributes. There are more such redundancies, which stem from
various streams of research and development and from the historical dust that
has collected around object-oriented notions over the years.
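The remark that sets are a particular case of bags can be illustrated with a
minimal Python sketch. It is not part of SBA or ODMG: the helper names is_set,
bag_union and distinct are hypothetical, and bags are modeled with the standard
collections.Counter under the assumption that all multiplicities are positive.

```python
from collections import Counter

# A bag is modeled as a Counter: element -> multiplicity (assumed positive).


def is_set(bag: Counter) -> bool:
    """A bag is a set exactly when no element occurs more than once."""
    return all(count <= 1 for count in bag.values())


def bag_union(a: Counter, b: Counter) -> Counter:
    """Additive bag union: multiplicities of equal elements are summed."""
    return a + b


def distinct(bag: Counter) -> Counter:
    """Collapse every positive multiplicity to 1, turning a bag into a set."""
    return Counter({elem: 1 for elem, count in bag.items() if count > 0})


def set_union(a: Counter, b: Counter) -> Counter:
    """Set union expressed only through bag operations."""
    return distinct(bag_union(a, b))


a = Counter(["x", "y"])
b = Counter(["y", "z"])
merged = bag_union(a, b)      # "y" now occurs twice, so this is a proper bag
as_set = set_union(a, b)      # the same elements with duplicates removed
```

Under these assumptions a separate set type adds nothing that a bag with a
duplicate-elimination operator cannot express, which is the kind of redundancy
the text refers to.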
Now it is time to clean up the dust. For the description of the semantics
of query languages, the complexity of the underlying data model is a very
negative factor. A complex object model results in a complex query language
with respect to its syntax, semantics and pragmatics.
Complexity of semantics implies much more difficult implementation and
optimization. In particular, due to the complexity of SQL-99 many professionals
doubt whether it is entirely implementable. Optimization of queries and programs
in SQL-99 will be very challenging, and in many cases impossible, due to chaotic
language design decisions and unknown interactions between various data
structures and language constructs. Complexity of pragmatics leads to long
documentation, extensive user manuals, long training time, long application
development time and more chances for errors.
A complex semantics is also more difficult to keep consistent.
According to the conceptual closure principle, each feature of an object model
must be reflected in the syntax, semantics and pragmatics of a language
addressing the model. A precise semantics of the language requires defining all
states that are possible according to the model (the set State). The
complexity of the object model causes the complexity of the set State and, in
consequence, the complexity of the definition of semantics. This leads to more
difficulties during formal analysis of the semantics, a decreased potential for
query optimization, much more challenging strong type checking, and much more
difficult control over the completeness of, and mutual interaction between,
different constructs of the language. A complex object model also causes the
“metamodel management nightmare” (after Won Kim), which can be observed e.g. in
the ODMG standard[1].
Currently, the commercial world neglects or ignores the problem of the
complexity of the object model and its influence on the complexity of semantics
and pragmatics. The claims that one can easily build a formal model for SQL-99
or ODMG OQL are not justified at all; they belong to the liars' game of
marketing offices rather than to honest and technically sound assertions.
Languages are designed with no care for the minimality and consistency of the
introduced constructs. All the commercial proposals of standards are
underspecified. Because of holes in the specifications, implementations of the
proposals cannot be compatible. These circumstances, together with ad hoc
extensions introduced by software manufacturers, to a large extent undermine the
sense of the standards.
For these reasons there is a need to simplify object models by
developing abstractions over them that cover all the required features
introduced in practical languages with a minimal set of notions. Recall that
it was precisely the simplicity of the relational model that was the source of
its success, because it made it possible to reason about various properties and
language constructs in intuitive and formal ways.
In contrast to the relational model, object models must be more complex
to provide better conceptual modeling capabilities (which is the very essence of
object-orientedness). It is difficult to create a
single model that would be at the same time simple and cover all the
features of object models. There are also didactic reasons: many
features can be explained on a very simple (but still quite universal) model,
and then further features can be added one by one by generalizing this simple
model. For these reasons SBA is based not on a single object store model, but
on a family of models named AS0, AS1, AS2 and AS3[2];
a model with a higher number introduces more sophisticated features. The list
of models is of course open: there are many possibilities to make variants
of them or to introduce new features. However, the list is complete in the sense
that, to the best of our knowledge, there is no feature or notion of the object
languages and systems currently used or proposed in practice that would not be
covered by one of these store models. All the store models AS0,
AS1, AS2 and AS3 are based on the same few formal primitives.
The basic features of the introduced store models are the following:
This family of models is rich enough to give hope that any further
conceptual feature of object-oriented artifacts will not create a new quality
for the semantics of the query/programming languages being defined. The most
important thing is to understand the assumptions of SBA and the SBQL semantics
for the simplest AS0 model. After that, it is quite easy and natural to extend
the semantics to higher-order store models.
We have to warn readers that our store models are not the same as
data models. Store models are formal constructs and have almost nothing in
common with the concepts, ideological constraints and rhetoric, mathematical
decorations, beliefs and stereotypes that are usually associated with data
models. A store model is simply an abstract view of data structures stored in
the database (and in other media) and is orthogonal to any ideologies such as
the relational model or XML. Without a formally precise store model, the
definition of semantics will always be vague and obscure, and unclear to the
developers and programmers of a query language engine. In our opinion, this is
the case with the ODMG and SQL-99 standards, which do not present a formal store
model for objects explicitly, but explain them informally by other (also sometimes
obscure) notions, such as types, classes, ADTs, etc.
In all the introduced store models we use the same three elementary
notions:
Last modified: December 31, 2007
[1] However, we disagree with Won Kim that this
can be an argument in favor of object-relational models. In our opinion, the
SQL-99 object model leads to a much more painful “metamodel management
nightmare”.
[2] Originally, AS0, AS1, AS2 and AS3 were denoted
M0, M1, M2 and M3, respectively. The change was caused by a clash with the
notation used by OMG for different UML (meta) models.