Developed at the
ODRA – Object Database for Rapid Application development
Description and Programmer Manual
ODRA is an object-oriented database management system developed in Java from scratch. No foreign database server is involved. This decision was conscious, after our negative experience of building an object-oriented DBMS on top of relational DBMS-s. The main goal of the ODRA project is to develop new paradigms of database application development. We are going to reach this goal by increasing the level of abstraction at which the programmer works. To this end we introduce a new, universal, declarative programming language, together with its distributed, database-oriented and object-oriented execution environment. Such an approach provides functionality common to the variety of popular technologies (such as relational/object databases, several types of middleware, general purpose programming languages and their execution environments) in a single universal, easy to learn, interoperable and effective to use application programming environment.
The principle ideas which we are implementing in order to achieve this goal are the following:
· Object-oriented design. We are close as far as possible to the UML object model, supporting in this way a seamless transition from analysis, modeling and design phases to the software implementation and maintenance phases. To this end we have introduced for database programming all the popular object-oriented mechanisms (objects, associations, classes, types, inheritance, polymorphism, encapsulation, collections), as well as some new mechanisms (such as virtual updatable views).
· Powerful query language extended to a universal programming language. The most important feature of ODRA is SBQL (Stack-Based Query Language), an object-oriented query and programming language. SBQL differs from programming languages and from well-known query languages, because it is a query language with the full computational power of programming languages. SBQL alone makes it possible to create fully-fledged database-oriented applications. The possibility to use the same very-high-level language for most database application development tasks may greatly improve programmers’ efficiency, as well as software stability, performance and maintenance potential.
· Virtual repository as a middleware. In a networked environment it is possible to connect several hosts running ODRA. All systems tied in this manner can share resources in a heterogeneous and dynamically changing, but reliable and secure environment. Our approach to distributed computing is based on object-oriented virtual updatable database views. This technology can be perceived as contribution to distributed databases, Enterprise Application Integration (EAI), Grid Computing and Peer-To-Peer networks.
Correspondingly, ODRA consists of three closely integrated components:
· Object Database Management System (ODMS)
· Compiler and interpreter for object-oriented query programming language SBQL
· Middleware with distributed communication facilities based on the distributed databases technologies.
The system is additionally equipped with a set of tools for integrating heterogeneous legacy data sources. The continuously extended toolset includes importers (filters) and/or wrappers to XML, RDF, relational data, web services, etc.
Each installation of ODRA can work as a client and as a server; multiple-client – multiple-server architectures are possible. ODRA makes it possible to create multi-layered architectures, where some client is considered a server for lower-level clients. ODRA has all chances to achieve high availability and high scalability because it is a main memory database system with memory mapping files and makes no limitations concerning the number of servers working in parallel. In ODRA we have implemented many advanced optimization methods that improve the overall performance without compromising universality and genericity of programming interfaces.
2.2.1 Reference Architecture of ODRA
There are several views on the internal architecture of ODRA. Understanding internal organization of the system can be helpful in understanding of functionalities and modes of using ODRA in applications. In Figure 2.1 we present a view on the architecture, which involves data structures (figures with dashed lines) and program modules (grey boxes). The architecture takes into account the subdivision of the storage and processing between client and server, strong typing and query optimization (by rewriting and by indices). The subdivision on client and server is only for easier explanation; actually, each ODRA installation can work as a client and as a server. Many clients can be connected to a server and a client can be connected to many servers. Some architectural components and relationships between components are not reflected in this figure.
2-1. Architecture of ODRA
Below we present a short description of the architecture elements presented in Fig.2-1. On the side of the client application we see the following elements.
· A source code of a query/program is created within the Integrated Development Environment, which includes an editor, a debugger, storage of source programs, storage of compiled programs, etc.
· A lexer and parser takes a query/program source as input, makes syntactic analysis and returns an Abstract Syntactic Tree (AST) of the query or program..
· A query/program AST is a data structure which keeps the abstract syntax in a well-structured form, allowing for easy manipulation (e.g. inserting new nodes or subtrees, moving some subtree to another part of the tree, removing some subtrees, etc.). Each node of the tree contains a free space for writing various query optimization information. The syntactic tree is the subject of several operations, in particular, strong type checking, optimization by rewriting, optimization by indices and finally, compilation to a bytecode.
· The strong type checker takes a query/program syntactic tree and checks if it conforms to the declared types. Types are recorded within a client local metabase and within the metabase of persistent objects that is kept on the server. The metabases contain information from declarations of volatile object types (that are a part of source programs) and from a database schema. The module that organizes the metabases is not shown. The strong type checker uses two stacks, static ENVS (keeping signatures of runtime environments) and static QRES (keeping signatures of query results). The strong static type checker simulates actual execution of a query during compile time. The type checker has several other functions. In particular, it changes the query syntactic tree by introducing new nodes for automatic dereferences, automatic coercions, for typing literals, for resolving elliptic queries and for dynamic type checks (if static checks are impossible). The type checker introduces additional information to the nodes of the query syntactic tree that is necessary further for query optimization.
· Static ENVS - static environment stack. It is a compile time counterpart of the environment stack (call stack) known from almost all programming languages.
· Static QRES - static result stack. It is a compile time counterpart of the result stack (arithmetic stack) known from almost all programming languages.
· Local metabase. It is a data structure containing information of types and specifcations introduced in source programs.
· Optimization by rewriting - this is a program module that changes the syntactic tree that is already annotated by the strong type checker. There are several rewriting methods that are developed for SBA, in particular:
§ Performing calculations on literals.
§ Changing the order of execution of algebraic operators.
§ Application of the query modification technique, which changes invocations of views into view bodies. To this end, the optimization module refers to the register of views that is kept on the server.
§ Removing dead subqueries, i.e. subqueries that do not influence the final query result.
§ Factoring out independent subqueries: subqueries whose results are not changed within some loop are factored out outside the loop.
§ Shifting conditions as close as possible to the proper operator, e.g. shifting selection condition before a join.
§ Methods based on the distributivity property of some query operators.
Other rewriting methods are currently under investigation.
On the side of the database server we have the following architectural elements:
This view on the ODRA architecture is to be augmented by new architectural elements, e.g. a cost-based query optimizer, user sub-schemas, distributed query optimizer and others. Some of them are or will be the subject of further research and development within the Polish-Japanese Institute of Information Technology.
2.2.2 Reference Architecture of Applications based on ODRA
There are a lot of various software architectures that can be developed on the ODRA system. ODRA is a combination of object-oriented database management system with own query and programming language, virtual updateable views, stored procedures, stored classes and methods and with many interoperability modules that can be used in different configurations, depending on needs of a particular business application. In Fig.2-2 we present some architectural variant for a Virtual Repository that we can be developed for eGovernment applications. The picture presents some possible configuration of developed software units. Many other architectural combinations are possible, depending on the particular eGov-Bus application in question.
The Virtual Repository work package (WP5) provides the development of a scalable and secure access mechanism for combining data and services supplied by different kinds of information sources through the eGov-Bus, including Web pages, XML files, Web services, relational databases, etc. The system will provide access to semantic Web (RDF) representation of the data, allowing the users for efficient management of ontological data for future eGovernment applications.
A central part of the architecture consists of ODRA, an object-oriented DBMS. Existing resources (bottom of the figure) are extended by wrappers and contributory views (or importers/exporters) that convert data/services proprietary to particular existing applications into the format acceptable for ODRA. The application developers can install as many ODRA servers as necessary, addressing the same distributed sources. The integration view on an ODRA server allows for the virtual integration of data and services supplied by distributed sources, supporting data and function abstractions. The virtual repository front-end will provide various APIs to access virtually integrated data, including workflow applications, Java applications, Web services applications, and others. A particular user works with his/her own client view that is a tailored part of the entire virtual repository schema.
Among many other functions, the virtual repository will allow for transparent access to external information resources and for unlimited transformations of complex document structures.
Features of a virtual repository for particular eGov applications must be edited, compiled, stored, tested, debugged, administered and maintained. All these functionalities are available through Integrated Development Environment, which is based on the jEdit editor for Java. ODRA IDE will be extended to achieve the functionality of fully-fledged administration, performance tuning and optimization module, which will include granting access privileges for users and for particular virtual resources, creating/removing indices, import/export of files (e.g. XML), determining modes of execution and output, etc. Despite many already developed functions, ODRA IDE is still under development.
Full description of ODRA IDE for Java programmers is the subject of another document, ODRA-IDE API Specification (070621 ODRA-IDE API Specification.doc).
For the design of the virtual repository software we have assumed an UML-like object model. Because in general UML is designed for modeling rather than for programming we have made several changes to the UML object model that still do not undermine seamless transition from a UML class diagram to an ODRA database schema. Because the UML object model is richer than XML hierarchical model, the ODRA object model covers also XML (except some of its minor features that are not supported). In the same way it covers a lot of other models, including the RDF model, the Topic Maps model, etc. The ODRA object model covers also the relational model as a particular case; this feature is essential for making wrappers to external sources stored in relational databases.
Objects. The basic concept of the ODRA database model is object. It is an encapsulated data structure storing some consistent bulk of information that can be manipulated as a whole. The UML literature presents a lot of examples of objects. Frequently, objects are machine counterparts of objects from the business domain of applications, but there is no strong rule. A database designer and programmers can create database and programming objects according to their own needs and concepts. Objects can be organized as hierarchical data structures, with attributes, sub-attributes, etc.; the number of object hierarchy levels is unlimited. Any component of an object is considered an object too.
Any objects has an external name (or more names) that can be used by the programmer to identify (to bind) the object from a source query or program. External names need not be unique. A name (such as Account, Invoice, DateOfBirth, SocialSecNbr, etc.) usually bears some conceptual meaning in the business domain. Any object has also an internal identifier that is used internally as a reference. Internal identifiers are unique for the given environment of objects. Internal identifiers are non-printable and have no meaning in the business domain. The programmer never uses internal identifiers explicitly. The intention of internal identifiers is twofold: (1) to enable fast access to an object if its identifier is known, and (2) to use identifiers in various contexts requiring references to objects; for instance, for updating operations, for call-by-reference parameter passing method, etc. Fig.2-3 presents an example Emp object that consists of attributes fName, lNname, birthYear, sex, address, eNbr, job, sal, and pointer links worksIn and manages: attribute address has sub-atributes city, street and houseNbr. Each object, attribute, subobject, pointer, etc. has a unique internal identifier (i9, i10,…, i22).
Collections. Objects within a collection have the same name; the name is the only indicator that they belong to the same collection. Usually objects from a collection have the same type, but this requirement is relaxed for some kinds of heterogeneous collections. Collections can be nested within objects with no limits; in this way we can represent repeating attributes.
Links. Objects can be connected by pointer links. Pointer links represent the notion that is known from UML as association. Pointer links support only binary associations; associations with higher arity and/or with association classes are to be represented as objects and some set of binary associations. This is a minor limitation in comparison to UML class diagrams, introduced to simplify the programming interface. Pointer links can be organized into bidirectional pointers enabling navigation in both directions. If a bidirectional link connects objects A and B, then it is understood as a pointer from A to B and a pointer from B to A. Such bidirectional links behave consistently as twin interrelated pointers: updating of one of them causes immediate and automatic updating of its twin.
Modules. In ODRA the basic unit of database organization is a module. As in popular object-oriented languages, a module is a separate system component. An ODRA module groups a set of database objects and compiled programs and can be a base for reuse and separation of programmers workspaces. From the technical point of view and of the assumed object relativism principle modules can be perceived as special purpose complex objects that store data and metadata.
Types, classes and schemata. A class is a programming abstraction that stores invariant properties of objects, in particular, its type, some behavior (methods, operations) and (optionally) an object name. A class has some number of member objects. During processing of a member object the programmer can use all properties stored within its class. The model introduces atomic types (integer, real, string, date, boolean) that are known from other programming languages. Further atomic types are considered, but not implemented yet. The programmer can also define his/her own complex types. Collection types are specified by cardinality numbers, for instance, [0..*], [1..*], [0..1], etc. Classes can be connected into an ODRA schema, as shown in Fig.2-4.
Inheritance and polymorphism. As in the UML object model, classes inherit properties of their superclasses. Multiple inheritance is allowed, but name conflicts are not automatically resolved. The methods from a class hierarchy can be overridden. An abstract method can be instantiated differently in different specialized classes (due to late binding); this feature is known as polymorphism.
Persistence and object-oriented principles. The model follows the orthogonal persistence principle, i.e. a member of any class can be persistent or volatile. Shared server objects are considered persistent, however, non-shared objects of a particular applications can be persistent too. The model follows the classical compositionality, substitutability and open-close principles assumed by majority of object-oriented programming languages. Shared (server) objects are the subject of transactional ACID semantics based on the 2PL algorithm.
Distinction between proper data and metadata (ontology) is not the property of the ODRA database model. The distinction can be important on the business model level, but from the point of view of ODRA both kinds of resources are treated uniformly.
SBQL (Stack-Based Query Language) is a powerful query and programming language addressing the object model described above. To the best of our knowledge, SBQL is the most powerful query language for object-oriented and XML-oriented models, much more powerful than ODMG OQL, OMG OCL and W3C XQuery standards. Some current limitations of SBQL are caused by static strong typing (that is underdeveloped e.g. in XQuery) and lack of reflexive capabilities (that are under development). SBQL implemented for the eGov Bus project is significantly more powerful than previous SBQL implementations. The power of SBQL concerns a wide spectrum of data structures that it is able to serve and complete algorithmic power of querying and manipulation capabilities.
SBQL is precise with respect to the specification of semantics. SBQL has also been carefully designed from the pragmatic (practical) point of view. The pragmatic quality of SBQL is achieved by orthogonality of introduced data/object constructors, orthogonality of all the language constructs, object relativism, orthogonal persistence, typing safety, introducing all the classical and some new programming abstractions (procedures, functions, modules, types, classes, methods, views, etc.) and following commonly accepted programming languages’ and software engineering principles, including orthogonality (keep unrelated features unrelated), compositionality (avoid big syntactic and semantic patterns, as well as far context dependencies in a program code), universality (the language should cover the assumed domain), generality (using language features for many purposes), parsimony (avoid redundant features), clean formal semantics, openness (use external systems and specialized tools), no semantic anomalies (no exceptional features and irregular treatment), no semantic reefs (programmer’s understanding and machine processing coincide), correspondence (the methods of binding do not depend on a context), conceptual closure (introducing a feature A enforces next features that appear from the combination of A with already existing features), safety (typechecking, assertions, constraints), semantic relativity (identical properties of parent and nested entities), conceptual continuation (bigger tasks are to be smooth extensions of smaller tasks).
SBQL queries can be embedded within statements that can change the database or program state. We follow the state-of-the-art known from majority of programming languages. Typical imperative constructs are creating a new object, deleting an object, assigning new value to an object (updating) and inserting an object into another object. We also introduce typical control and loop statements such as if…then…else…, while loops, for and for each iterators, and others. Some peculiarities are implied by queries that may return collections; thus there are possibilities to generalize imperative constructs according to this new feature.
SBQL for the eGov Bus project introduces also procedures, functions and methods. All procedural abstractions of SBQL can be invoked from any procedural abstractions with no limitations and can be recursive. SBQL programming abstractions deal with parameters being any queries; thus corresponding parameter passing methods are generalized to take collections into account. We have implemented the strict-call-by-value method which makes it possible to achieve the effects of call-by-value, call-by-reference, and more. Transactions are also considered procedural abstractions, syntactically and semantically very similar to procedures. Nested transactions are supported.
SBQL is a strongly typed language. Each database and program entity has to be associated with a type. However, types do not constraint semi-structured nature of the data. In particular, types allow for optional elements (similar to null values known from relational systems, but with different semantics) and collections with arbitrary cardinality constraints. Strong typing of SBQL is a prerequisite for developing powerful query optimization methods based on query rewriting and on indices.
For SBQL we have implemented a generic gateway to Java libraries. This facility allows one to use calls to Java programs within SBQL programs. The facility is especially useful to extend SBQL with GUI, with string operators, with J2EE capabilities, etc.
Below we present some examples of SBQL queries (cf. Fig.2-4), just to make some impression on the language.
Get all information on departments for employees named Doe:
Get the name of Doe’s boss:
Names and cities of employees working in departments managed by Kim:
For each employee get the name and the percent of the annual budget of his/her department that is consumed by his/her monthly salary:
For each person having no salary give the minimal salary in his/her department:
Example of a method: it is a part of the Emp class. The method gives a new salary to an employee. If an employee has not the sal attribute, the method inserts it (operator :<). The method returns false (and does nothing) if the salary is to be decreased; otherwise it returns true.
Virtual views (known from SQL) are frequently considered as a tool for adapting heterogeneous data to some common schema assumed by the business model of an application. Unfortunately, SQL views (practically, the only kind of views that are in use today) have limitations that restrict their application in this role:
During our work on the Virtual Repository Management System for the eGov Bus project we have assumed that it should be based on virtual views. To this end, we have to investigate the problem how to overcome the limitations of SQL views. In effect, we have developed a new method that allows us to achieve the power of updateable views that has not been even considered so far in the database domain. Our method has some commonalities with the instead of trigger views implemented in Oracle, SQL Server and DB2, but it is based on different principles, is much more powerful and efficient, and may address any object-oriented (or XML-oriented) database model.
Classical SQL views do the mapping from stored data into virtual data. However, some applications may require updating of virtual data; hence there is a need for a reverse mapping: updates of virtual data are to be mapped into updates of stored data. This leads to the well-known view updating problem: updates of virtual data can be accomplished by updating of stored data on many ways, but the system cannot decide which of them is to be chosen. In typical solutions these updates are made by side effects of view invocations (as presented, e.g., in the Oracle solution). Due to the view updating problem, many kinds of view updates are limited or forbidden.
We take another point of view. In general, our method is based on overloading generic updating operations (create, delete, update, insert, etc.) acting on virtual objects by invocation of procedures that are written by the view definer. The procedures are an inherent part of the view definition. The procedures have full algorithmic power, thus there are no limitations concerning the mapping of view updates into updates of stored data. SBQL updatable views allow one to achieve full transparency of virtual objects: they cannot be distinguished from stored objects by any programming option. This feature is very important for distributed and heterogeneous databases.
SBQL views present a new method that attempts to achieve two qualities: high-level view definition, as in SQL views, and full algorithmic power (including updating) as e.g. in OMG CORBA adapters or wrappers.
As a simple example of a virtual updateable view we present the view (c.f. Fig.2-4) that delivers virtual objects named EmpBoss that contain virtual attributes name (of an employee) and bossName (of his/her boss). The updating may concern the name of a boss, which means that the corresponding employee is moved to the department managed by this new boss.
Application of the view: let Doe start to work for Lee’s department (accomplished simply by the assignment to bossName):
Note that the assignment := is overloaded by the invocation of the procedure on_update from the view definition. String “Lee” is passed as the newBoss parameter to this procedure.
SBQL views can have an own persistent state, what is important for various mappings based on dictionaries or ontologies (they can be kept inside views). Because of the full algorithmic power concerning both the mapping of stored objects into virtual ones and the mapping of updates of virtual objects into updates of stored ones, SBQL views can be used for quite a lot of applications. In particular, SBQL views can be used as mediators on top of local resources to convert them virtually to the required format, and as customizers that adopt the data to the needs of a particular end user application. As customizers, SBQL views offer for a database administrator additional facilities for granting access privileges.
SBQL views are also used as integrators that fuse data from different sources. In this role SBQL views are the foundation of the Virtual Repository Management System that is the subject of the eGov Bus project. Procedures that are used to define virtual objects can involve elements of communication and transportation protocols (e.g., based on Web Services), thus can be used to:
In this way SBQL views allow one to achieve a lot of transparencies that are considered in distributed/federated databases, including location and access transparency, heterogeneity transparency, replication transparency, fragmentation transparency, redundancy transparency, optimization transparency, etc. An important application of SBQL views concerns wrappers that adopt relational database to an assumed object-oriented database model.
SBQL views are optimized by several methods. Queries within views are optimized by rewriting and by indices. Queries with view invocations are optimized by the query modification technique. New optimization techniques are considered, in particular, query tail absorption, query caching and global indexing of virtual objects.
2.7.1 Wrappers to external relational databases
Integration of many servers participating in an eGovernment application requires different design processes in comparison to the situation when e.g. one object-oriented application is to be connected to a relational database. The common (canonical) database schema is the result of negotiations and tradeoffs between business or administrative partners having incompatible (heterogeneous) data and services. The processes may take into account data models of the resources, but first of all the global canonical schema is influenced by the business model of future global applications.
This makes development of an object-relational wrapper more constrained than in a classical Object-Relational Mapping (ORM) case known e.g. from Java or .Net oriented wrappers (such as JDO, EJB, TopLink and Hibernate). The wrapper should deliver virtual objects and services according to the predefined object-oriented canonical schema. There could be little freedom or could be incovenient to change the canonical schema due to limitations of ORM capabilities. Moreover, the mapping to an UML-like object model is much more complex problem than mapping to the Java or C# object model. The mapping should also support both directions: mapping of relational data into virtual objects, and mapping of updating operations on virtual objects into updates of relational tables through SQL. Hence, the architecture and algorithms for ORM aiming at eGovernment applications must be developed with proper universality of the mappings in minds.
The major problem with this architecture concerns how to utilize an SQL optimizer. In all known RDBMS-s the optimizer and its particular structures (e.g. indices) are transparent to the SQL users. A naive implementation of the wrapper causes that it generates primitive SQL queries such as select * from R, and then, processes the results of such queries by SQL cursors. Hence the SQL optimizer has no chances to work. Our experience has shown that direct (static, i.e. compile time) translation of object-oriented queries into SQL is unfeasible even for a typical case.
The solution of this problem that
we have implemented in VRMS is based on the object-oriented query language
SBQL, virtual object-oriented views defined in SBQL, query modification
technique, and an architecture that will be able to detect in a query
syntactic tree some patterns that can be directly mapped as optimizable SQL
queries. The patterns match typical optimization methods that are used by the
SQL query optimizer, in particular, rewriting, indices and fast joins. The
idea is fully implemented within the ODRA prototype, including not only
retrieval, but also some updating statements. The currently supported RDBMS
are: Axion, Cloudscape, DB2, DB2/AS400,
The idea of this wrapper is that a relational database is treated as a primitive object-oriented database, where each tuple is considered an object. Then, on such a primitive object-oriented database we are defining virtual views that map it to the given object-oriented model that is assumed by the canonical schema. Note that SBQL views (unlike SQL) have full algorithmic power. SBQL queries invoke these views; thus the relational database structure is fully transparent for its users. Due to the power of SBQL views, any complex mapping between a relational schema and an object-oriented canonical schema is feasible. The processing of SBQL queries is done by the following steps:
Benchmarks have shown that this algorithm behaves quite well and is able to utilize almost all native SQL optimization methods.
2.7.2 XML importer/exporter
The XML importer/exporter is implemented in such a way that no information contained in the original XML file is lost in its ODRA representation. In some cases this assumption implies inconveniencies in asking SBQL queries – they are a bit more complex. In particular, due to XML attributes each atomic value from an XML file is wrapped into an object named _VALUE. This implies that an SBQL reference to such a atomic value must be ended by dot and _VALUE; see examples. Note also that XML attributes id and idref are mapped as pointer links between objects, giving the possibility to use SBQL path expressions.
XML exporter is implemented as a generic utility having an SBQL query as a parameter. XML tags are deduced from ODRA objects returned by a query and/or from auxiliary names that are used within the query. Because SBQL queries have full algorithmic power, the XML importer/exporter has no limitations concerning transformation of XML files into XML (or other) files. The XML importer/exporter can also invoke SBQL views as well as any SBQL functions. Below we present an example XML file, an SBQL query, and then the XML file generated by this query.
Input XML file:
For each interval <n,n+999>, n = 0, 1000, 2000, 3000, ... get the message (string) containing the number of employees having the salary within this interval and the interval itself. Output messages should have proper spelling (nouns with -s for plurals, verbs without -s for plurals).
The XML file generated by ODRA in response to the above query:
The example shows the computation power of SBQL as an XML mapping language. In contrast to such tools as XSLT, the computationally complete power of SBQL makes no limitations concerning the mapping; providing the algorithm or the rule for the mapping is known, it can be coded in SBQL in a declarative or procedural way.
The XML importer/exporter can also import XSchema (XSD) file to the ODRA metamodel. In this way all SBQL queries addressing XML files can be strongly typechecked.
Currently we provided several methods to access Virtual Repository resources from external applications (including APG). All the methods use SBQL.
The above interfaces are implemented in
prototype versions and will be extended according to the need of eGovernment
applications. Implementation of other interfaces are considered, e.g.
Last modified: June 17, 2008