Developed at the
ODRA – Object Database for Rapid Application development
Description and Programmer Manual
by Krzysztof Kaczmarski and the ODRA team
An XML document consists of three basic elements: named nodes, text nodes and node attributes. The example below shows them in a simple XML document.
Named nodes: Person, Info, Name, Surname, Phone, animal
Text nodes: John, Smith, +48-888-88-00-12, Likes, cats, but not, dogs
The above terminology will be used in this document. Please note the following problems with XML structures when imported into an object database and addressed in a query language.
1) Named nodes may be simple, single-valued nodes or complex nodes (containing other nodes). If some nodes are optional, then the type of the containing node may vary depending on the presence of the optional part. Consider, for example, the Phone node. If the type attribute is present, then Phone must be a complex object, since it clearly contains two subobjects: type (mobile) and a string (+48-888-88-00-12). If type is not present, then the Phone object is perceived as a simple object of type string.
2) Similarly, an object may be seen as complex only if an optional subobject is present. For example, if someone does not use animal tags in the Info object, then the Info object may be understood as a simple string object.
3) Objects may contain nameless subobjects. Consider the Info node. If Info is the name of the whole node, then we have no name for the “Likes” and “but not” text nodes. Objects having no name cannot be used in queries. Some query languages solve this problem by simply enumerating subobjects.
These problems of unpredictably changing object structure may cause queries to change their semantics and results in ways that are hard to understand.
For example let us consider the following XML input data:
Here, the first Phone object seems to be a complex object, while the second one may be seen as a simple object. If the type attribute is optional, then the structure and semantics of the Phone objects may change unexpectedly. Another problem appears if someone adds a type attribute to the second Phone object: it must then become a complex object.
Our XML importer is designed to avoid these problems. In fact, there are two problems to be solved when dealing with XML input: attributes and nameless object content. The XML import procedure offers two ways of solving these problems.
12.1.1 Naming the nameless content
As shown in the previous section, XML tags may sometimes contain data that is nameless from the object database’s point of view. Even more confusingly, the character of the data may change when the container object changes, even though the data itself is not touched.
Let us recall the previous example:
The phone element may be seen as a simple string object named Phone.
But, if someone alters the Phone tag by adding an attribute:
then Phone becomes a complex object because it has two properties: type and a nameless value. Now a user must access the same object in a different way. Therefore the XML Importer always treats all objects as complex objects. Simple type values are always stored in subobjects named _VALUE.
So the phone element before the modification is perceived as if it contains another element named _VALUE:
The second one looks similar:
In both cases a user has the same access path to the phone number value:
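The effect of the _VALUE convention can be illustrated with a minimal Python sketch. This is not ODRA code: the dictionary-based object model and the helper name to_object are assumptions made only for illustration, and the attribute is stored as a plain entry because its exact representation depends on the import mode.

```python
import xml.etree.ElementTree as ET


def to_object(elem):
    """Model an element as a complex object (a dict), never as a bare string."""
    obj = {}
    # Attributes become named subobjects (their representation is simplified here).
    for name, value in elem.attrib.items():
        obj[name] = value
    # Nameless text content always goes into a subobject called _VALUE.
    text = (elem.text or "").strip()
    if text:
        obj["_VALUE"] = text
    return obj


plain = to_object(ET.fromstring("<Phone>+48-888-88-00-12</Phone>"))
typed = to_object(ET.fromstring('<Phone type="mobile">+48-888-88-00-12</Phone>'))

# The access path to the number is identical with and without the attribute.
print(plain["_VALUE"], typed["_VALUE"])
```

Adding or removing the type attribute changes only the set of sibling subobjects; the path to the phone number stays the same.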
12.1.2 Accessing attributes
The main problem with element attributes is that they must be treated properly when exporting XML data. If an object was created from an attribute, exporting it to XML should again produce an attribute. Hence such objects must be distinguished from normal objects. The XML Importer offers two mutually exclusive solutions to this problem: adding a ‘@’ prefix to the object’s name, or attaching an annotation to the imported attribute object.
12.1.3 ‘@’ attribute prefix
In this case the Phone object from the previous section will be imported as if it looked like this:
Please note that a user must now use the name @type to access the attribute subobject value, while access to the main value of the object remains the same.
This kind of import procedure may be executed using the M0 importer option. Please refer to the section on importing XML using the CLI command line.
12.1.4 Attribute annotations
Another way to mark objects coming from attributes is to attach an annotation. Annotations are hidden from the user and may be created and used only by the system. However, they are recognized when an object is exported to XML, so the expected XML data format is produced.
The example from the previous section is loaded as if it looked like this:
You may observe that all objects are treated in the same way. All simple type values are accessed using the same uniform construct:
Importing XML with annotations is the default option for the importer when executed in the CLI command line. Please refer to the proper section for more details.
12.1.5 XML namespaces
XML elements may be equipped with namespace information. Generally it means that an element’s name is preceded by a namespace:
Here Phone is the name of the object while addr is the namespace information. Namespaces must be properly declared before use:
Here, xmlns:addr="http://www.company.com/addressbook" is a namespace declaration. Please refer to the W3C XML specification for detailed information about namespaces.
The XML Importer can handle namespaces automatically by annotating imported objects. An annotation is created for each namespace declaration and each namespace usage. Please refer to the XML import procedure section for more details.
Since a user cannot see annotations, namespace information so far remains invisible, that is, it cannot be accessed using the query language. However, after exporting, namespaces may appear in the resulting XML again.
The general structure of XML importer execution command is as follows:
load "resource" using XMLImporter("params")
load "resource" using XMLImporter
resource – a path to an XML file that is to be imported
params – a list of parameters recognized by the importer, separated by: “[space] , ; \n \t \r \f ”.
When no parameters are specified, the import procedure assumes that annotations are used for marking attributes and namespace information, that simple type values are guessed automatically, and that references between objects are created from id/idref attribute pairs.
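The separator list above can be mirrored by a short Python sketch that splits a parameter string into individual options. This only illustrates the described separators; the function name split_params is hypothetical and the importer's actual parsing code may differ.

```python
import re

# Separators listed for the params string: space , ; \n \t \r \f
SEPARATORS = re.compile(r"[ ,;\n\t\r\f]+")


def split_params(params):
    """Split an importer parameter string into a list of option names."""
    return [token for token in SEPARATORS.split(params) if token]


print(split_params("M0, noGuessType;noAutoRefs"))
```

An empty parameter string yields an empty list, which corresponds to the default behaviour described above.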
12.2.1 XML importer parameters
Currently the list of implemented XML importer parameters contains:
– M0 – do not use annotated objects during the import procedure (unlike useMetabase, which always uses annotated objects). See the import procedure description for details. By default this option is not used, so it must be stated explicitly when needed.
– noGuessType – do not perform automatic type guessing (unlike useMetabase, which always uses explicit type information). See automatic type guessing for details. By default this option is not used, so simple types are guessed.
– noAutoRefs – do not perform automatic id/idref recognition and reference object creation. See automatic references for details. By default this option is not used, so references between objects are created automatically.
– useMetabase – import XML using type information from the metabase (unlike M0, type inference from the metabase always uses annotated objects). The metabase may be created in any way, but in most cases it will be constructed by importing an XSD file.
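How the four options combine can be summarized in a small Python sketch. The ImportSettings class and the resolve function are hypothetical names introduced here; in particular, how the real importer reports unknown or conflicting options is an assumption (this sketch raises ValueError).

```python
from dataclasses import dataclass

KNOWN_OPTIONS = {"M0", "noGuessType", "noAutoRefs", "useMetabase"}


@dataclass
class ImportSettings:
    annotate: bool = True       # default: attributes and namespaces are annotated
    guess_types: bool = True    # default: simple types are guessed
    auto_refs: bool = True      # default: id/idref pairs become references
    use_metabase: bool = False  # default: no metabase type information


def resolve(options):
    """Turn a list of option names into effective import settings."""
    opts = set(options)
    unknown = opts - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    # Metabase-based import requires annotated objects, so M0 is excluded.
    if "M0" in opts and "useMetabase" in opts:
        raise ValueError("M0 cannot be combined with useMetabase")
    use_metabase = "useMetabase" in opts
    return ImportSettings(
        annotate="M0" not in opts,
        # With a metabase, explicit type information replaces guessing.
        guess_types="noGuessType" not in opts and not use_metabase,
        auto_refs="noAutoRefs" not in opts,
        use_metabase=use_metabase,
    )


print(resolve(["M0", "noAutoRefs"]))
```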
The import procedure is able to deal with complex objects and attributes, resolve idref and id attribute pairs to create references and import namespace information. Generally when an XML document is imported into an ODRA object store, all information found in XML is converted to appropriate ODRA objects.
12.3.1 Complex structures and attributes.
Complex XML structures, simple values and attributes are imported according to the following rules:
1. A tagged element is converted to a complex object. The tag name is used as the object's name.
2. Text inside a tagged element is stored in a simple type object named _VALUE. Its type may be guessed or taken from a metabase.
3. An element's attributes are stored in subobjects:
   1. In case of the simple import procedure, the created subobject is a simple type object whose name is preceded by ‘@’. Its type may be guessed automatically. Please note that this way of importing attributes works only when the parameter M0 of XMLImporter is used. The character ‘@’ distinguishes attribute objects from normal objects.
   2. In case of the import using annotated objects, the created subobject is a complex object containing a single simple type object named _VALUE. The subobject's name is equal to the attribute's name, but an appropriate annotation is created (attribute="true"). In this way attributes are treated in the same way as all other objects. The annotation is the only way to distinguish attribute objects from non-attribute objects.
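The rules above can be sketched in Python for both import modes. The Obj class, its annotations dictionary and the import_element function are illustrative assumptions rather than ODRA's internal data structures; text following child elements (mixed content) is ignored in this sketch.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    value: object = None                              # set only for simple objects
    children: list = field(default_factory=list)      # subobjects of a complex object
    annotations: dict = field(default_factory=dict)   # hidden system metadata

    def get(self, name):
        """Return all subobjects with the given name."""
        return [c for c in self.children if c.name == name]


def import_element(elem, annotate=True):
    obj = Obj(elem.tag)                               # rule 1: tag -> complex object
    for attr, text in elem.attrib.items():            # rule 3: attributes -> subobjects
        if annotate:
            sub = Obj(attr, annotations={"attribute": "true"})
            sub.children.append(Obj("_VALUE", value=text))
        else:                                         # simple import (M0)
            sub = Obj("@" + attr, value=text)
        obj.children.append(sub)
    text = (elem.text or "").strip()
    if text:                                          # rule 2: text -> _VALUE
        obj.children.append(Obj("_VALUE", value=text))
    for child in elem:
        obj.children.append(import_element(child, annotate))
    return obj


phone_xml = '<Phone type="mobile">+48-888-88-00-12</Phone>'
m0 = import_element(ET.fromstring(phone_xml), annotate=False)
annotated = import_element(ET.fromstring(phone_xml))
print([c.name for c in m0.children], [c.name for c in annotated.children])
```

In the annotated variant the attribute is reached through the same name/_VALUE path as any other subobject, while the M0 variant exposes it under the ‘@’-prefixed name.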
12.3.2 Type guessing.
For some purposes (mainly comparing values or selecting a minimal value), the XML import procedure tries to guess the type of each imported simple value. If it is a parseable double, then a double object is produced. If it is a parseable integer, then an integer object is produced. Otherwise, a string object is produced. Please note that this option does not use any kind of schema. For example, if an XML file contains:
then, after the XML import with automatic type guessing, avg will be a double object, count will be an integer object and descr will be a string object.
Type guessing may be switched off by the “noGuessType” plugin option. A type may also be assigned to an object using metabase entries. Please refer to the proper section for more information.
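A minimal Python sketch of such guessing is given below. It assumes that the integer parse is attempted before the double parse, so that whole numbers stay integers as in the count case above; the importer's actual parsing rules (accepted number formats, whitespace handling) are not specified here and may differ from Python's int and float. The sample values are hypothetical.

```python
def guess_type(text):
    """Return an int, a float, or the original string, in that order of preference."""
    stripped = text.strip()
    try:
        return int(stripped)
    except ValueError:
        pass
    try:
        return float(stripped)
    except ValueError:
        return stripped


# Hypothetical values for elements named avg, count and descr.
for raw in ("4.5", "12", "twelve items"):
    value = guess_type(raw)
    print(raw, type(value).__name__)
```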
12.3.3 Automatic references between elements.
XML Importer may automatically create reference objects using the following algorithm:
Automatic creation of reference objects from id/idref attributes may be turned off by the import parameter “noAutoRefs”.
Namespace information may also be imported, but please note that SBQL currently has no constructs to access that information. However, it will be visible when an object with namespace information is produced as a query result. In case of the simple import procedure (M0), all namespace declarations and prefixes are omitted. One must use the annotating import procedure to handle namespaces correctly, since namespaces are converted to annotation objects:
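The general idea of resolving id/idref pairs can be sketched in Python as a two-pass procedure. This is only one plausible approach and is not taken from the importer: the function name, the handling of unresolved idref values (silently skipped) and the sample document are all assumptions.

```python
import xml.etree.ElementTree as ET


def resolve_references(root):
    """Return (referring element, target element) pairs for id/idref attributes."""
    # Pass 1: index every element that carries an id attribute.
    by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}
    # Pass 2: link every idref to the element with the matching id.
    links = []
    for elem in root.iter():
        target = elem.get("idref")
        if target is not None and target in by_id:
            links.append((elem, by_id[target]))
    return links


# Hypothetical document used only to exercise the sketch.
doc = ET.fromstring(
    '<db><dept id="d1"><name>Sales</name></dept>'
    '<emp><worksIn idref="d1"/></emp></db>'
)
links = resolve_references(doc)
print([(src.tag, dst.tag) for src, dst in links])
```

Indexing all ids first allows an idref to point at an element that appears later in the document.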
1. namespace definition is converted to an annotation object: namespaceDef( prefix:String, uri:String )
2. a single object may have many namespaceDef annotations;
3. namespace assignment creates a reference annotation namespaceRef pointing to an appropriate namespaceDef object;
4. an object may contain only one namespaceRef annotation;
5. if an object is assigned to a namespace it must contain namespaceRef annotation, even if it points to its own namespaceDef;
6. attributes may contain only single namespaceRef annotation.
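The annotation rules above can be modelled with a short Python sketch. The class names, field names and the assign_namespace helper are hypothetical; only namespaceDef(prefix, uri) and namespaceRef come from the rules themselves, and the choice of Person as the declaring object is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NamespaceDef:
    prefix: str
    uri: str


@dataclass
class AnnotatedObject:
    name: str
    namespace_defs: List[NamespaceDef] = field(default_factory=list)  # many allowed
    namespace_ref: Optional[NamespaceDef] = None                      # at most one


def assign_namespace(obj, prefix, scope):
    """Point obj's single namespaceRef at the nearest matching namespaceDef.

    scope lists the objects whose declarations are visible, innermost first;
    it may include obj itself, so an object can refer to its own namespaceDef.
    """
    for holder in scope:
        for definition in holder.namespace_defs:
            if definition.prefix == prefix:
                obj.namespace_ref = definition
                return definition
    raise KeyError(f"undeclared namespace prefix: {prefix}")


person = AnnotatedObject(
    "Person", [NamespaceDef("addr", "http://www.company.com/addressbook")]
)
phone = AnnotatedObject("Phone")
assign_namespace(phone, "addr", [phone, person])
print(phone.namespace_ref.uri)
```

Because the reference is a single field, assigning a namespace again replaces the previous one, which keeps the "only one namespaceRef" rule true by construction.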
12.3.5 Type inferring using metabase objects.
The XML Importer may use type information taken from objects in the metabase. In such a case, simple objects and attributes are not imported as strings and no type guessing is done. The structure of imported XML objects must exactly reflect the structures described in the metabase. A type may be assigned to an XML object in two (alternative) ways:
1. by the name of the XML object, which must be exactly the same as the name of a declared variable, structure or typedef existing in the metabase;
2. by a type attribute, which points to the metaobject whose name equals the value of the attribute. The type declaration attribute must be assigned to the namespace http://www.w3.org/2001/XMLSchema-instance (with any other namespace, including an undefined one, the type attribute is omitted).
Please note that the second case makes sense only for simple type objects, because the name of the XML object must exactly match the name of a variable declaration (in case of a root object) or of a structure field declaration (in case of an object embedded in another object). Otherwise type checking will fail.
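The two ways of assigning a type can be sketched in Python as a lookup against a dictionary standing in for the metabase. The resolve_type function, the dictionary contents and the precedence of the type attribute over the object name are assumptions made for illustration; the names shipTo and UKAddress are used as sample entries.

```python
import xml.etree.ElementTree as ET

# ElementTree spells a namespaced attribute name as {namespace-uri}local-name.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def resolve_type(elem, metabase):
    """Find the metabase entry for an element by type attribute, then by name."""
    declared = elem.get(XSI_TYPE)  # a type attribute in any other namespace is ignored
    if declared is not None:
        local_name = declared.split(":")[-1]  # drop an optional prefix
        if local_name in metabase:
            return metabase[local_name]
    return metabase.get(elem.tag)


metabase = {"UKAddress": "typedef UKAddress", "shipTo": "variable shipTo"}

typed = ET.fromstring(
    '<shipTo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:type="UKAddress"/>'
)
untyped = ET.fromstring(
    '<shipTo xmlns:x="http://example.org/other" x:type="UKAddress"/>'
)
print(resolve_type(typed, metabase), "|", resolve_type(untyped, metabase))
```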
The above XML and XSD will create an object named shipTo with type referring to UKAddress typedef but the metabase will also declare variable shipTo with the same type.
Importing XML schema and XML commands using types taken from the metabase:
If one wants to infer types from information in the metabase, the XML file must be imported with annotations; thus the M0 option is forbidden.
12.4.1 Example 1 – Books and Authors
The XML file contains information about books and authors. Each book has a title, possibly many authors, a publisher, a price, and optionally an editor.
Let us assume that the file named bib.xml contains:
It may be loaded using CLI command:
The second XML file contains books’ reviews (reviews.xml):
It may be loaded using CLI command:
Here are the valid queries and corresponding results:
1. List books published by Addison-Wesley after 1991, including their year and title.
2. For each book in the bibliography, list the title and authors, grouped inside a "result" element.
3. Create a flat list of all the title-author pairs, with each pair enclosed in a "result" element.
4. For each author in the bibliography, list the author's name and the titles of all books by that author, grouped inside a "result" element.
5. For each book found in both bib.xml and reviews.xml, list the title of the book and its price from each source. We assume that the files have been loaded as shown previously.
6. For each book that has at least one author, list the title, the first one or two authors, and an empty "et-al" element if the book has additional authors.
7. List the titles and years of all books published by Addison-Wesley.
Let us assume that the file contains:
It may be loaded using CLI command:
SBQL query solving the task:
9. For each book with an author, return the book with its title and authors. For each book with an editor, return a reference with the book title and the editor's affiliation.
10. Find pairs of books that have different titles but the same set of authors (possibly in a different order).
12.4.2 Example 2 – Departments and Employees
The XML file contains two kinds of objects: employees and departments. An employee may contain a reference to the department he works in and, optionally, to a department he manages. Each department contains bidirectional references to its employees and to its boss.
Let us assume that the file is named deptemp.xml:
Now, the file may be loaded using the following CLI command:
Here are the valid queries that may be executed:
1. Get departments together with the average salaries of their employees:
2. Get name and department name for employees earning less than 2222:
3. Get names of employees working for the department managed by Bert:
4. Get the name of Poe’s boss:
5. Names and cities of employees working in departments managed by Bert:
6. Get the minimal, average and maximal number of employees in departments:
7. For each department get its name and the sum of salaries of employees being not bosses:
8. Is it true that each department employs an employee earning the same as his/her boss?:
9. For each employee get the message containing his/her name and the percent of the annual budget of his/her department that is consumed by his/her monthly salary:
10. Get cities hosting all departments:
Last modified: June 20, 2008