Developed at the
ODRA – Object Database for Rapid Application development
Description and Programmer Manual
by Tomasz Kowalski and the ODRA team
18. ODRA Indexing
Indices are auxiliary (redundant) database structures stored at a server. A database administrator manages a pool of indices generating a new index or removing an index depending on the current need. As indices at the end of a book are used for quick page finding, a database index makes quick retrieving objects (or records) matching given criteria possible. Because indices have relatively small size (comparing to a whole database) the gain in performance fully justifies some extra storage space. Due to single aspect search, which allows one for very efficient physical organization and very fast search algorithms, the gain in performance can be even several orders of magnitude. Indices, however, consume some extra time when a database is updated. Hence introducing indices is mainly constrained by the proportion of searches and updates. For ODRA no simple rule for decisions concerning the presence or absence of indices is available; they depend on the experience of a database administrator.
Objects can be indexed using a wide range of selection criteria (i.e. search key). The value of a search key must depend on a current object. It can be:
The last approach enables the administrator to create an index matching exactly predicates within frequently occurring queries, so their evaluation is faster and uses the minimal amount of I/O operations.
Currently the implementation supports indices based on Linear Hashing structures which can be easily extended to its distributed version SDDS in order to optimally utilize data grid computational resources.
18.1.1 Example on indexing
In query optimization indices are used in the context of the where operator, when the left-hand operand is indexed by key values of the right-hand selection predicate. For instance, if the administrator will establish the index named idxEmpSalary returning references to Emp objects depending on their salaries, then the following queries will produce the same result (the second query is generated by the automatic query optimizer).
In case of big databases replacing the where evaluation with an index function call may cause performance gain in orders of magnitude. However to achieve this effect, the database should ensure index transparency and automatic index updating.
18.1.2 Physical Properties of Indices
The idea of indexing implies two important properties:
All indices existing in a database are registered and managed by the ODRA Index Manager. Each index is associated with a module where it has been created. Its name is to be unique. The index manager will be described in detail in a separate technical documentation.
The administrator issues the 'add index' command in CLI to create an index in the database. The syntax of this command is the following:
This is the only command necessary to install new index in the databse. The type indicators are optional and are described more in the further sections.
The next section briefly discusses a creating query structure and current rules and constrains concerning creating single-key indices.
The syntax of command for removing an index from the register is simple:
18.2.1 Creating a single-key index
<creating_query> is an SBQL query, which returns references to objects with associated key values. In order to create an index the administrator must provide the 'add index' command with such a query as a parameter.
Syntax of <creating_query> is the following:
E.g. if such an index is added it is transparently used in optimization of the following queries evaluation:
The current syntax can be changed in the future to be more user friendly (e.g. join can be replaced with parentheses embracing <key_expression>).
Indexed objects are defined by <object_expression> which is to be bound in the database section (root objects). The <object_expression> is to be built using only dot operators and names (i.e., by a path expression).
<key_expression> sub-expressions should be bound in the join operator stack section. They are allowed to return a value of the following types: integer, double, string, reference and boolean. In the simplest case <key_expression> can be an objects attribute, however derived attributes and expressions containing procedure calls are allowed.
An important property of a created index is a cardinality of a key. It indicates the number of key values, which can be returned for the given object. The index optimization is simplest if one key value is always returned for an indexed object (singular cardinality [1..1]). Currently the optimization for keys with a maximal cardinality greater then 1 is not supported. The optional cardinality [0..1] of a key enforces more strict rules for query optimization utilizing index in order to preserve query semantics after optimization. In a following example a key cardinality is optional because manages attribute is optional for Emp objects.
Example queries which can be transparently optimized by applying idxManagerDName index are shown in section 1.3.
18.2.2 Dense, Range and Enum type indicators
The creating index syntax allows the administrator to specify general index key properties, i.e. concerning key values or the optimization goal. These are achieved by introducing optional <type_indicators>: dense, range and enum.
The dense indicator implies that the optimization of selecting queries which use a given key value as a condition will be used only for selection predicates based on '=' or in operators. Therefore the distribution of indexed objects in an index (e.g. in hash table) can be more efficient for optimization of such cases.
The range indicator implies that optimization will concern selection predicates based not only on '=' or in operators but also on range operators: '>', '≥', '<' and '≤'.
E.g. if administrator would like to optimize evaluation of following range queries:
she should issue the command mentioned above, which adds the idxperage index.
The enum indicator was introduced in order to take advantage of keys with countably limited set of distinct values. The performance of an index can be strongly deteriorated if key values have low cardinality e.g. person eye colour, a marriage status (boolean value) or the year of birth. Using the enum key type, an index internally stores all possible key values and uses this information to optimize the index structure.
The enum key type can deal with optimizing selection predicates exactly like in the case of the range indicator, i.e. for: '=', in, '>', '≥', '<' and '≤' operators.
The default type indicator for integer, string, double or reference values is dense. In case of boolean values, the enum type is always used. The dense indicator should always be used for reference values.
The next section describes ODRA's optimization rules which can be helpful in applying good indexing.
Currently the index optimizer analysing the right operand of the where operator takes into consideration all selection predicates joined with an and and or operators.
Building selection criteria with the non [1..1] key cardinalities may cause runtime errors. Selection predicates based on '=', '>', '≥', '<' and '≤' operators force using single values as left and right operands. Unexpected number of operand values causes a runtime error. More operand values are allowed only if the in operator is used as a predicate because it does not constrain the cardinality of a right operand. To enable full optimization of queries with optional cardinality [0..1] suitable predicate based on exists expression should be used.
More examples are available through ODRA SVN repository in batch files stored in the folder “EGB/res/index/batch/”. Batch files include queries and CLI commands and can be executed using following syntax:
Short description of batch files contents:
l res\sampledata\batch\createM0.cli - creates a module with sample set of data necessary to perform batch test's and examples.
l res\index\batch\add.cli – creates a set of indices.
l res\index\batch\remove.cli – removes a set of indices.
l res\index\batch\test-error_idxmgr.cli – test for index manager containing several errors.
l res\index\batch\test-xml.cli – test for optimizing data from XML import (done for medium database).
l res\index\batch\startall.cli – starts all mentioned above tests.
l res\index\batch\start.cli – starts some of tests mentioned above.
Last modified: July 14, 2008