Developed at the
ODRA – Object Database for Rapid Application development
Description and Programmer Manual
by Jan Murlewski and the ODRA team
17. ODRA Distributed Database
ODRA provides advanced mechanisms for creating distributed databases based on the parallel architecture. Such an approach allows for creating a virtual repository for transparent processing of distributed ODRA objects. Furthermore, an ODRA-based distributed database is capable of handling simultaneous access and modifications of data among connected nodes.
Each node of an ODRA database in a distributed environment can act as a client, a server, or both, depending on the requirements, as shown in Fig.17-1.
17-1. Distributed database
Whenever a remote query is being executed, a client sends a request containing a textual form of a query to a server. After receiving the query by the server, the query is locally executed and its result is sent back to the client. The request may also contain additional parameters for a query or for a remote procedure call.
17-2. Distributed database
After receiving the result of a query, remote objects references are pushed to the client environment stack and are implicitly converted into remote object identifiers. Thus, references to the original objects in a distributed database can be retrieved for updates or lazy loading. The concept of a remote object identifier allows for unique object identification and tracking in a distributed environment. In this way, updates and remote procedure calls on a remote database objects are permitted by transparent forwarding operations to original objects in a distributed environment. This approach has benefit of simplicity and flexibility. Thus, objects in the distributed environment can be transparently queried and updated through SBQL with no regard to data location.
The key concept in ODRA distributed database systems is a database link. The main idea behind database links it to provide simple and transparent access to data and services offered by distributed databases. The database link is represented as a connection between two ODRA-based databases. The main advantage behind the database link is possibility to take the advantage of SBQL approach, while hiding the underlying network and communication infrastructure. Moreover, database users do not have to know physical locations of distributed databases. Thus, the database link mechanism is responsible for making requests to an appropriate server, while clients are using logical names instead of IP addresses, port numbers, etc.
The database link connection is only one-way in a sense it allows only one side to sent requests, while the other side must acts as a client. In order to create fully bidirectional communication between distributed databases it is necessary to create database links on both sides.
The database link is using for communication a TCP/IP protocol, which ensures reliability of data transfers. However, a link manager module is responsible for managing connections with remote nodes. Whenever an established connection between two databases is lost, the link manager will attempt to establish a new connection automatically. Thus, the database user does not have to be responsible for establishing and closing connections explicitly. This capability greatly reduces the complexity and enhances software maintainability.
Users connect thought the database link to the remote database using the username and password referenced in a databases link. The username and password are set by the system administrator during link creation. As so, a local user can access remote database resources through a database link without even having been explicitly granted access to data on the remote database. By using the database link, it is possible for users to access remote database module, as they would be bounded by the privileges set in the database link.
The database link is a database complex object. However, its notable feature is that is contains embedded copy of remote module metabase to which it is attached. The remote module copy, to which the database link is attached, is required for type checking of remote queries. Such an approach for fetching remote module metabase is also used for distributed queries optimizations. The process of synchronizing remote module metabase is fully transparent and automated. In this way database users does not have to be even aware of remote metabase synchronization.
17.2.1 Creating and removing database link
The database link is registered and managed by a Link Manager. A new database link is created in the user current module. For this reasons each database link must have a unique name within current module.
The syntax for creating the database link is as follows:
· linkname - is a name of a database link that is added in the current module,
· username - is a name of user account at remote database,
· modulename - is a path to the linked module relative to the user main module,
· password - is a password for a given user account at remote database,
· hostname – is a host name/IP address of the ODRA remote host,
· port - ODRA database instance port number (default is 1521).
In case of creating, the database link with a name that already exists in the current module an exception would be thrown.
Below is an example of creating a database link named empLink, which provides access to remote module emps on host with 192.168.1.3 IP address:
The syntax for removing the database link is as follows:
· linkname - is a name of a database link to be removed.
Sample command for removing a database link with name empLink:
17.2.2 Executing remote queries
Executing remote queries with ODRA-based distributed databases is straightforward and flexible, while the database link is treated as an ordinary database object within SBQL syntax. The proposed architecture for database links allows for taking full advantage of SBQL semantics. This approach has benefit of simplicity and flexibility as querying remote databases does not introduce any special semantics. In this way, the database link is treated as complex object, which queried returns bag containing remote objects .The remote query that is to be executed on remote database must be followed with a database link name. The same rule applies for accessing remote views and procedures. It worth noticing that the database link is regular database object and it might be a part of more complex queries involving data from distributed and local databases within a single query.
For example, to retrieve remote objects of employees from remote database module admin.emps using a database link names empLink, one can issue:
Here is another sample query that returns salaries of a person who is Smith and is male:
The sample query presented bellow allows updating remote object data by seting new salary for ‘Smith’:
A database link mechanism provides capabilities for calling remote procedures. The example below makes a call to remote procedure getMaxSalaries stored on remote database. The remote procedure returns maximum salaries for employees:
A virtual repository constitutes a backbone for data intensive grid applications incorporating data and services distribution. The virtual repository integrates ODRA-based distributed databases into a single virtual database.
The virtual repository assumes autonomy of individual services treated as independent and self-contained units, with respect to their maintenance and business logic. Regarding this, local sites are fully autonomous, which means that it is not necessary to change them in order to make their content visible to global users of the virtual repository.
The core technologies that enable creation of the virtual repository are updatable views and database links. An updatable view provides full transparency of location for data and services by introducing virtual objects, which encapsulate the location and fragmentation of data. In this way, all contributory resources are integrated into a global virtual store by facilitating updatable view mechanism. Moreover, database links are responsible for data and services provisioning among connected nodes.
Another advantage of the virtual repository is its extensibility and scalability. Plugging a new node into the virtual repository does not affect other clients as so it does not imply any changes in business logic of currently available applications. Such capabilities are achieved by the updatable views and database links, which encapsulate the complexity of data distribution.
17.3.1 Integration of Distributed Resources
The virtual repository must have its schema designed. The schema is named integration schema, as it is responsible for integrating all distributed resources. The integration schema defines resources and services, which are available for clients.
It must be ensured that local servers expose their resources to virtual repository in a form, which conforms to an integration schema. This requirement is provided by contributory views, as their main task is to handle transformation of local servers schema into an integration schema form. It is local administrator responsibility for creating and maintaining contributory views.
First, the local administrator must design contributory schema, which must conform to the virtual repository schema. The next step involves creation of integration views, which provides data and services to global clients. The integration view transforms and glues together resources exposed by all contributory views. This view is also responsible for transformation and integration of data. This approach allows for data to be fragmented horizontally, vertically or even in mixed mode. In this way, clients are available to make queries and updates through an integration view in a transparent manner. The integration view solves many issues as clients are not obliged to have any knowledge about data location, fragmentation, replications, caching, etc. on the other hand, the integration view designer must have sufficient information regarding such issues as data location, replications, caching, etc. The integration view is provided with access to contributory resources through database links mechanism.
17.3.2 Integration Example
The example presented below is based on a virtual repository that keeps and manages information on employees and departments in the form of Employee (fname, lname, position) and Dept (name) objects. Objects are horizontally distributed among ODRA distributed database servers A and B. The sample distributed databases horizontally fragmented data and virtual repository are depicted in Fig. 17-3. The virtual repository provides resources, which are the union of two ODRA-based distributed databases. In this example, virtual repository and local server schemas are similar. Thus, it is not required to create any additional contributory views at local servers.
To integrate distributed resources into a virtual repository, according to the horizontal fragmentation, one must use the union operator. The proposed solution is to create an updatable view, which would allow combining objects from remote ODRA databases exposed by contributory views.
17-3. Integration of horizontally fragmented data from servers A and B
Before creating the integration view, it is required to create necessary database links referencing to distributed ODRA databases. However, database links would not be explicitly exposed by the integration view to clients.
The database links definitions referencing distributed ODRA databases are listed below:
The integration view listed below is responsible for merging object from distributed databases and transforming them to the required schema. The updatable views are created with respect to views syntax presented in chapter 10.
The updatable view DeptVRDef integrates Department objects from ODRA-based distributed databases. The integration view definition for DeptVRDef is as follows:
The updatable view EmployeeVRDef integrates Employee objects from ODRA-based distributed databases. The integration view definition EmployeeVRDef is as follows:
17.3.3 Usage Examples
Below are sample queries concerning presented earlier virtual repository schema and integration view.
Sample query for counting employees who are earning more than 3000:
Retrieve employee names with department names where they work:
Retrieve employees who earn more than 4500 with their department names:
Return all employees with first name ‘Tom’
Last modified: July 10, 2008