SBQL order by Operator
and Range Queries
Back to Description
of SBA and SBQL.
A sorting operator order by is introduced in SQL and is proposed in other
query languages, in particular in ODMG OQL. The operator makes it possible to
determine sorting of a query result according to some chosen keys. Sorting is a
very important operation in query languages for the following reasons:
In the relational model sorting was considered
auxiliary, implementation feature, which can be used only for forming final
output from a query. The sorting cannot be covered by the relational model.
Relations are sets and the order of tuples in a
relation is inessential. Hence the
sorting operator and operations relying on sorted collections are not
expressible in the relational model theory. They are also not expressible in
other theories developed for the relational model, such as relational calculus
and formal logic. Fortunately, majority of implementations of SQL do not
consider theories as a serious obstacle in developing programmer’s
interfaces. Some SQL implementations provide also range queries allowing to
retrieve i-th
element from a sorted relational table (or to retrieve elements from i-th to j-th).
In object-oriented databases sorting operator
is provided as well. In ODMG OQL the sorting operator is introduced exactly as
in SQL, including range queries. Object-oriented databases usually introduce
sequences as a kind of stored collections, hence sorting operator and range
queries become necessary. Similarly, XML assumes the order of its parts, hence languages such as XQuery
can be used to formulate range queries.
However, some aspects of the sorting operator
and range queries are not considered systematically in the literature and in
proposed languages. Some range queries are still impossible to formulate. For
instance, we a query “Get employees in the alphabetic order according to
last names, together with the rank number of their salaries (the lowest salary
obtains the rank number 1, the highest salary obtains the rank number count(Employees) )”. Such a query cannot
be formulated in SQL, OQL and XQuery. The reason is
that in these languages a rank number can be used in queries as input, but not
as output. SBQL is the first language that allows for both cases.
In this
chapter we present thoroughly major issues related to the sorting operator and
range queries in the object query language SBQL.
In SBQL the sorting operator belongs to
non-algebraic operators. We assume the following syntax:
query ::= query
order by query
Semantics of the query q1
order by q2 is as
follows. As usually, a collection being sorted is determined by q1 and a sorting key by q2.
The precise semantics is explained in the
following steps:
1. First q1 join q2 is evaluated (see the
semantics of the non-algebraic
join operator). Let q1
returns bag{r1, r2, …}
or sequence{r1, r2, …}, a sequence is converted to bag.
In the result of the join we obtain a bag of structures
bag{ struct{ r1, v11,
v21, ..., vk1 }, struct{ r2, v12,
v22, ..., vk2 }, ....
}
where
struct{ v1i, v2i, ..., vki }
is a structure returned by q2 for ri.
2. If some vji is
a reference, a dereference operator is performed which changes the reference to
a value. If a dereference concerns a reference to a complex object then it is
treated as a failure. This situation can be considered a typing error
discovered during static type checking. If the dereference is impossible or the
value after the dereference does not belong to the
type with a natural linear ordering, then such a situation is considered a
failure too. For instance, this is the case of multimedia data types. This situation
can also be considered as typing error discovered during static analysis of a
query.
3. The bag resulted
from q1 order
by q2 is sorted
according to v1i, then for
identical v1i is sorted
according v2i, etc. till
performing the sorting for the last sorting key vki.
4. After sorting we
obtain a sequence
sequence{ struct{ s1, vs11,
vs21, ..., vsk1 }, struct{ s2, vs12,
vs22, ..., vsk2 }, .... }
which differs from the
previous bag only by the order of structures.
1. The final result of
q1
order by q2
sequence{ s1 , s2
, ....
}
is
obtained from the previous sequence by removing results returned by q2 (equivalently, projection
of the sequence to the results returned by q1).
Examples (see Fig.35
in the Chapter SBQL for the AS1
model)
|
E.RAN.1 |
Get references to employees sorted according
to names |
|
SBQL: |
Emp
order by name |
Note that the query is formulated with the use of inheritance.
|
E.RAN.2 |
Get references to employees sorted according
to their ages and then according to names |
|
SBQL: |
Emp
order by (age,
name) |
Note the use of the inherited method.
|
E.RAN.3 |
Get departments sorted according to the
number of employees and then according to boss names; return names of
departments, and their locations sorted alphabetically. |
|
SBQL: |
(Dept order by (count(employs), (boss.Emp.name))). (dname, ((loc as x) order by x).x)
group as locations) |
We use an auxiliary name x as an ordering key. Then, to remove x from
the result we make projection on x. All locations of a department are grouped under a single name locations. Note that in the result bag we
return references rather than values.
|
E.RAN.4 |
As above, but return only departments with
the budget greater than 1 mln. The output should
contain the number of employees and name of a boss. |
|
SBQL: |
(((Dept where budget >
1000000) join count(employs) as c join (boss.Emp.name) as b) order by (c, b)). (dname, c, b, ((loc as x) order by x).x) group as locations) |
We assume that all algebraic and non-algebraic operators preserve the
order of elements in a processed sequence. This concerns such operators as where, dot, as, for each, etc. For
some operators, e.g. for join, this is not obvious. In general, for each SBQL
operator it should be determined how it behaves for combination of kinds of
collections being its arguments.
Sometimes sorting of strings require regarding/disregarding small and
capital letters. This effect can be achieved through a
typical operations on strings.
|
E.RAN.5 |
Get references to employees sorted according
to names; disregard lower and upper letter cases |
|
SBQL: |
Emp
order by toUpper(name) |
The presented semantics of the order
by operator has several peculiarities which should be well understood. The first
of them concerns the situation when the sorting attribute is optional, for
instance sal
in Fig.35. Consider the
query
|
E.RAN.6 |
Get references to employees sorted according to
salaries |
|
SBQL: |
Emp
order by sal |
According to the semantics of the join
operator on which the order by
operator is based on, the query omits all employees for which the subobject sal does not exist. Hence the sorted result will contain only
references to Emp objects having sal attribute. If the programmer
wants to take into account all Emp
objects, he/she can use, in particular, one of the methods that are presented
in the chapter Storing and Processing
Irregular Data. For instance, assuming that for employees having no salary
the sorting key is 0, the above query can be formulated as follows:
|
E.RAN.7 |
Get references to all employees sorted
according to salaries; assume sal = 0 for employees having no salary |
|
SBQL: |
Emp
order by max(bag(0, sal)) |
|
SBQL: |
Emp
order by if exists(sal) then sal else 0 |
A similar situation arises with multi-valued keys. Consider the query
|
E.RAN.8 |
Get references to employees sorted according
to jobs |
|
SBQL: |
Emp
order by job |
Attribute job is multi-valued, hence according to the join operator each
reference to an employee object having n jobs, n >1, will be repeated n
times in the final result. Such interpretation of the sorting operator seems to
be the most logical and reasonable, however, some programmers could be
surprised with such a result, hence they should be
aware of the semantics. Of course, there are many ways to avoid repetitions of
references. For instance, the programmer can write a query in such a way that
for sorting only the first job is considered (providing jobs are typed as
sequences):
|
E.RAN.9 |
Get references to employees sorted according to
jobs, return employee name and jobs (no repetition of Emp references). |
|
SBQL: |
(Emp order by job[1]).(name, job group as jobs) |
If jobs are not typed as sequences, depending on the type constraint job[1] will return
a type error or a random job. It is also possible to construct a query that
takes some additional sorting logic. Assume there is a collection of objects JobRange(j: string, r: integer) that assign
to jobs range number (1 is the range of the most important job, etc.). Now we
can ask the following query:
|
E.RAN.10 |
Sort employees according to their most
important jobs; return employee name and jobs sorted according to their range
(no repetition of Emp
references). |
|
SBQL: |
(Emp order by min((JobRange where j in job).r). (name,
((job as k order by ((JobRange where j = k).r).k) group as jobs |
The examples show that there is a lot of
semantic peculiarities with the sorting operator. However, the power of SBQL gives
the hope that almost all of them can be efficiently formulated in the query
language.
Sorting in Ascending and Descending Order
In SQL and OQL there are special keywords asc
and desc placed after a sorting key; asc is optional. Mathematically, asc
is an identity function, while desc is a generic
function that returns the reverse of an argument (assuming the linear ordering
of the sorting key domain). The function desc for a
number X is simply –X. Alternatively, assuming that a sorting key should
be always a positive number, we can assume that desc(X)
= MaxPositiveNumber – X. For strings the
function desc can be defined as a conversion of the
string characters where each character with ASCII code X is substituted by a
character with ASCII code 256 –X. Similarly for other
coding standards. The function desc should be
defined for each domain that values can be sorting keys. It is a secondary
issue if such a function should be written in a typical syntax desc(X) or in the syntax assumed by SQL and OQL: X desc.
|
E.RAN.11 |
Get references to employees sorted according
to age, descending, and then according names, ascending. |
|
SBQL: |
(Emp order by (desc(age), asc(name)) |
Alphabetic Order in Native Languages
In early relational systems the SQL order
by operator caused a lot of controversy concerning sorting strings in
native languages different than English (especially in German and French). The
SQL developers assumed the ASCII coding for English characters, which exactly
corresponds to the alphabetic order of strings in English. However, the rules
of alphabetic order for other languages (German, French, Polish, etc.) are
different. Moreover, alphabets of these languages contain characters absent in
the English alphabet. Sometimes sorting rules have anomalies, for instance in
German double s is equivalent to the character β. Changing the rules of
ordering by SQL was totally unacceptable for many users (and office rules), who
wanted to follow several hundreds years of tradition concerning various kinds of
alphabetically ordered lists, reports, dictionaries, etc. In effect, vendors of
relational systems were forced to change the rules of sorting assumed by the
SQL order by operator. The solution
gives the decision in hands of the database administrator, who has proper
options to determine the order of strings for a particular application
according to the given native language. This method has solved majority of
sorting problems, but not all of them. For instance, one application can
require (in the same run) sorting according to English, according to German and
according to Polish. Only the programmer of a database application (not the
database administrator) is able to determine which kind of ordering is required
in a particular place of the program.
There is a simple solution of the problem, which consists in
implementing a family of functions with string arguments that return the
ordering key corresponding to a particular native language. Such a function
takes a parameter being a string in a native language and returns a string of
numbers. For each character in the input string the function returns a number
being its position in the alphabetic rank. For instance, for Polish the
alphabet the function mapping can be as follows: a→1, ±→2, b→3,
c→4, æ→5, d→6, e→7, ê→8, f→9, g→10, h→11, … This reminds the ASCII coding, but the system
can be different than ASCII due to some exceptional cases (e.g. equivalence of
–ss- and –β- in German). Because
such functions are an internal sorting feature, correspondence to any standard
is less essential.
An additional parameter of such a function can determine an encoding
standard of an input string, for instance, ISO, UTF-8, etc.
|
E.RAN.12 |
Get references to employees sorted according to
department names in English (ISO), and then, according to employee names in
Polish (UTF-8). |
|
SBQL: |
(Emp order by (English(ISO, worksIn.Dept.dname), Polish(UTF-8, name)) |
In general, range queries are properties of
sequences (as collections) rather than directly properties of ordering query
operator. Because the order by
operator always returns a sequence, range queries are naturally associated with
this operator. In the simplest variant a range query means the possibility to choice
an i-th
element of a sequence, where i is a parameter that can be
determined by an expression (a query). The typical syntax is as follows:
query ::= query [ query
]
The query q1[q2] returns i-th element of q1, where i is determined by q2. In a more general
variant, q2 can return a
bag of numbers. In this case q1[q2]
returns all elements that are pointed by q2.
We can also assume cardinalities written in the form i..j or i..*, which return a bag of numbers
starting from i
and ending by j, or starting by i and ending by
infinity. This convention gives us a lot of possibility to write range queries,
for instance:
|
E.RAN.13 |
Get references to 50 best-paid employees. |
|
SBQL: |
(Emp order by desc(sal) )[1..50] |
|
E.RAN.14 |
Get the median salary of employees |
|
SBQL: |
((Emp order by sal )[(integer)(count(Emp)/2)]).sal |
Note that in the above examples we assume that
counting of sequence elements starts from 1. The languages C, C++, Java, C# and
standards CORBA and ODMG assumed that the counting starts from 0. This solution
was reasonable in assembler and C/C++ due to accordance of indices of arrays
with pointer arithmetic. However, currently developed languages are very far
from assemblers, hence such a convention is a kind of unnecessary atavism what
is illogical and error prone. In SBQL we have rejected it.
|
E.RAN.15 |
Get references to 50 best-paid employees and
50 worst-paid employees. |
|
SBQL: |
(Emp order by desc(sal) )[bag(1..50, (count(Emp) – 49)..count(Emp)))] |
If we would use the convention that ordering of
sequences starts form 0, the above query must be reformulated to a less legible
form:
|
E.RAN.16 |
Get references to 50 best-paid employees and
50 worst-paid employees. |
|
??? |
(Emp order by desc(sal))[bag(0..49,(count(Emp)-50)..(count(Emp)-1))] |
Queries written in this syntax are able to
return elements of a sequence if their order numbers are known, but are unable
to return order numbers for sequence elements that are selected in another way.
For instance the query “Which rank number takes Brown in the ranking
concerning salary?” is to be formulated as a program (a sequence of
instructions) rather than a single query. For this reason in the Loqis and ODRA system we have introduced a special operator
range as. The syntax is similar to
as and group as operators:
query ::= query
range as name
Semantics of the query q range as n is as follows. Let q returns sequence{r1, r2, r3, …}. Then, the
query q range as n returns
sequence{ struct{ r1, n(1), struct{ r2, n(2) , struct{ r3, n(3) }, ....
}
Each element ri returned by q is
associated with the binder n(i), where n is a name determined by the
programmer, i
is a rank number of this element in the sequence, starting from 1. This makes
it possible to formulate any conditions based on this rank number, as well as
returning it as an output of a query.
|
E.RAN.17 |
Which rank number takes Brown in the ranking
concerning salary? |
|
SBQL: |
(((Emp order by desc(sal)) range as s) where name =
“Brown” ).s |
|
E.RAN.18 |
Get a report sorted by departments
names and returning the names and the rank of the department concerning the
number of employees, descending. |
|
SBQL: |
(((Dept order by desc(count(employs))) range as c) order by dname)
“Brown” ).(dname,
c) |
|
E.RAN.19 |
Return employees having the rank in the
salary category (descending) at least on 10 higher from the rank in the age category
(descending): |
|
SBQL: |
((((Emp
order by -sal) as e range as i) order by -e.age)
range as j ) where i >= j+10).e |
|
E.RAN.20 |
Return the average salary of
employees, disregarding 25% of the worst paid and 25% of the best paid ones. |
|
SBQL: |
avg((((Emp orderby sal) range as r)
where r > (integer)(0.25
* count(Emp)) and
r < (integer)(0.75 * count(Emp))).sal) |
Last modified: August 6, 2009