Procedures and Methods in SBQL
(under
construction)
Back to Description
of SBA and SBQL.
Procedures are the most
important programming constructs for ensuring abstraction, encapsulation of
code and program reuse. Practically all programming languages introduce
procedures as a fundamental feature. Procedures encapsulate any complex
calculations and hide their code from the programmer. They can be called from
many places of applications and (in remote procedure call technologies) can be
called from outside of the address space that a procedure is located in. Procedures
can be parameterized explicitly, by parameters specified within their
declarations, or implicitly, by an external environment that the body of a
procedure can access and update (so-called side
effects).
Depending on their properties
procedures can be subdivided according to the following criteria (which can be
orthogonally combined):
·
Proper procedures
and functional procedures (functions). Proper procedures belong to the category
of program statements. They do not return a value, thus cannot be called in
expressions or queries. Functional procedures return a value, thus can be
called within expressions and queries. In majority of solutions functional
procedures can be used as proper procedures, i.e. their calls can be considered
statements. In this case the return value of a procedure is ignored. In some
languages (e.g. C/C++) all procedures are functional ones, but concerning
conceptual issues and strong typing this assumption is controversial.
·
Procedures and
methods. There are opinions that methods constitute “behaviour” of objects and
they are activated by message passing. There are suggestions that message
passing reminds communication in societies, thus presents a totally new
paradigm of programming that contributes to parallel execution of programs.
Such opinions are based on superficial observations and misleading
associations. Parallel programming is a very important issue, but methods and
message passing introduce to it exactly nothing. We follow the assumption that
methods are ordinary procedures with a specifically constructed program
environment (we explain it later) and message passing is equivalent to a
procedure call (within special context that will be explained too). The most
obvious difference between procedures and methods is their conceptual location.
Procedures are located within program modules (sometimes within program blocks,
e.g. Pascal), while methods are located within classes.
·
Procedures located
within an application code and procedures stored in a database. In traditional programming
languages procedures are properties of an application code. Relational
databases introduced a new kind of procedures, called stored procedures or database
procedures, that can be stored within a database. This issue is directly
related to the phase of binding procedure names. Majority of programming
languages assume early binding, i.e. binding during compilation and linking,
before the program is executed. Stored procedures require late binding, i.e.
binding during run time. The major advantages of early binding is much better
performance and better opportunities for strong type checking. The advantage of
late binding is flexibility and more possibility of generic programming, in
particular, programming with linguistic reflection[1].
·
Ordinary procedures
and higher-order procedures. Higher-order procedures can be parameterized by some
programming entities that are have properties of the source code, for instance,
by types, by procedures, by classes, by lambda expressions, etc. Higher order
procedures much increase flexibility and power, especially related to generic
programming. In contrast to linguistic reflection, which actually disables
compile-time strong type checking, higher-order procedures can be strongly
typed as well. There are various forms of type polymorphism proposed in
different languages. This kind of procedures much fascinated academicians and
is practically ignored by industrial professionals. There is old and perhaps
already obsolete hope that higher-order procedures will create a new quality in
program development. Despite many attempts (c.f. languages such as Scheme, ML,
Haskell, F#) till now it is unclear if indeed they present a new quality that
can be sufficiently attractive for the common programmer. Recently this kind of
paradigm has been implemented within the LINQ project by Microsoft Research
(lambda expressions), but after some experience our impression is that real
advantage of this style of programming is difficult to identify. The obvious
disadvantage concerns query optimization, which is still much underdeveloped in
comparison to SQL and SBQL.
SBQL presents a new quality
w.r.t. procedures and methods because we unify expressions and queries. Hence
queries can be parameters of procedures or methods, can be used within return
statements of functional procedures/methods and can be used within imperative
statements in bodies of procedures/ methods. Although considering the theory
and strong typing this is rather a well-recognized issue, none of the existing
programming languages that deal with queries (e.g. PL/SQL of Oracle, T-SQL of
Sybase and Microsoft, ABAP of SAP, SQL-2008 standard, C#/LINQ) does not assume
such radical treatment of queries in all contexts. We show on examples that
this SBQL feature indeed presents the new quality in application program
development.
Semantics of procedures is based on the
environment stack mechanism that we have introduced before for non-algebraic
operators. The environment stack (known also as call stack) was introduced in early 1960-ties for the programming
language Algol-60. Thus the history of this concept is almost as old as the
idea of high-level programming languages. Our original contribution is to use
this stack (ENVS) to define non-algebraic query operators, such as selection,
projection, navigation, join, quantifiers, etc., see the section SBQL - Non-Algebraic Operators.
The semantics of queries based on ENVS is a sound substitution of very limited,
inconsistent, and usually mathematically incorrect semantics based on
relational algebra, relational calculi, functional approaches, object algebras,
object calculi, monoid comprehensions, F-logic and other concepts that were
invented to deal with queries. A serious critics of object algebras can be
found in [SuLe95], but majority of this critics is relevant to other formal approaches
the we have listed above. The stack-based approach that we describe on these
pages presents a specific conceptual construction that is motivated by the
semantics of query languages. The thorough description can be found in the
section Environment Stack, Name
Scoping and Binding.
Usually developers of programming languages
assume that procedures or methods can access and update external entities,
including application data, databases, files, external devices, etc. directly,
without use of parameters. This is known as side
effects of procedures. Side effects can be passive, that is, a procedure can access external entities, but
cannot updated them, and active, when
a procedure can updated external entities. Currently popular programming
languages have no means to specify side effects of procedures. Originally, in
the concept of module that was
proposed by David Parnas, side effects were specified in the form of a
so-called import list. This was accomplished in the programming languages
Modula-2 and Eiffel. Import list were the components of types and access to
external entities was strongly typechecked. There is a similar concept of
import in languages such as Java, however, not as a component of a procedure
type. Interfaces in Java that specify signatures of procedures do not contain
information on side effects. This is sometimes criticized as a feature leading
to errors. Methodologies such as Design by Contract provide specification of
side effects of procedures (an a lot of other features such as preconditions
and postconditions).
Syntactically, we distinguish declarations and calls (invocations) of a
procedure/method. A declaration of a procedure consists of procedure name, formal
parameters (single names) usually associated with their types, the type of its output (for functional procedures) and a
procedure body where the programmer
specifies a sequence of instruction that is to be performed. Within the body
there are declarations of local objects
or variables or sometimes other declarations (e.g. local procedures). A call of
a procedure consists of procedure name and procedure actual parameters which
are expressions. Types of the expressions must conform to the declared types of
corresponding formal parameters. A procedure is encapsulated: the programmer
that uses a procedure needs to know only its signature which consists of procedure name, formal parameters with
their types and the type of a procedure output. The signature contains also
some other information, for instance, concerning the method of parameter
passing. For instance, there is a declaration of a procedure square:
square( a: real): real {
counter: integer; x: integer;
counter := 0; x := 1;
while counter < 100 do {
x := (a/x + x)/2;
counter := counter +1;
}
return x;
}
The signature of the procedure is
square( a: real):
real
An example call of the procedure
square( 5+ z*t ) / 120
When a procedure or a method is called, a new
section (so-called activation record)
on the environment stack is pushed. The section contains three kinds of
entities:
·
Local
procedure environment, i.e. all local objects/variables that are declared in
the body of the procedure/method.
·
Calculated
actual parameters of the call. In some languages (e.g. C) calculated parameters
are equivalent to variables, but this is not a rule.
·
Return
path, i.e. an address of the program code when the control should be passed
after the procedure is terminated.
Sometimes a call implies pushing on the environment stack more sections.
When a procedure or method is terminated, its activation record is popped. When
a procedure is running, local environments of a procedure that called it is
unavailable. This is due to the lexical scoping principle which assumes no bindings
to entities of any environment that the programmer was not aware during writing
a procedure. This is illustrated in Fig.64. Black sections denote entities
unavailable for bindings. Bindings are performed from top of the stack to its
bottom, as usually. In case of query languages the situation can be a bit more
complex, because sections implied by procedure/method calls can be mixed with
sections implied by non-algebraic operators. This will be presented later.

Fig.64. ENVS stack: procedure P1 calls
procedure P2, which calls procedure P3
Procedures can be called from procedures
without limitations, in particular, any recursive calls are allowed. Due to the
stack-based semantics recursive calls are not extraordinary and in majority of
languages recursive calls are not distinguished syntactically. However the
following cautions should be observed:
·
The
size of the environment stack. It limits the depth of recursive calls,
especially if the stack is organized in RAM.
·
Within
the procedure there should be a condition ensuring the end of recursion.
Otherwise the recursive calls must result in failure, but if the stack is
large, this may happen after long time. For this reason some languages limit
the depth of recursion.
·
Active
side effects should be avoided or limited, because a next recursive call can
overwrite the results of previous calls. Procedures that can be recursive
without inconsistencies are called reentrant.
In SBQL we do not assume any
limitations to the recursion. The programmer is responsible for avoiding stack
overflows and for ensuring the end of recursion. Examples of recursive
procedures in SBQL are presented in the chapter Recursive Queries in SBQL.
Parameters
of Procedures
Procedures and methods can be
parameterized. In contrast to mathematical functions, where a parameter is
always evaluated to a value, parameters of procedures/methods can posses
different semantics which influences their functionality and pragmatics of the
use. Below we list several popular parameter passing methods. They can coexist
within one language but usually must be syntactically distinguished.
·
Call-by-value. This method
assumes that the parameter is evaluated before the procedure code is started.
The parameter should result in a value (perhaps a complex one). If evaluation
of the parameter results in a reference of object/variable, the dereferencing
operation is performed. There are two a bit different treatments of the
parameter within the procedure body. In languages such as Pascal the parameter
is not a variable, thus it is impossible to assign to it a new value. In C/C++
inside the body of the procedure the parameter is treated in the same way as a
local variable. In some systems, e.g. CORBA, a call-by-value parameter in the
declaration of a procedure and in its signature is preceded by the keyword in.
·
Call-by-reference. As before, the
method assumes that the parameter is evaluated before the procedure code is
started, but the evaluation must result in a reference to an object (variable).
Within the procedure body the reference is used to update or delete the object.
Hence the method assume updates of an object from outside of the local
procedure environment. Such a parameter is preceded by a special keyword: e.g. var in Pascal. In other systems (in
particular built according to the CORBA standard) the parameter is preceded by
keyword out or inout. There is a small difference in treatment of these two
keywords: out means that the
parameter is used for output only, while inout
means that the parameter can be used for input
and for output. This difference can
be checked by the strong typing system, which should forbid the dereferencing
operation for out parameters.
·
Strict-call-by-value.
As
for call-by-value, but no dereferencing operation is performed. A reference is
treated in the same way as a value, hence the method can be applied without a
special syntax as call-by-value or call-by-reference. The method implies
less opportunities for strong type checking, but lack of a special syntax for
distinguishing kinds of parameter passing is an advantage. The method is used
in such languages as C. It has some meaning for query languages if the
programmer wants to use a parameter which is evaluated to a bag of structures
joining values and references.
·
Call-by-value-return. As for
call-by-reference, but a copy of a referenced object within local procedure
environment is created. All operations on the referenced object are then
performed on this local copy. At the end of the procedure, the value of the
copy is sent as the value of the original object. The method has a great
meaning in distributed environments, as it minimizes the number of connections
to remote objects. However, the method is not semantically clean, if two or
more call-by-value-return parameters refer to the same object (what may happen
due to nested loops, processing bags, etc.) In this case two or more copies of
the same object will be created, all copies can be updated, but only updates of
the copy that is sent as the last will be indeed recorded in the original
object. Other updates will be lost, which as a rule means an error in the
program.
·
Call-by-name. The method
assumes that the parameter is not evaluated before the procedure body is
started. Instead, an expression being the parameter is identified. A pointer to
the expression code is passed to the body of the procedure. When the control
meets this pointer within the procedure body, the expression is evaluated (in
the caller environment). The method has some advantage: it avoids evaluation of
the parameter if the control does not meet the above pointer. However, the
method may require evaluation of the parameter many times. Moreover, each
evaluation can return a different result (if the caller environment is
changing), thus the method requires attention from the programmer. The method
(implemented in Algol-60) is considered historical and obsolete, but the
come-back can be considered in the context of query languages. If the
environment necessary for evaluation of the parameter is not changed (e.g. it
is a database) then the method can be implemented in such a way that each
occurrence of the parameter within the procedure body is macro-substituted by
the expression communicated as a parameter. This would give more opportunities
for query optimization (rewriting, indices, etc.), because a query being the
parameter is joined with a query that uses the parameter. We illustrate this method
in example E.PR.1. The method is similar to the method of processing views
known as query modification
(explained later).
|
E.PR.1 |
C.f. Fig.63.
Procedure MyEmp returns references
to Emp objects communicated as the
first parameter having names communicated as a second parameter. Assume there
is an index for name and no index
for sal. |
|
SBQL: |
MyEmp ( who:
EmpType[0..*]; myNames: string[0..*] ): EmpType [0..*] { return
who where name in myNames; } |
|
|
Within
the query who where name in myNames no optimization is possible because the index
for name cannot be applied. |
|
|
Call the
procedure MyEmp for employees
earning more than 1000 and for names “Doe”, “Poe” and “Lee”. |
|
SBQL: |
print( MyEmp(
Emp where sal > 1000; bag{“Doe”, “Poe”, “Lee” } ) ); |
|
|
Note that
the call cannot be optimized too. However, when the call-by-name method is applied, the query who where name in myNames is
rewritten to: (Emp
where sal > 1000) where name in bag{“Doe”, “Poe”,
“Lee” } This
query can be rewritten to the semantically equivalent form: (Emp
where name in bag{“Doe”, “Poe”, “Lee” }) where sal > 1000 The query
can be optimized by using the index for name. |
·
Call-by-need. Sometimes the
method is called lazy evaluation. Its
assumptions are similar as for call-by-name,
but the parameter is evaluated only once, first time when the control meets the
pointer to an expression being the parameter. After evaluation, its result is
stored within the local procedure environment and then reused each time when
the control meets the pointer an expression. It has some performance advantage
over call-by-name, but in comparison
it gives less opportunities for query optimization that we have mentioned for call-by-name.
If we assume that parameters
can be queries, the situation concerning parameter passing methods is changed.
A query can return a complex value which consists of nested bags, sequences and
structures. Moreover, the value can contain values and references. As we can
see from example E.PR.1, some parameter passing methods can be much better to
optimize than the traditional methods. For this reason the decision which
method can be chosen for parameters being queries requires more research and
experience.
Syntax
and Semantics of SBQL Procedures
In SBQL there is little
distinction between procedures and functional procedures. Semantically, the
only difference is that a functional procedure name is eventually used to store
the output from the procedure. Thus, an invocation of a functional procedure
can be treated as a query and can be nested as a part of other queries. In
principle, a value of any type (including nested structures, bags, sequences,
etc.) can be allowed as an output from a functional procedure. Any
limitations in this respect violate the
orthogonality principle and are not justified by implementation (which is
equally easy for any type). A non-functional procedure cannot be a part of a
query because no value is assigned to its
name. Typing of procedures and functional procedures is a bit different: the
type of a proper (non-functional procedure does not contain a return type. All
procedures can call procedures without limitation; arbitrary recursive calls
are allowed with no syntactic distinction. A procedure is uniquely identified
by its name. Overloading of procedure names is not supported (but this of
course can be changed, e.g. in the spirit of Java). The result of a functional
procedure is typed similarly to other programming languages. Each functional
procedure can also be used as a proper procedure; in this case its return value
has no meaning (is dropped).
A procedure declaration consists of
a procedure name, a typed parameter list, a type of its output and a list of
imperative SBQL statements.
proc_declaration ::= name
([formal_par_list]) [: return_type] { [statements] }
formal_par_list ::= par_declaration
[; parameter_list]
statements ::= instruction
[ statements ]
instruction ::= …any instructions defined previously and
procedure calls…
instruction ::= return
; | return query ;
par_declaration ::= [par_transmission_method]
par_name : type [ cardinality ]
par_transmission_method ::= in | out | inout | strict | macro | …
Procedure parameter declaration
syntax determines its name, type and cardinality. For a given procedure names
of parameters must be unique. If the cardinality is not specified, the default
cardinality [1..1] is assumed. In the procedure signature parameter
declarations are separated by semicolons. Instruction return concerns proper procedures and causes immediate passing
control to the caller of a procedure. It need not necessary if execution of the
body of the procedure is terminated naturally. Instruction return query causes the same, but with storing the result of query
on top of the QRES stack. This instruction must occur at least once within a
functional procedure. In this case instruction return without query is
not allowed and ending the procedure without return query is not allowed too.
Parameter transmission methods are distinguished syntactically by
keywords. The default method (no keyword) is equivalent to in and denotes call-by-value.
Keywords out and inout denote two variants of call-by-reference.
Keyword strict denotes strict-call-by-value. Parameter macro denotes a variant of call-by-name in which the parameter is
treated as a macro-definition. Other parameter passing methods can also be
defined, depending on the research and experience[2].
Procedure invocation syntax is the
following:
proc_invocation ::= name
[([actual_par_list ])]
actual_par_list ::= parameter [; actual_par_list]
parameter ::= query
| par_name : query
Syntactically, a call of a
procedure consists of its name and parameters. If the list of parameters is
empty, it is allowed to drop parentheses (as in Pascal). Lack of parentheses
may have some meaning, in particular, it allows to abstract from the
distinction between procedure or method call and binding to stored
objects. For instance, if the class of
objects Person has the functional
method age, then the query Person where age > 25 looks more naturally than Person where age() > 25.
For better legibility and reducing possible errors each actual parameter can be
preceded by the corresponding name of a formal parameter. If all actual
parameters are preceded by names of formal parameters, then the order of actual
parameters is inessential. Additionally, if a mechanism of parameter default
values would be implemented (we explain it later) then some or all parameters
can be dropped. Otherwise, the order and the number of actual parameters must
correspond to the order and the number of formal parameters.
Inferred types of actual
parameters must conform to declared types of formal parameters, the conformity
is determined by the substitutability principle or by some implemented forms of
ad hoc conformity (e.g. the integer type can be used when the real type is expected). So far we do not determine precise rules of ad hoc conformity, as the typing system
of SBQL can differ in details (one of the variants is implemented in ODRA, but
this is not a standard).
Semantics of a procedure call
follows the semantics explained above. Parameter transmission methods call-by-value, call-by-reference and strict-call-by-value
require evaluation of the actual parameter before the control is passed to the
body of the procedure. The evaluation of a parameter results by a new section
on the query result stack QRES. Then, a new section (activation record) is
pushed on the environment stack; the section consists of actual parameters
(represented as binders), local procedure objects (represented as binders too;
the corresponding objects are stored in a special part of a volatile store) and
return path (represented as an address of a compiled bytecode). Then, values of
the parameters are popped from QRES. After this section is created, the control
is passed to the body of the procedure, i.e. sequence of instructions
(statements). Instructions from the body are executed; the order is determined
by their sequence and by imperative control statements explained in the chapter
Imperative Constructs of SBQL. When the
control meets the end of the sequence of instruction or a return statement, the procedure is terminated. Termination means
the following actions:
·
For functional procedures and methods the result of
the query within the return statement
is pushed at the top of QRES.
·
The return path is stored within some auxiliary
variable.
·
All local objects that were created for this procedure
are removed.
·
The environment stack ENVS is popped – the section that
was pushed at the stack when the procedure was started is removed.
·
The control is passed to the caller code according to
the return path.
Last modified: March 8, 2010
[1] Linguistic reflection is a property of a
programming environment that makes it possible to dynamically generate a
program that can be executed as a part of the generating program. The
best-known language with linguistic reflection is LISP.
[2] In ODRA only strict-call-by-value is implemented, hence no keyword precedes a
parameter declaration.