Esprit Proposal No 20288 Acronym: CRIT-2
Proposal Title Cooperative Research in Information Technology
Partner No. 5 Institute of Computer Science of the Polish Academy of Sciences -ICS
Country Poland
Workpackage # 3
Workpackage Acronym : ICS-FAI
Title: Elaboration of Standard Internet Tools for Integration of Databases, Knowledge Bases and Reasoning Systems
Coordinator Maciej Michalewicz

Proposal of a BayEsian Network InterOperability Format Standard for Knowledge Interchange (BENIOFSKI)

(Based on Proposal for a Bayesian Network Interchange Format of Decision Theory Group of Microsoft Research)

(Version 0.0 : Last Updated 15.5.1998)


Introduction

This page describes a proposed standardized belief network interchangefile format. We have developed this format as mechanism for:


Overview

The BN mini-language of the proposed interchange format has borrowed alot of elements from the C/C++ language. Not surprisingly, the formatsupports the single-line //... and multi-line /*...*/ commentsequences, and you will notice that we make liberal use of curlybraces. In addition, the definitions of identifiers, strings, andreals follow closely those from C/C++. Instead of presenting the format by specifying its precise syntax,I will rather illustrate the format by means of examples. Currently we elaborated in Polish a syntax in a BNF-like format.

Each belief network file (bnfile) carries the extensionDSC for identification purposes.

Major elements or blocks in a belief network fileare marked using curly braces. The general format of a block is:

       
 blockType BlockName        
{           [language]  attribute-name = attribute-value;
// or, alternatively,           [language] attribute-name is
attribute-value;        
}

If the optional language indicator is not specified then the definition applies to all languages. The list of permissible languages is given in the network definition block.

The blocks in a DSC file occur in the following order:

The network declaration block reveals the type of network andinformation that pertains to the network as a whole. Currently,only a single "Belief Network" type is defined, butit is expected that in the future other formats may be introduced(like, for instance, "Diagnostic Network", "DecisionDiagram", etc.).

Node declaration blocks describe the events of the network andtheir possible states, along with other attributes.

Probability definition blocks define the conditioning arcs betweennodes and contain the prior probabilities.


Properties

This proposed Belief Network format is designed to be flexible.In order to allow different groups to extend the standard to supportspecial technologies and tools, bnfiles can define andutilize additional attributes particular to the individual networkfile. These dynamic attributes are known as properties,and are meant to model atom property lists available in the LISPlanguage.

A property type is a declaration of the existenceof a property and applies to the network as a whole. A propertyis a particular instance of a property type, and is specificallyassociated with either the network itself or a particular node.

Properties are named according to the rules of bnfile symbolicnames. (These rules are identical to those of the 'C' language.)There are five possible property types:

In addition, each property can have a descriptive comment string.

For example, there could be a property type of CarsOwned,declared to be an array of strings. Any node could thenbe given the CarsOwned property, with values suchas ["87 Honda", "93 Pathfinder"] or ["83Taurus"].

No particular interpretation is given to properties; they maycontain any information desired. However, since the usage ofproperty names could conflict between organizations, it is recommendedthat implementing groups prefix the properties with a unique identifier.At FAI ICS PAS, we will prefix our property types with the characters"FAI_".

Here is an example properties block:

        properties        {            type description = string; 
           type helpContextIds is array of real, "help ids"; 
           type stateDescriptions is array of string, "one for each state"; 
           type FAI_label is choice of [other, hypothesis, informational,                                        problem, fixable, unfixable];           
 property description = "this is a test";            helpContextIds = [344,43434,877,46875];        }

The lines beginning with the keyword type definetypes of properties; the name of the property immediately follows.Each property type must have a data type declaration. Each mayhave an optional comment describing the purpose or usage of theattribute. Property types have file scope; that is, they may beused anywhere that properties are allowed.

The properties block is also used to define propertieswhich exist at the network level. The line above beginning withthe word property defines the value for the network-levelproperty called description. The keyword propertyis optional, as demonstrated by the subsequent line, whichdefines the value for the network-level property helpContextIds.

This properties block above declares four propertytypes:

The line beginning with the keyword property definesan instance of a property type. In this case, it is an instanceof the description property defined a few linesabove it. A property type must be declared before an instanceof it can be defined..

The last property type defined, FAI_label, is anexample of an enumeration. An enumeration propertyis simply a set of words which map one-to-one onto the integers,starting with one. In the example given, use of the word otherwould give the value one; use of the word fixablewould give the value 4. Enumerated properties are intendedto provide a high degree of readability in dynamically declaredproperties. Instances of enumerated properties may be definedusing either a word from the set or an integer within the allowedrange.

Note that reserved words may not be used as choices in an enumeration.


Network Declaration

The first element in a DSC or bnfile is the declaration of thename of the network. For example:

        network "Car Starting Example"        {            format is "BNFormat";
            language is  pol, eng, deu  ;        }

This declaration may be followed by the optional properties block.Then the actual belief network appears, defined by node blocksand probability blocks.

Node blocks declare the node variables in the network one at atime. Since node blocks do not interreference, they may occurin any order. Node blocks look like this:

        node NodeName        {     //  node attributes, described later        }

Here, 'node' is a keyword and 'NodeName' is an identifier (symbolic name). Probability blocks have the same general shape: Probabilityblocks specify the (conditional) probability tables (CPTs) forthese variables, and hence the topology of the network.

        probability (Node | D1, D2, D3)        {            //  conditional probability tables        }

Again, 'probability' is a keyword and the others are identifiers.As the notation suggests, this defines that node Nodehas a total of 3 parents, namely: D1, D2,D3.

Node and probability blocks can be freely mixed, except for therestriction that all the node blocks of variables referenced ina particular probability block have to precede the probabilityblock.


Node Blocks

Node blocks contain information about the node such as:

The node's type attribute is the only one that is required.In general, an attribute is of the form:

        attribute-type is attribute-value;  //  or        attribute-type = attribute-value;

Currently, standard attributes are: 'name', 'type', and, 'position'..Here is an example of a node block from the automobile beliefnetwork (BN) that David Heckermanand Jack Breese presentedsome time ago, and that is also featured in one of the articlesin the March, 1995 issue of Communications of the ACM.

        node Alternator        
{           eng name is "Alternator Output Voltage";
           pol name is "Napie`cie na wyjs`ciu alternatora";
           type = discrete[2] choice of [good, bad  ]; 
          pol choice name is [ "dobre"  "zl`e"];
            position = (15625, 10195);
            property FAI_label is "fixable";
            FAI_category = "Service Fixable";
            FAI_cost_observe = 15.00;
            FAI_cost_fix = 200.00;        }

This just tells us that we have a node with symbolic name Alternator.In this case, the full string description of the node/variableis "Alternator Output Voltage" (this is the string thatis usually displayed in a BN editor). It also says that the nodeis a discrete node with two states, labeled goodand bad, respectively. The position attribute isfor the benefit of the BN editor, and represents a point in graphicaldisplay space. The other attributes are specific to a particular(research) group and their precise meaning does not concern ushere. Note that the optional keyword property mayintroduce a property.

The minimum information needed to declare a node is its type.For instance, one can declare a ternary variable by

        node ThreeState        {            type: discrete[3];        }

whereas a continuous node would look like

        node Temperature        {            type: continuous;        }

Probability Blocks

Probability blocks are used t define the actual network topologyand conditional probability tables (CPTs). There are twokinds of probability blocks:

  1. blocks for standard nodes; that is, nodes forwhich we have to define the probabilities for each discreteparent instantiation (DPI) and
  2. blocks for causally independent (CI) nodes,for which we have to define the probabilities for only a limitednumber of DPIs.

CPT for Standard Nodes

An example of a standard probability block from the automobileexample, is the one for the gas gauge. We assume that the gasgauge has two states "not empty"/"empty" andtwo parents: 1) "Gas" (is there gas in the tank?) and2) "BatteryPower" (is the battery power sufficient?).The variable "Gas" has two states: "yes" and"no", and the variable "BatteryPower" hasthree states: "good", "low" and "none".

        probability(GasGauge | Gas, BatteryPower)        
{           (1, 1): 0.999, 0.001;
           (yes, low): 0.850, 0.150;
           (1, 3): 0.000, 1.000;
           (2, 1): 0.000, 1.000;
           (2, 2): 0.000, 1.000;
           (2, 3): 0.000, 1.000;        
}

The numbers in parentheses above cycle through all the DPIs (wenumber the states of a node consecutively and start at one, whichin general represents the "normal" or "good"state). The value zero is reserved for "unknown" values in databases. The order of the numbers in parentheses corresponds tothe order in which the discrete parents are listed after the barin the opening line of the probability block.

Although the probability vectors are listed in consecutive DPIorder here, this is not necessary in general. One is free to listthe vectors in any order, since the integers in parentheses uniquelyidentify the parent instantiation. In addition to giving the vectorsfor each individual DPI, we support the concept of a default entry.So the above CPT could have been specified equivalently as:

        probability(GasGauge | Gas, BatteryPower)        
{            default: 0.000, 1.000;
            (1, 2):  0.850, 0.150;
            (1, 1):  0.999, 0.001;
        }

Note: definitions for continuous probability distributionswill be forthcoming at a later time.

CPT for Causally Independent Nodes

Causally Independent (CI) nodes are characterized by the propertythat the probability vectors for each DPI can be derived fromthe probability vectors of the leak parent instantiation, andthe parent instantiations in which one and only one parent assumesa value different from its leak value. Conceptually, the leakparent instantiation represents the situation in which none ofthe parents is causing the child node to be in a abnormal state,and hence the probability vector associated with the leak instantiationmodels influences on the child node that are not explicitly accountedfor the parents. Currently, we support two functional CI relationships,namely: 'max' (noisy or) and 'plus'. (For a more detailed descriptionof CI nodes see the paper by Heckerman and Breese in the UAI proceedingsof 1994.) Expressed in terms of a CI CPT the following probabilityspecification for GasGauge is almost identical to the one givenearlier:

        probability(GasGauge | Gas, BatteryPower)        
{            function: type = max;
            (1, 1): 0.999, 0.001;  // leak term
            (1, 2): 0.850, 0.150;            //     this implies Pr(GasGauge | 1, 2) = (p, 1 - p),
            //     with p = 0.85 * 0.999
            (1, 3): 0.000, 1.000;
            (2, 1): 0.000, 1.000;        }

Typically, the state selectors of a leak instantiation are allones, because the 1st state usually represents the normal orgood or absent state.


Proprietary or Organization-Specific Extensions

One reason for choosing a general block format for belief networkfiles is that such blocks can be recognized in their entiretyby simple syntax. This allows a parser to ignore blocks whosecontent is unfamiliar. Thus, any organization can create entirelynew blocks which contain deeply nested information specific totheir implementation. Files containing such blocks can still beparsed by correctly written parsers. The unrecognized blocks arediscarded in their entirety.

The grammatical rules are as follows. If the parser finds an identifier(a valid unknown name) at the outer block level within a file,it will begin to consume tokens until the block has been closed.The general format for a block is:

identifier [(optional parenthetical argument list)]
{ [optional string of tokens]
}

Items in square brackets above are optional. Nesting can occurwithin both the argument list and the main block. Two nestingsymbols are allowed: curly braces and parentheses.The parser allows them to be nested within themselves or withineach other, but will report an error if a nested block is openedby one nesting symbol but closed by another.


Keywords

For reference purposes, here is a complete list of the keywordsof the current proposed format:

array
choice
continuous
creator
default
discrete
format
function
is
leak
name
network
node
of
parent
position
probability
properties
property
real
state
stringtype
version

[BACK] [BEGINNING]