### Formal Specification

(read 9.3, 11, 10, remainder of ch. 9)

A formal specification is a precise, unambiguous specification of the functionality of a software system.

Examples:

• a program that implements the system (low level of abstraction)
• a mathematical description of the system using first order predicate logic

A formal specification is typically done at the ADT level. The behavior of each operation on the data type would be specified.

An outside observer sees only the interfaces of the operations, including a precise (but still abstract) description of the functionality provided by each interface. Thus, other system developers can use the operations provided by the ADT correctly without knowing anything about how the operations are implemented or about the structure of the data.

Given a formal specification of an ADT:

• implementors of the ADT know precisely how each operation should behave
• implementors of other parts of a system know precisely how to use each operation provided by the ADT

Formal specifications can be divided into two general categories:

• algebraic specifications, which do not require any model of the data in the system
• model-based specifications, which do require an explicit model of data

#### Algebraic Specifications

Algebraic specifications capture the behavior of an ADT in terms of sequences of operations on instances of the type. Operations are divided into two categories:

• constructors, which create or modify instances of the type
• observers, which are any other operations on instances of the type. Observers may retrieve some part of an instance, compute something based on an instance, and so on.

The behavior of constructors is not explicitly specified in an algebraic specification. The behavior of each observer is captured by specifying what it does to an application of each constructor.

ADT Stack[Elem]
/* generic stacks of some parameter type Elem */

interface:

/* constructors */
new: -> Stack
push: Elem * Stack -> Stack

/* observers */
pop: Stack -> Stack
top: Stack -> Elem
isEmpty: Stack -> Boolean
size: Stack -> int

axioms:

pop(new) == error
pop(push(e, s)) == s

top(new) == error
top(push(e, s)) == e

isEmpty(new) == true
isEmpty(push(e, s)) == false

size(new) == 0
size(push(e, s)) == 1 + size(s)


• no implementation bias
• produces elegant specifications for simple ADTs

• difficult to specify operations with side effects
• specification of error conditions and error handling is primitive
• notation seems unnatural for many developers

Algebraic specifications are very similar to functional programs. This example Standard ML program for binary search trees is easily translated into an algebraic specification.

The Larch family of formal specification languages consists of one shared algebraic specification language (the Larch Shared Language), and many interface languages for different programming languages. Examples: Larch/C++ and Larch/Ada. This approach allows specifiers to use algebraic specifications for parts of a system, and a more programmer friendly notation for other parts. A LSL specification for stacks is presented next. An interface specification using this Shared Language specification is presented in the section on Pre- and Post-State.

A Larch Shared Language specification of trait (type) Stack:

Stack(Elem): trait
includes cardinal, boolean
introduces
new: -> Stack
push: Elem, Stack -> Stack
pop: Stack -> Stack
top: Stack -> Elem
isEmpty: Stack -> Bool
size: Stack -> Card
asserts
Stack generated by new, push
Stack partitioned by pop, top, isEmpty
Stack partitioned by pop, top, size
forall e: Elem, s: Stack
pop(push(e, s)) == s
top(push(e, s)) == e
isEmpty(new) == true
isEmpty(push(e, s)) == false
size(new) == 0
size(push(e, s)) == 1 + size(s)


Features of LSL traits:

• the includes clause specifies what other traits are included (textually substituted) in this trait. Traits cardinal and boolean are standard traits defined in the LSL handbook.
• the introduces clause gives the types of all operations defined in this trait
• the asserts clause gives the axioms that hold for the operations defined in this trait
• each generated by clause gives one subset of the constructors defined in this trait that can generate all legal instances of the trait
• each partitioned by clause gives one subset of the observers that can be used to distinguish between any two instances of the trait
• the remainder of the asserts clause gives the axioms describing the behavior of the operations

Example: A Larch Shared Language specification of trait PriorityQueue. This specification uses a "specification constructor" build, which is not part of the interface of the trait (not visible to code that uses the specified trait). This constructor is used only to make writing the specification easier. When using such a specification constructor, the constructors that are visible to client code need to be defined in terms of the specification constructor.

This specification also uses an implies clause to express a condition that is always true about priority queues. This is closely related to the use of invariants in model-based specification languages.

An interface specification using this Shared Language specification is presented in the section on Pre- and Post-State.

PriorityQueue(Elem): trait
includes boolean, integer
introduces
new: -> PriorityQueue
enqueue: Elem, Int, PriorityQueue -> PriorityQueue
dequeue: PriorityQueue -> PriorityQueue
isEmpty: PriorityQueue -> Bool
asserts
PriorityQueue generated by new, enqueue
PriorityQueue partitioned by head, dequeue, isEmpty
forall e, e2: Elem, i, j: Int, q: PriorityQueue
enqueue(e, i, new) == build(e, i, new)
enqueue(e, i, build(e2, j, q)) ==
if i > j then build(e2, j, enqueue(e, i, q))
else build(e, i, build(e2, j, q))
dequeue(build(e, q)) ==
if q = new then new
else build(e, dequeue(q))
if q = new then e
isEmpty(new) == true
isEmpty(build(e, i, q)) == false
isValid(new) == true
isValid(build(e, i, new)) == true
isValid(build(e, i, build(e2, j, q))) ==
i <= j /\ isValid(build(e2, j, q))
implies
forall q: PriorityQueue isValid(q)


#### Model-Based Specifications

In model-based specifications, the data in the system is modeled using formal types such as:

• primitive types (integer, real, character)
• enumerated types
• tuples (like records)
• sets (usually finite and homogeneous)
• sequences (usually finite and homogeneous)
• strings (= sequence of character)
• functions, relations and mappings (as sets of ordered pairs)
• objects (in some languages)

All model-based specifications use pre- and post-conditions written over values of some (or all) of these model types. We'll study the SPECS-C++ notation for writing these conditions. SPECS-C++ is a model-based specification language designed for specifying the interfaces of C++ classes. Other examples of model-based specification languages: VDM and Z.

#### Tuples

declarations of tuple types:

type ordered_pair = tuple (int ord,
int abs);


Constructor: ()

Example: (1, 2) is an element of type ordered_pair

Observers: the name of each field of a tuple type can be used to extract that field from a value of the type

Examples (assume t = (1, 2)):

ord(t) = 1
abs(t) = 2


In SPECS-C++, C-style dot notation can also be used:

t.ord = 1
t.abs = 2


#### Sets

declarations of set types:

type int_set = set of int;


Constructors:

Standard set constructor: {}

Example: {1, 2, 3} is an element of type int_set

Set comprehension

syntax: { expression | predicate }

Example:

{x + 1 | x > 0 /\ x < 6 /\ x mod 2 = 1}

which is equivalent to:
{2, 4, 6}


Other constructors: \union, \intersect, -

Observers: \in (membership), \subset, | | (size or cardinality)

Example: |{2, 4, 6}| = 3

#### Sequences

declarations of sequence types:

type int_seq = sequence of int;


Constructors:

for building sequences: <_>: elem -> sequence of elem

Example: <1, 2, 3> is an element of type int_seq

appending sequences: _ || _: sequence of elem * sequence of elem -> sequence of elem

Example: <1, 2, 3> || <5> = <1, 2, 3, 5>

Observers:

let A = <a, b, c>

• _[_]: sequence of elem * int -> elem
A[2] = b

• head(_): sequence of elem -> elem
head(A) = A[1]

i.e. the first element of the argument sequence
• tail(_): sequence of elem -> sequence of elem
tail(A) = <b, c>

i.e. all of the argument sequence except the first element
• last(_): sequence of elem -> elem
last(A) = c

i.e. the last element of the argument sequence
• front(_): sequence of elem -> sequence of elem
front(A) = <a, b>

i.e. all of the argument sequence except the last element
• length(_): sequence of elem -> int
length(A) = 3

i.e. the length of the sequence

SPECS-C++ notation for logical operators:

• /\ - logical and
• \/ - logical or
• => - logical implication
• ! - logical negation
• \forall - universal quantification
• \exists - existential quantification

#### Pre- and Post-conditions

Operations (implementation functions and procedures) are specified using:

• a pre-condition, which specifies what must be true for the operation to execute correctly when it is called
• a post-condition, which specifies what is guaranteed to be true when the operation terminates (assuming the pre-condition was satisfied).

These conditions are written as first-order predicate logic assertions over the model types.

Example - a generic (non SPECS-C++) specification of class Stack:

ADT Stack[elem]
/* generic stacks of some parameter type elem */

type Stack = sequence of elem;

function new():Stack
/* an omitted pre-condition defaults to true */
post: new() = <>

function push(e:elem; s:Stack):Stack
post: push(s) = <e> || s

function pop(s: Stack):Stack
pre: s != <>
/* or length(s) > 0 */
post: pop(s) = tail(s)

function top(s: Stack):elem
pre: s != <>

function size(s: Stack):int
post: size(s) = length(s)

/* end of ADT Stack */


#### Pre- and Post-State

Model-based specifications can specify procedures (operations that modify their parameters). When the value of a parameter (say v) changes from the pre-state (the system state before the procedure is called) to the post-state (the system state after the procedure is called), the pre-state value is decorated with a ^ (i.e. v^) and the post-state value is decorated with a ' (i.e. v').

Example - A SPECS-C++ specification of class Stack:

template <class Elem> class Stack {
/* model
**   domains
**     sequence of Elem stacktype
**   data members
**     stacktype theStack
*/

public:

Stack();
/* modifies: self
** post: theStack' = <>
*/

void push(Elem e);
/* modifies: self
** post: theStack' = <e> || theStack^
*/

Elem pop();
/* pre: theStack^ != <>
** modifies: self
**    /\ theStack' = tail(theStack^)
*/

int size();
/* post: result = length(theStack^)
*/

int isEmpty();
/* post: (theStack^ = <> => result = 1)
**    /\ (theStack^ != <> => result = 0)
*/
}


Note that the parts of this specification that aren't comments are legal C++ syntax:

• this specification could serve as a header file
• a correct implementation of this specification is a C++ class with the same interface and the specified behavior.

When implementing this specification, there is no requirement that data members in the implementation be public, or that data members in the implementation correspond in any way to the data members in the specification.

Example - A (partial) SPECS-C++ specification of class PriorityQueue:

template <class Elem> class PriorityQueue {
/* model
**   domains
**     // representation of queue elements
**     //  p is priority, e is content
**     tuple (int p,
**            Elem e) qelemtype
**     // representation of priority queues
**     sequence of qelemtype qtype
**    data members
**      qtype theQ
**    invariant
**      \forall int i [1 <= i < length(theQueue^)
**        => theQueue^[i].p <= theQueue^[i + 1].p
**      ]
**    abstract functions
**      define uptoP(int pp, qtype q) as qtype such that
**           (q = <> => result = <>)
**        /\ (q != <> =>
**               result = <head(q)> || uptoP(pp, tail(q)))
**          /\ (head(q).p >= pp => result = <>))
**      define afterP(int pp, qtype q) as qtype such that
**           (q = <> => result = <>)
**        /\ (q != <> =>
**             (head(q).p < pp => result = afterP(pp, tail(q)))
**          /\ (head(q).p >= pp => result = q))
*/
public:

PriorityQueue();
/* modifies: self
** post: theQ' = <>
*/

void enqueue(Elem pe, int pp);
/* modifies: self
** post: theQ' = uptoP(pp, theQ^) || <(pp, pe)>
**               || afterP(pp, theQ^)
*/

Elem dequeue();
/* pre: theQ^ != <>
** modifies: self
** post: theQ' = front(theQ^) /\ result = last(theQ^)
*/

int isEmpty();
/* post: (theQ^ = <> => result = 1)
**    /\ (theQ^ != <> => result = 0)
*/
};


Example: A Larch/C++ specification of class Stack. This interface specification uses the model (data and operations) from LSL trait Stack.

template <class ElemType> class Stack {
uses Stack(ElemType for Elem);
public:

Stack() {
modifies self;
ensures self' = new;
}

void push(ElemType e) {
modifies self;
ensures self' = push(e, self^);
}

ElemType pop() {
requires self^ != new;
modifies self;
ensures result = top(self^)
/\ self' = pop(self^);
}

int size() {
ensures result = size(self^);
}

int isEmpty() {
ensures result = toInt(isEmpty(self^));
}
}


Note that this specification uses the data and functions defined in Stack, rather than data and functions from formal types built into the specification language. This is still a model-based specification. The difference is:

• in SPECS-C++, the model is predefined
• in Larch/C++ (and Larch/Ada, LCL, Larch/Smalltalk etc.), the specifier can construct arbitrary models in the Larch Shared Language

Example: a Larch/C++ specification for class priority queue. This specification uses the model defined in trait PriorityQueue.

template <class Elem> class PriorityQueue {
uses PriorityQueue;
public:

PriorityQueue() {
modifies self;
ensures self' = new;
}

void enqueue(Elem e, int p) {
modifies self;
ensures self' = enqueue(e, p, self^);
}

Elem dequeue() {
requires self^ != new;
modifies self;
/\ self' = dequeue(self^);
}

int isEmpty() {
ensures result = toInt(isEmpty(self^));
}
}


In either notation, an interface specification of a class or other ADT implementation must provide some means to test all pre-conditions. Client code does not have access to the representation of the type, and so can not check pre-conditions directly in general.

Exercise: check whether pre-conditions are testable in each model-based specification we've done.

More interesting specifications require quantified assertions.

#### Universal Quantification

An assertion of the form:

\forall type variable [ assertion ]

is only true if assertion is true for every variable value of type type.

Example - a function that tests whether or not a sequence is sorted:

function sorted(s: sequence of real):boolean
post: sorted(s) = \forall int i [ 1 <= i < length(s) =>
s[i] <= s[i + 1] ]


For values of i that are legal indices of s, the comparison must be made. For values of i that aren't legal indices of s, the implication is automatically true.

#### Existential Quantification

An assertion of the form:

\exists type variable [ assertion ]

is true if assertion is true for at least one variable value of type type.

Example - a function that returns the maximum value in a sequence:

function max(s: sequence of real):real
post: \exists int i [ 1 <= i <= length(s) /\
\forall int j [ 1 <= j <= length(s) =>
s[i] >= s[j] /\ max(s) = s[i] ]]


For values of i that are not legal indices of s, the existentially quantified assertion is automatically false.

#### Invariants

In some cases, not all values that can be represented by the type definitions are legal values of the ADT.

Example: a lookup table usually doesn't allow two entries to be associated with the same key value.

An invariant is a logical assertion that all legal values of the ADT must satisfy. The pre- and post-conditions of the operation specifications must ensure that all values produced using the operations satisfy the invariant.

template<class RangeType> class Table {
/* model
**   domains
**     int DomainType
**     tuple EntryType (
**             DomainType Index,
**             RangeType Value)
**   data members
**     set of EntryType theTable
**   invariant
**     \forall (EntryType t) [
**       t \in theTable^ =>
**         \forall (EntryType s) [
**           s \in theTable^ /\ t != s =>
**             t.Index != s.Index
**         ]
**     ]
*/
public:
Table();
/* modifies: self
** post: theTable' = {}
*/

** post: (\exists (EntryType e) [
**           e \in theTable^ /\ e.Index = I]
**   \/ (!(\exists (EntryType e) [
**           e \in theTable^ /\ e.Index = I])
**       /\ theTable' = theTable^ \union {(I, V)}
*/

void RemoveEntry(DomainType I);
/* modifies: self
** post: theTable' = theTable^ - {e | e \in theTable^
**                                   /\ Index(e) = I}
*/

void AccessTable(DomainType I, RangeType& V, bool& Defined);
/* modifies: V, Defined
** post: (\exists (EntryType e) [
**          e \in theTable^ /\ e.Index = I /\ V' = e.Value]
**          /\ Defined' = true)
**    \/ (!(\exists (EntryType e) [
**            e \in theTable^ /\ e.Index = I])
**        /\ Defined' = false)
*/

void ReplaceValue(DomainType I, RangeType V, bool& ChangedOK);
/* modifies: self, ChangedOK
** post: (\exists (EntryType e) [
**           e \in theTable^ /\ e.Index = I
**           /\ theTable' = (theTable^ - {e}) \union {(I, V)}]
**           /\ ChangedOk' = true)
**   \/ (!(\exists (EntryType e) [
**           e \in theTable^ /\ e.Index = I])
**       /\ ChangedOk' = false)
*/
}


Discussion:

• quantifiers were really necessary in this example
• the invariant is maintained by all operations
• need to check every operation that modifies self
• AddEntry is the only interesting argument
• this is a very concise and precise specification of a powerful ADT

Larch/C++ specifications typically do not use invariants, because invariant conditions are expressed using an implies or constrained by clause in the trait that the Larch/C++ specification is based on.

#### Evaluating Formal Specifications

Cons:

• notation can be difficult
• difficult to convince management, clients to adopt techniques that don't lead directly to code (payoff isn't immediately obvious)
• clients, managers don't understand formal specifications
• difficult to specify distributed/parallel, real-time and interactive systems
• formal specifications can be expensive

Pros:

• developing the formal specification provides a great deal of insight into the system
• formal specifications can form the basis of proofs of:
• program correctness
• timing behavior
• consistency and completeness of ADTs
• consistency: applications of ADT operations never produce instances of the ADT that violate the invariant
• completeness: all possible instances of the ADT can be produced using only the specified ADT operations
• executable formal specifications can serve as prototypes and test oracles
• formal specifications provide an unambiguous standard of correctness for implementations
• formal specifications serve as a guide for developing test cases
• formal specifications are the perfect documentation for ADTs:
• precise description of functionality (and interface)
• no exposure of data representation or algorithms

The future of formal specifications:

• likely to see more use for:
• safety, reliability and security critical systems
• standards definition
• programming language definitions
• data definitions (such as the IEEE standard for floating point numbers)