Introduction

Why we use program verification: mathematics can be used to state program properties and to prove a program correct for all inputs.

Limitation: We have only limited ways of convincing ourselves that we have given the right specification: there can be mistakes in the specification and in capturing the informal requirements. In addition, all we can prove at best is the correctness of a mathematical abstraction, never of the system running in the real world.

Programming Verification Methods

Declarative vs. imperative programming: a declarative program $P$ in a language based on a given logic is typically a logical theory in that logic. The properties that we want to verify $P$ satisfies can be stated in another theory $Q$.

Equational Theories

Theories in equational logic are called equational theories. In Computer Science they are sometimes referred to as algebraic specifications.

An equational theory is a pair $(\Sigma, E)$, where:

  • $\Sigma$, called the signature, describes the syntax of the theory, that is, what types of data and what operation symbols (function symbols) are involved;
  • $E$ is a set of equations between expressions (called terms) in the syntax of $\Sigma$.

The signature $\Sigma$ can be more or less expressive, depending on how many types (called sorts) of data it allows and on the relationships between those types:

  • unsorted: signatures have only one sort
  • many-sorted: allow different sorts, like Integer, Bool, List...
  • order-sorted: allow different sorts related by subsort inclusions, e.g. Natural < Integer < Rational.

Maude

In Maude, functional modules are equational theories $(\Sigma, E)$, declared as:

$$ \text{fmod } (\Sigma, E)\ \text{endfm} $$

Such theories can be unsorted, many-sorted, or order-sorted, or even more general membership equational theories.

Many-Sorted Signature

A many-sorted signature is just a labeled multigraph $\Sigma = (S, F, G)$, whose nodes are called sorts $S$, whose labels are called function symbols $F$, and whose labeled multiedges $G$ are called the typings of the function symbols.

In the signature terminology, we call $f: s_1...s_n → s$ a typing of $f$ with input sorts $s_1...s_n$ and result sort $s$. For $a: ε → s$, we call $a\in F$ a constant symbol of sort $s$.

many-sort multigraph example

When all operations are unary, signatures are exactly labeled graphs.

Problem in many sorted signature: Many-sorted signatures are still too restrictive. The problem is that some operations are partial, and there is no natural way of defining them in just a many-sorted framework.

e.g., given a typing op first : List -> Natural ., we then have the problem of defining the value of first(nil), which is undefined.

Solution: A much better solution is to recognize that these functions are partial with the typing just given, but become total on an appropriate subsort NeList < List of nonempty lists.

$\star$ Subsorts also allow us to overload operator symbols.

Order-Sorted Signature

An order-sorted signature $\Sigma$ is a triple $\Sigma = ((S, \lt), F, G)$, where $\lt$ is a partial order relation on the set of sorts called subsort inclusion. That is, it is a binary relation on $S$ that is:

  • irreflexive: $\neg(x\lt x)$
  • transitive: $x\lt y$ and $y \lt z$ imply $x\lt z$

$\star$ Unless specified otherwise, by a signature we will always mean an order-sorted signature.

Connected components of a poset: $\equiv_{\leq}$ is the smallest equivalence relation on $S$ such that

  • if $s\leq s'$ or $s' \leq s$, then $s\equiv_{\leq} s'$
  • if $s\equiv_{\leq} s'$ and $s' \equiv_{\leq} s''$, then $s\equiv_{\leq} s''$

We call the equivalence classes modulo $\equiv_{\leq}$ the connected components of the poset $(S, \leq)$.
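As a concrete illustration, the connected components of a sort poset can be computed with a union-find pass over the subsort pairs. A minimal Python sketch; the sorts and subsort declarations below are illustrative, not from any particular Maude module:

```python
# Connected components of a sort poset via union-find: two sorts are in
# the same component iff they are linked by a chain of subsort relations.
# The sorts and subsort pairs (s < s') below are hypothetical examples.

def connected_components(sorts, subsorts):
    parent = {s: s for s in sorts}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path compression
            s = parent[s]
        return s

    for a, b in subsorts:          # a < b puts a and b in one component
        parent[find(a)] = find(b)

    comps = {}
    for s in sorts:
        comps.setdefault(find(s), set()).add(s)
    return list(comps.values())

sorts = ["Nat", "Int", "Rat", "Bool", "NeList", "List"]
subsorts = [("Nat", "Int"), ("Int", "Rat"), ("NeList", "List")]
print(connected_components(sorts, subsorts))
```

With these declarations, {Nat, Int, Rat} and {NeList, List} form connected components, while Bool is a component by itself.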

equivalence classes example

subsort overloading: multiple typings of the same function symbol whose corresponding input sorts lie in the same connected components

ad-hoc overloading: multiple typings of the same function symbol whose input sorts are not in the same connected components

We can illustrate an order-sorted signature in a similar way to a many-sorted signature, drawing the sorts in the same poset as a set of nodes:

order-sorted graph example

Algebras

An (unsorted, many-sorted, or order-sorted) signature $\Sigma$ is just syntax: it provides the symbols for a language; algebras describe what the language talks about, i.e., its semantics.

$\star$ Function symbols do not mean anything until they are interpreted as actual functions.

Definition: an unsorted $\Sigma\text{-algebra}$ is a pair $\mathbb{A}=(A, \__\mathbb{A})$, where $A$ is a set specifying the data elements in the algebra $\mathbb{A}$, and $\__\mathbb{A}$ is a symbol interpretation function that maps:

  • each constant $a$ to $a_\mathbb{A}\in A$
  • each function symbol $f$ to a function $f_\mathbb{A}$

In Maude, the meaning (interpretation) of a Maude functional module is an algebra, denoted $\mathbb{C}_{\Sigma/E}$, and called its canonical term algebra.

$$ \mathbb{C}_{\Sigma/E} = (T_\Omega, \__{\mathbb{C}_{\Sigma/E}}) $$

where $T_\Omega$ is the set of constructor terms.

To define $\__{\mathbb{C}_{\Sigma/E}}$ requires the following properties:

  • Unique Termination: given any term $t$, the repeated application in any order of the above equations to $t$ as left-to-right simplification rules always terminates with a unique result.
  • Sufficient Completeness: the simplification of any term $t$ always terminates in a constructor term.

Both properties can be checked automatically using Maude tools.
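To see what these properties mean concretely, here is a minimal Python sketch that applies the usual Peano equations x + 0 = x and x + s(y) = s(x + y) as left-to-right simplification rules to ground terms. The nested-tuple encoding of terms is an assumption of this sketch, not Maude's actual representation or algorithm:

```python
# Left-to-right simplification of ground Peano terms with the equations
# x + 0 = x and x + s(y) = s(x + y). Terms are nested tuples:
# ("0",), ("s", t), ("+", t, u). Every ground term ends in a unique
# constructor term built from 0 and s only.

def simplify(t):
    if t[0] == "s":
        return ("s", simplify(t[1]))
    if t[0] == "+":
        x, y = simplify(t[1]), simplify(t[2])
        if y == ("0",):              # x + 0 -> x
            return x
        if y[0] == "s":              # x + s(y) -> s(x + y)
            return ("s", simplify(("+", x, y[1])))
    return t

def num(n):  # build the constructor term s^n(0)
    return ("0",) if n == 0 else ("s", num(n - 1))

# 2 + 3 reduces to the constructor term s(s(s(s(s(0)))))
print(simplify(("+", num(2), num(3))) == num(5))
```

Unique termination says this process always stops with the same result regardless of rule-application order; sufficient completeness says the result contains only 0 and s.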

Many-Sorted Algebras

In many-sorted algebras, a $\Sigma$-algebra is a pair $\mathbb{A}=(A, \__\mathbb{A})$, where

  • $A$ is a sort interpretation function, choosing for each sort symbol $s\in S$ a corresponding data set $A_s$ interpreting that sort. If $S=\{s_1,...,s_n\}$, then $A$ is a function:

    $$ A:\{s_1,...,s_n\}→ \{A_{s_1},...,A_{s_n}\}, \quad s_i \mapsto A_{s_i} $$

    We denote the sort interpretation function $A=\{A_s\}_{s\in S}$, and call $A$ an $S$-indexed set.

  • $\__\mathbb{A}$ is the function symbol interpretation function


Order-Sorted Algebras

For an order-sorted signature, an obvious $\Sigma$-algebra is the term algebra $\mathbb{T}_{\Sigma}$, whose family of data sets is $T_\Sigma = \{T_{\Sigma, s}\}_{s\in S}$,

where $\__{\mathbb{T}_{\Sigma}}$ is defined inductively:

  • for each constant $a: ε → s$, $a_{\mathbb{T}_{\Sigma}} = a \in T_{\Sigma, s}$
  • for each $f: w → s$, $f_{\mathbb{T}_{\Sigma}}$ maps a tuple $(t_1,...,t_n)$ to the expression (term) $f(t_1,...,t_n)\in T_{\Sigma,s}$
  • if $s \lt s'$, then $T_{\Sigma, s} \subseteq T_{\Sigma, s'}$

In order-sorted algebras, to define $\__{\mathbb{C}_{\Sigma/E}}$, besides the two properties above, we also need to guarantee:

Sort Preservation: for any term $t$ of sort $s$, its normal form satisfies $t!_E \in T_{\Omega, s}$. This property holds automatically in the unsorted and many-sorted cases, but may fail in the order-sorted case.

Sensible Signature

A sensible signature prevents the ambiguity that arises when terms could have different meanings. In essence, it ensures that if two different interpretations of a term are possible (due to overloading), those interpretations must be related by the sort hierarchy, meaning one is more specific than the other.

If the argument sorts of two overloaded versions of $f$ are related by a subsort relation, then the result sorts $s$ and $s'$ must also be related by the same subsort relation.

Lemma: if $\Sigma$ is a sensible order-sorted signature, then for any term $t$ in the term algebra $T_{\Sigma}$, if $t$ belongs to both $T_{\Sigma, s}$ and $T_{\Sigma, s'}$, the sorts $s$ and $s'$ are in the same connected component, i.e., $s \equiv_\leq s'$.

Preregular: A sensible order-sorted signature is called preregular iff for each $\Sigma(X)$-term $t$ (possibly with variables $X$), the set of sorts

$$ Sorts(t)= \{s\in S \mid t \in T_{\Sigma(X), s}\} $$

has a least element in the poset, called the least sort of $t$ and denoted $ls(t)$. That is:

$$ ls(t) \in Sorts(t) ∧ \forall s' \in Sorts(t), \quad ls(t) \leq s' $$
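For a preregular signature, $ls(t)$ can be computed bottom-up: first compute the least sorts of the immediate subterms, then take the least result sort among the typings that apply. A Python sketch over a small hypothetical signature (Zero < Nat < Int, with 0, s, and a unary minus, all invented for this example):

```python
# Bottom-up least-sort computation, assuming preregularity (the set of
# applicable result sorts always has a least element). The signature
# below is illustrative only, not from any real Maude module.

leq_pairs = {("Zero", "Nat"), ("Nat", "Int")}  # generators of <=

def leq(s, t):  # reflexive-transitive closure of the subsort pairs
    if s == t:
        return True
    return any(a == s and leq(b, t) for a, b in leq_pairs)

# typings: symbol -> list of (argument sorts, result sort)
typings = {
    "0":  [((), "Zero")],
    "s":  [(("Nat",), "Nat")],
    "-_": [(("Int",), "Int")],
}

def ls(t):
    f, args = t[0], [ls(a) for a in t[1:]]
    results = [r for (w, r) in typings[f]
               if len(w) == len(args) and all(leq(a, b) for a, b in zip(args, w))]
    # preregularity: some applicable result sort is <= all the others
    for r in results:
        if all(leq(r, x) for x in results):
            return r
    return None

print(ls(("0",)))                  # Zero
print(ls(("s", ("0",))))           # Nat: ls(0) = Zero <= Nat, so s applies
print(ls(("-_", ("s", ("0",)))))   # Int
```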

Kind

We can extend our signature without losing or changing the functionality of our original module; we call such an extended signature $\Sigma^\square$. We obtain it by:

  1. adding a new sort $[s]$, called a kind, to each connected component $[s]$, with $\forall s' \in [s], [s] \gt s'$
  2. lifting each operator f : s1 ... sn -> s, n ≥ 1, to the kind level as: f : [s1] ... [sn] -> [s].

Variables

Variables in Maude are variables in the mathematical sense, and we can reduce the notion of terms with variables to that of terms without variables (ground terms) in an extended signature: given a family $X=\{X_s\}_{s\in S}$ of sets of variables for the different sorts, add to $\Sigma$ each $x \in X_s$ as a constant $x: ε → s$.

  • Note that if $\Sigma$ is a sensible signature, then it is trivial to check that $\Sigma(X)$ is also, by construction, a sensible signature.
  • One can likewise prove that if $\Sigma$ is preregular, then $\Sigma(X)$ is also preregular.

$\star$ the set of variables of a term $t$ is denoted $vars(t)$

Substitution

A substitution is a function $\theta: X → T_{\Sigma}(Y)$, where $X, Y$ are sets of variables. Applying a substitution to a term can be expressed recursively by:

  • $x\theta = \theta(x)$
  • $f(t_1,...,t_n)\theta = f(t_1\theta, ..., t_n\theta)$
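The two recursive clauses translate directly into code. A Python sketch; the nested-tuple term representation and the convention that unmapped variables are left unchanged are assumptions of this sketch:

```python
# Recursive substitution application: x(theta) = theta(x) for variables,
# f(t1,...,tn)(theta) = f(t1 theta, ..., tn theta). Terms are nested
# tuples ("f", t1, ..., tn); variables are plain strings.

def apply_subst(t, theta):
    if isinstance(t, str):                      # variable case
        return theta.get(t, t)                  # unmapped variables stay put
    return (t[0],) + tuple(apply_subst(a, theta) for a in t[1:])

theta = {"x": ("g", "y"), "y": ("a",)}
# note: simultaneous substitution -- the binding for y is NOT applied
# to the term substituted for x
print(apply_subst(("f", "x", "y"), theta))
```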

Equations

An equational theory is then a pair $(Σ, E)$, with $Σ$ an order-sorted signature, and $E$ a set of $Σ$-equations.

Subterms

In a $\Sigma$-term $f(t_1, \dots, t_n)$, the $t_1, \dots, t_n$ are called its immediate subterms, denoted $t_i \lhd f(t_1, ..., t_n)$, $1 \leq i \leq n$. Note that the inverse relation $\lhd^{-1} = \rhd$ is well-founded, i.e., terminating.

A term $u$ is called subterm of $t$ iff $t \rhd^* u$, and a proper subterm of $t$ iff $t \rhd^+ u$. Note that the relation $\rhd^+$ is also well-founded and a strict order.

To indicate where a subterm $u$ is located we can replace $u$ by a hole $[]$. For example, we can indicate the two places where $g(a)$ occurs in $f(b, h(g(a)), g(a))$ by $f(b, h([]), g(a))$ and $f(b, h(g(a)), [])$. A term with a single occurrence of a hole $[]$ is called a context, written $C[]$. We can obtain a new term, denoted $C[u]$, by replacing the hole $[]$ by the term $u$.
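Subterm extraction, occurrence positions, and context filling $C[u]$ can be sketched as follows, reproducing the $f(b, h(g(a)), g(a))$ example above. Representing positions as tuples of argument indices is an assumption of this sketch:

```python
# Positions, subterms, and contexts for terms as nested tuples
# ("f", t1, ..., tn). A position is a tuple of argument indices;
# replace(t, pos, u) is C[u], i.e., putting u at position pos.

def subterm(t, pos):
    for i in pos:
        t = t[i]
    return t

def replace(t, pos, u):              # C[u]: put u at position pos
    if not pos:
        return u
    i = pos[0]
    return t[:i] + (replace(t[i], pos[1:], u),) + t[i + 1:]

def positions_of(t, u, pos=()):
    found = [pos] if t == u else []
    if not isinstance(t, str):
        for i, a in enumerate(t[1:], start=1):
            found += positions_of(a, u, pos + (i,))
    return found

ga = ("g", ("a",))
t = ("f", ("b",), ("h", ga), ga)
print(positions_of(t, ga))           # g(a) occurs at positions (2,1) and (3,)
hole = ("[]",)
print(replace(t, (2, 1), hole))      # f(b, h([]), g(a))
```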

Equational and Rewriting

Equational deduction is the systematic replacement of equals by equals using the given equations $E$.

Algebraic simplification produces a special type of equational proofs, called algebraic simplification proofs, where equations are always applied from left to right. This left to right process is called term rewriting, or term reduction.

$\star$ Given a sensible order-sorted signature $Σ = ((S,<), F, G)$, a $Σ$-rewrite rule is a sequent $t → t'$, where we require that the rule $t → t'$ is well-typed, in the sense that there are sorts $s, s' \in S$ such that $t\in T_{\Sigma, s}$, $t'\in T_{\Sigma, s'}$, and $[s] = [s']$.

A term rewriting system is then a pair $(Σ, R)$, with $Σ$ an order-sorted signature, and $R$ a set of Σ-rewrite rules.

E-equality relation: $↔_E^*$ or $=_E$. It is also called the relation of equality modulo $E$.

Problem: certain equations are intrinsically problematic for term rewriting, for example $x+y=y+x$: (1) we do not obtain a simpler term, but only a “mirror image” of the original term; (2) even worse, we can easily loop when applying this equation. The solution to this problem is to build in certain commonly occurring equational axioms. Let $Σ$ be a sensible order-sorted signature.

A rewrite theory is a triple $(Σ, B, R)$, where $B$ is a set of $Σ$-equations, and $R$ is a set of $Σ$-rewrite rules. We define $u →_{R/B} v$ iff there exist terms $u', v'$ such that

$$ u =_B u' →_R v' =_B v. $$

We call $→_{R/B}$ the one-step $R$-rewrite relation modulo $B$. The relation $=_B$, denoted $→^0_{R/B}$, is called the 0-step $R$-rewrite relation modulo $B$.

An $R$-rewrite proof modulo $B$ is either:

  • a 0-step $R$-rewrite modulo $B$ of the form $u→ ^0_{R/B}v$, so that $u=_B v$
  • a sequence of $R$-rewrite steps modulo $B$, of the form $v_0 →_{R/B} v_1 →_{R/B} v_2 \dots v_{n-1} →_{R/B} v_n$

Sometimes, if we declare our equations carelessly, they may make the module nonterminating, for example:

fmod NAT-SET-NONTERMINATING is protecting NAT .
    sort NatSet .
    subsort Nat < NatSet .
    op mt : -> NatSet [ctor] .
    op _,_ : NatSet NatSet -> NatSet [ctor assoc comm id: mt] .
    var X : NatSet .
    eq X,X = X .
endfm

In this case, the $ACU$ axioms and the idempotency equation together create a loop, and the rewriting never terminates. We can solve this by:

  1. restricting idempotency to singleton sets
var N : Nat .
eq N,N = N .
  2. using subsort overloading
sort NatSet NeNatSet .
subsort Nat < NeNatSet < NatSet .
op mt : -> NatSet [ctor] .
op _,_ : NatSet NatSet -> NatSet [ctor assoc comm id: mt] .
op _,_ : NeNatSet NeNatSet -> NeNatSet [ctor assoc comm id: mt] .
var X : NeNatSet .
eq X,X = X .

Executability Conditions

Every functional module should satisfy (1) Unique Termination, (2) Sufficient Completeness, and (3) Sort Preservation, so that it has a canonical term algebra of constructor terms as its semantics.

  • Unique Termination will follow from $\vec E$ being:

    1. terminating modulo B, and
    2. confluent modulo B
  • Sufficient Completeness will follow from $\vec E$ being so modulo B, and
  • Sort Preservation will follow from $\vec E$ being sort decreasing.

$\star$ For each $t→ t' \in \vec E$, any variable $x$ occurring in $t'$ must also occur in $t$, i.e., $vars(t') \subseteq vars(t)$

Sort Decreasingness

$\star$ (Sort Decreasingness) For each $t→ t'\in \vec E$, each $s \in S$, and each substitution $\theta$, if $t\theta$ has sort $s$, then $t'\theta$ also has sort $s$.

For example, for sorts Nat < Set, with $\cup$ (set union), the rule $x→ x \cup x$, with $x: Set$, is not sort-decreasing, since for the sort specialization $\{x: Set → x': Nat\}$ we have $ls(x')=Nat\lt Set = ls(x'\cup x')$

Theorem: $\Sigma$ is $B$-preregular iff the rewrite theory $(\Sigma, \overset {→}{B} \cup \overset {\leftarrow}{B})$ is sort decreasing.

$B$-preregular: (1) $\Sigma$ is preregular, and (2) $t=_B t'$ implies that $ls(t)=ls(t')$

$\star$ (Determinism/Confluence) If $t$ can be rewritten to two normal forms $u$ and $v$, then $u =_B v$.

confluence example

Unique Termination

In rewriting theory, the rules $\vec E$ are called terminating modulo B iff $→_{\vec E /B}$ is well-founded. $\vec E$ is called weakly terminating modulo B iff every term $t$ has an $\vec E / B$-normal form, i.e., $∃ v \in T_\Sigma$ s.t. $t →^*_{\vec E/B}v ∧ \not ∃ w \quad s.t.\ v →_{\vec E/B} w$, denoted $t →_{\vec E/B}^! v$.

$\star$ if $\Sigma$ is B-preregular, and $\vec E$ is sort-decreasing, both Unique Termination and Sort Preservation hold.

Joinable: two terms are joinable if they can be rewritten into the same term.

Church-Rosser property:

$$ t =_{E\cup B}t' \iff t ↓_{\vec E / B} t' $$

Local Confluence

if we have $t →_{R/B}u$ and $t→_{R/B}v$, then $u↓_{R/B}v$.

Lemma: a rewrite theory is confluent if it is locally confluent and terminating.

we can prove local confluence by proving that all critical pairs are joinable.

When we can apply multiple rules to one term, there are three possibilities:

  1. Nested Simplification with Overlap: there is a path $r$ such that $q = p.r$ (or $p = q.r$, but this case is symmetric) and $r$ is a nonvariable position in $u$; [show all critical pairs are joinable]
  2. Nested Simplification without Overlap: there is a path $r$ such that $q = p.r$ (or $p = q.r$, but this case is symmetric) and $r$ is not a nonvariable position in $u$; [always joinable]
  3. Sideways Simplification: there is no such $r$ at all. [always joinable]

The Nested Simplification with Overlap case can be reduced to finding, for all pairs of equations $u=v$ and $u'=v'$ in $E$ and all nonvariable positions $p$ in $u$, all solutions to the corresponding unification problems.

Unification:

The unification problem consists in, given terms $t$ and $t'$ whose sorts are in the same connected component, finding a substitution $θ$ that makes them equal: $t\theta = t'\theta$.

B-Unification: find $\theta$ such that $t\theta =_B t'\theta$. Syntactic unification can be computed with the following inference rules:

  • Delete: $\frac{\{E\}, t=t}{\{E\}}$
  • Decompose: $\frac{\{E, f(t_1,...,t_n)=f(t_1',...,t_n')\}}{\{E, t_1=t_1',...,t_n=t_n'\}}$
  • Conflict: $\frac{\{E, f(t_1,...,t_n)=g(t_1',...,t_m')\}}{\text{failure}}$ if $f \neq g$
  • Coalesce: $\frac{\{E, x=y\}}{\{E\{x→ y\}, x=y\}}$ if $x, y \in vars(E)$, $x \neq y$
  • Check: $\frac{\{E, x=t\}}{\text{failure}}$ if $x\in vars(t), x \neq t$
  • Eliminate: $\frac{\{E, x=t\}}{\{E\{x→ t\}, x=t\}}$ if $x\not \in vars(t), t \not\in V, x \in vars(E)$

e.g.

{f(g(x, h(y)), z) = f(z, g(k(u), v))} −→ (Decompose)
{g(x, h(y)) = z, z = g(k(u), v)} −→ (Eliminate)
{g(x, h(y)) = g(k(u), v), z = g(k(u), v)} −→ (Decompose)
{x = k(u), v = h(y), z = g(k(u), v)} −→ (Eliminate)
{x = k(u), v = h(y), z = g(k(u), h(y))}
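The inference rules above can be turned into a small solver. This Python sketch implements Delete, Decompose, Conflict, Check, and Eliminate (Coalesce is handled by treating a variable-variable equation as an ordinary Eliminate step), and reproduces the worked example; the nested-tuple term encoding is an assumption of the sketch:

```python
# Syntactic unification via the Delete / Decompose / Conflict / Check /
# Eliminate rules. Terms are nested tuples ("f", t1, ..., tn);
# variables are plain strings. Returns a substitution dict, or None on
# failure. A sketch, not an optimized implementation.

def occurs(x, t):
    if isinstance(t, str):
        return x == t
    return any(occurs(x, a) for a in t[1:])

def subst(t, x, u):                  # replace variable x by u in term t
    if isinstance(t, str):
        return u if t == x else t
    return (t[0],) + tuple(subst(a, x, u) for a in t[1:])

def unify(eqs):
    theta = {}
    while eqs:
        s, t = eqs.pop()
        if s == t:                                   # Delete
            continue
        if isinstance(t, str) and not isinstance(s, str):
            s, t = t, s                              # orient: variable first
        if isinstance(s, str):
            if occurs(s, t):                         # Check
                return None
            eqs = [(subst(a, s, t), subst(b, s, t)) for a, b in eqs]
            theta = {x: subst(u, s, t) for x, u in theta.items()}
            theta[s] = t                             # Eliminate / Coalesce
        elif s[0] == t[0] and len(s) == len(t):      # Decompose
            eqs += list(zip(s[1:], t[1:]))
        else:                                        # Conflict
            return None
    return theta

# the example from the text: f(g(x, h(y)), z) =? f(z, g(k(u), v))
lhs = ("f", ("g", "x", ("h", "y")), "z")
rhs = ("f", "z", ("g", ("k", "u"), "v"))
print(unify([(lhs, rhs)]))
```

On this input the solver returns x = k(u), v = h(y), z = g(k(u), h(y)), matching the derivation above.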

For B-unification, there can be several solutions to a unification problem; we combine the alternatives with $\_∨\_$ and define failure as an identity element for $\_∨\_$.

e.g.

{f(g(h(y), x), z) = f(z, g(k(u), v))} −→ (Decompose)
{g(h(y), x) = z, z = g(k(u), v)} −→ (Eliminate)
{g(h(y), x) = g(k(u), v), z = g(k(u), v)} −→ (Decompose-C)
{x = v, k(u) = h(y), z = g(k(u), v)} ∨ {x = k(u), v = h(y), z = g(k(u), v)} −→ (Conflict ∨ Eliminate)
failure ∨ {x = k(u), v = h(y), z = g(k(u), h(y))} =
{x = k(u), v = h(y), z = g(k(u), h(y))}
applying the resulting unifier we obtain the identity,
f(g(h(y), k(u)), g(h(y), k(u))) =comm f(g(k(u), h(y)), g(k(u), h(y))).

Critical Pairs

e.g.

Notice that none of the equations in the module share variables, so the rule set for this module is simply all equations oriented left to right:

$$ \vec{E} = \{x+0 → x, y+s(z) → s(y+z), (x1+y1)+z1→ x1+(y1+z1)\} $$

To compute the set of all critical pairs of this module, we need to find:

$$ CP(\Sigma, \vec{E}) = \left( \bigcup_{1 \leq i < j \leq n} CP(u_i \to v_i, u_j \to v_j) \cup CP(u_j \to v_j, u_i \to v_i) \right) \cup \bigcup_{1 \leq i \leq n} CP(u_i \to v_i, u'_i \to v'_i) $$

where $\vec{E} = \{ u_1 \to v_1, \dots, u_n \to v_n \}$. We can calculate all critical pairs $CP(u→ v, w → q)$:

Calculate: $CP(y+s(z) → s(y+z), x+0 → x)$

In this case, $p=\{ε, 2\}$, and when $p=ε$:

$$ \theta \in Unif_{\Sigma}(y+s(z) = x+0) $$

However, the unification fails, since $s(z)$ cannot be unified with the constant $0$.

when $p=2$:

$$ \theta \in Unif_{\Sigma}(s(z) = x+0) $$

Again no unifier exists, because $top(x+0)=+ \neq s = top(s(z))$ (Remark 1), so this pair of rules contributes no critical pairs.

Termination

A well-founded ordering $>$ on $T_\Sigma$ is called a reduction ordering iff it satisfies the following two conditions:

  • strict $\Sigma$-monotonicity: for each $f \in \Sigma$, whenever $f(t_1, \dots, t_n)$, $f(t_1, \dots, t_{i-1}, t'_i, t_{i+1}, \dots, t_n) \in T_\Sigma(V)$ with
    $t_i > t'_i$, then $f(t_1, \dots, t_n) > f(t_1, \dots, t_{i-1}, t'_i, t_{i+1}, \dots, t_n)$.
  • closure under substitution: if $t > t'$, then, for any substitution $\theta : V \to T_\Sigma(V)$ we have $t\theta > t'\theta$.

Theorem: For any equational theory, $\vec E$ is terminating iff there exists a reduction order $\gt$ s.t. $\forall (u=v)\in E, u \gt v$

RPO:

The recursive path ordering (RPO) is based on the idea of giving an ordering on the function symbols in Σ, which is then extended to a reduction ordering on all terms.

Given a finite signature $\Sigma = ((S, <), F, G)$, plus an ordering $>$ and a status function $\tau$ on its symbols $F$, the recursive path ordering $>_{\text{rpo}}$ on $\bigcup_{s \in S} T_\Sigma(V)$ is defined recursively as follows. $u >_{\text{rpo}} t$ iff: $u = f(u_1, \dots, u_n)$, and either:

  1. $u_i \geq_{\text{rpo}} t$ for some $1 \leq i \leq n$, or
  2. $t = g(t_1, \dots, t_m)$, $u >_{\text{rpo}} t_j$ for all $1 \leq j \leq m$, and either:

    • $f > g$, or
    • $f = g$ and $\langle u_1, \dots, u_n \rangle >^{\tau(f)}_{rpo} \langle t_1, \dots, t_n \rangle$
  • If $f$ has $n$ arguments and $\tau(f) = \text{lex}(\pi)$ with $\pi$ a permutation on $n$ elements, then $\langle u_1, \dots, u_n \rangle >_{\tau(f)}^{\text{rpo}} \langle t_1, \dots, t_n \rangle$ iff there exists $i$, $1 \leq i \leq n$, such that $u_{\pi(j)} = t_{\pi(j)}$ for $j < i$, and $u_{\pi(i)} >_{\text{rpo}} t_{\pi(i)}$.
  • If $\tau(f) = \text{mult}$, then $\langle u_1, \dots, u_n \rangle >_{\tau(f)}^{\text{rpo}} \langle t_1, \dots, t_n \rangle$ iff we have $\{u_1, \dots, u_n\} >_{\text{mult}}^{\text{rpo}} \{t_1, \dots, t_n\}$.
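The definition of $>_{\text{rpo}}$ translates almost literally into code. A Python sketch restricted to lexicographic status with the identity permutation; the precedence values and the tuple term encoding are assumptions of this example:

```python
# Recursive path ordering with lexicographic status, following the
# definition above. prec maps symbols to integers (larger = greater in
# the precedence); terms are nested tuples, variables plain strings.
# All symbols are assumed to have the same arity where f = g.

def rpo_gt(u, t, prec):
    if isinstance(u, str):                      # a variable is never greater
        return False
    f, us = u[0], u[1:]
    if any(a == t or rpo_gt(a, t, prec) for a in us):  # case 1: u_i >= t
        return True
    if isinstance(t, str):
        return False
    g, ts = t[0], t[1:]
    if not all(rpo_gt(u, b, prec) for b in ts):        # need u > every t_j
        return False
    if prec[f] > prec[g]:                              # case 2: f > g
        return True
    if f == g:                                         # case 2: lex on args
        for a, b in zip(us, ts):
            if a == b:
                continue
            return rpo_gt(a, b, prec)
    return False

plus = lambda a, b: ("+", a, b)
s = lambda a: ("s", a)
prec = {"+": 2, "s": 1, "0": 0}
# the usual orientation x + s(y) -> s(x + y) is decreasing under >rpo
print(rpo_gt(plus("x", s("y")), s(plus("x", "y")), prec))
```

With + greater than s in the precedence, both Peano addition rules are oriented left-to-right by this ordering.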

$\star$ we need a reduction order that is compatible with the axioms $B$. That is, if $u > t, u =_B u_0$ and $t =_B t_0$ , then we must also have $u_0 > t_0$

Polynomial Orderings:

Another general method of defining suitable reduction orderings is based on polynomial orderings. In its simplest form we can just use polynomials on several variables whose coefficients are natural numbers

We assign to each function symbol $f$ in $\Sigma$ (of $n$ arguments) a polynomial $p_f$ involving exactly the variables $x_1, ..., x_n$. If $f$ is subsort overloaded, we assign the same $p_f$ to all such overloadings. Also, to each constant symbol $b$ we associate a positive number $p_b \in \mathbb{N}$. Each variable is assigned to itself: $p_x = x$.

e.g.

f(g(x)) = g(f(x))

  • $p_f = x_1 ^3$
  • $p_g=2x_1$
  • $p_a = 3$

We can prove that $(2x)^3 \gt 2(x^3)$ holds over $\mathbb{N}_{\geq k}$ for $k \geq 1$, hence the rule is terminating.
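The inequality can be spot-checked numerically: interpret both sides of f(g(x)) = g(f(x)) under $p_f$ and $p_g$ and compare over a range of values (a sanity check under the interpretations above, not a proof):

```python
# Polynomial interpretation check for f(g(x)) -> g(f(x)) with
# p_f = x1^3 and p_g = 2*x1, as in the text:
# [f(g(x))] = (2x)^3 = 8x^3 and [g(f(x))] = 2x^3.

p_f = lambda x: x ** 3
p_g = lambda x: 2 * x

lhs = lambda x: p_f(p_g(x))   # interpretation of f(g(x))
rhs = lambda x: p_g(p_f(x))   # interpretation of g(f(x))

# strict inequality on a sample of values x >= 1
print(all(lhs(x) > rhs(x) for x in range(1, 100)))
```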

Sufficient Completeness

Sufficient completeness: for each $t \in T_\Sigma$ and each $u\in T_\Sigma \quad \text{s.t.}\quad t→^!_{\vec E/B}u$, we have $u\in T_\Omega$.

$\star$ In the kind-completed signature $\Sigma^\square$, the requirement of sufficient completeness applies only to terms in the original signature $\Sigma$, not to terms in the kind-completed signature $\Sigma^\square$.

Free modulo: the constructors are free modulo $B$ iff the following condition holds: each term $u$ of sort $s$ (where $s$ is not a kind) built from constructors (i.e., each $u \in T_\Omega$) is already in normal form with respect to the oriented equations $\vec E$ modulo $B$.

Check Sufficient Completeness

For arbitrary equational theories sufficient completeness is in general undecidable. However, in practice, for equational theories satisfying the following requirements:

(i) left-linear (i.e., there are no repeated variables in the lefthand side term of the equation);

(ii) sort-decreasing; and

(iii) terminating.

we can use decidability results from tree automata theory to cast the sufficient completeness problem into a tree automata problem and decide the problem that way.


Automata: An ordinary finite-state automaton $A$ has a finite set $Q$ of states and accepts strings of inputs when they lead the automaton to a subset $Q_0 ⊆ Q$ of accepting states. The language $L(A)$ of the automaton is then the set of all strings accepted by $A$. Such languages are called regular languages and have nice decidability results: they are closed under Boolean operations (we can construct automata for each such operation); and we can decide whether $L(A)$ is empty.

Tree automata: All this is generalized by finite-state tree automata, which accept terms in an unsorted term algebra $T_\Sigma$ instead of just strings. A tree automaton (TA) is a tuple $A = (\Sigma, Q, Q_0, R)$ with $Σ$ an unsorted signature, $Q$ a set of extra constants not in $Σ$ called states, $Q_0 ⊆Q$ a subset of accepting states, and $R$ a set of transition rules, which can be of two forms:

  • $f(q1, . . . , qn) → q$, with $q1, . . . , qn, q ∈ Q, f ∈ Σ$ and $f(q1, . . . , qn) ∈ T_{Σ(Q)}$ (for n = 0, $f$ will be a constant, and the rule becomes $f → q$)
  • $q → q ′$ , with $q, q′ ∈ Q$; these are called epsilon transitions and define a rule subset $Rϵ ⊆ R$.

We then define the language $L(A)$ as the subset $L(A) ⊆ T_Σ$ of those $t ∈ T_Σ$ such that there is a $q ∈ Q_0$ such that $ t →_R^* q$. A subset $L ⊆ T_Σ$ is called regular iff there is a finite-state tree automaton $A$ such that $L = L(A)$.

$\star$ Automata are labeled graphs. Tree automata are just labeled multigraphs, actually, order-sorted signatures

Under conditions (i)–(iii), $(Σ, E ∪ B)$ is sufficiently complete iff for each sort $s$ we have $D_s \setminus (Red ∪ C_s) = ∅$, which can be decided by deciding emptiness of the corresponding tree automaton.

  • $D_s$: terms of sort $s$ having a defined symbol on top and constructor terms as arguments
  • $C_s$: constructor terms of sort $s$
  • $Red$: terms reducible by the oriented equations $\vec E$ modulo $B$

e.g.

To see how the desired tree automata needed to decide sufficient completeness can be built, we can use a simple example: our usual unsorted specification of addition for the Peano natural numbers, with a single sort Nat, with 0 and s as constructors, and with equations x + 0 = x and x + s(y) = s(x + y). This specification satisfies conditions (i)–(iii), since it is left-linear, confluent, sort-decreasing, and terminating. To recognize each of the regular sets Red, DNat, and Ctor we need three tree automata ARed, ADNat, and ACtor.

The tree automata ARed, ADNat, and ACtor have the same signature Σ (that of the natural numbers), set of states Q = {Nat, Red, DNat, Ctor, Zero, NzCtor}, and transitions R:

  • (ϵ-transitions): Ctor → Nat, Red → DNat, DNat → Nat, Zero → Ctor, NzCtor → Ctor
  • (Nat transitions): s(Nat) → Nat, Nat + Nat → Nat.
  • (constructor transitions): 0 → Zero, s(Ctor) → NzCtor.
  • (defined function transition): Ctor + Ctor → DNat.
  • (reducibility transitions): Ctor + Zero → Red, Ctor + NzCtor → Red
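The acceptance relation $t →_R^* q$ can be sketched bottom-up: compute for each subterm the set of reachable states, closing under the epsilon transitions. A Python sketch over the transitions just listed (the encoding is an assumption of this sketch, not how any Maude tool works):

```python
# Bottom-up acceptance for the tree automaton of the text: for each
# term, compute the set of states it can reach under the transitions R,
# closing under the epsilon transitions. Terms are nested tuples.

eps = [("Ctor", "Nat"), ("Red", "DNat"), ("DNat", "Nat"),
       ("Zero", "Ctor"), ("NzCtor", "Ctor")]

# non-epsilon rules: (symbol, argument states, result state)
rules = [
    ("s", ("Nat",), "Nat"), ("+", ("Nat", "Nat"), "Nat"),
    ("0", (), "Zero"), ("s", ("Ctor",), "NzCtor"),
    ("+", ("Ctor", "Ctor"), "DNat"),
    ("+", ("Ctor", "Zero"), "Red"), ("+", ("Ctor", "NzCtor"), "Red"),
]

def close(states):                    # close a state set under epsilon moves
    changed = True
    while changed:
        changed = False
        for q, q2 in eps:
            if q in states and q2 not in states:
                states.add(q2)
                changed = True
    return states

def states_of(t):
    args = [states_of(a) for a in t[1:]]
    out = {q for (f, ws, q) in rules
           if f == t[0] and len(ws) == len(args)
           and all(w in a for w, a in zip(ws, args))}
    return close(out)

zero = ("0",)
print("Red" in states_of(("+", zero, zero)))               # 0 + 0 is reducible
print("Red" in states_of(("+", ("+", zero, zero), zero)))  # left arg is not a Ctor term
```

For example, 0 + 0 reaches Red (it matches Ctor + Zero → Red), while (0 + 0) + 0 does not, since its left argument is not a constructor term.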

ARed, ADNat, and ACtor differ only in their respective accepting states: Red, DNat, and Ctor. Since their (i) signature, (ii) set of states, and (iii) transitions are the same, the shared part (i)–(iii) is exactly the same thing as the order-sorted signature:

sorts Nat Red D-Nat Ctor Zero NzCtor .
subsorts Zero NzCtor < Ctor < Nat .
subsorts Red < D-Nat < Nat .
op s : Nat -> Nat .
op s : Ctor -> NzCtor .
op 0 : -> Zero .
op _+_ : Nat Nat -> Nat .
op _+_ : Ctor Ctor -> D-Nat .
op _+_ : Ctor Zero -> Red .
op _+_ : Ctor NzCtor -> Red .

The point now is that each Boolean operation on regular tree languages has a corresponding operation on their associated tree automata. Therefore, out of the automata ARed, ADNat , and ACtor we can construct an automaton that recognizes the language DNat \ (Red ∪ Ctor ). Let us call this automaton ADNat(Red∪Ctor) . We know that under conditions (i)–(iii) our specification is sufficiently complete iff DNat \ (Red ∪ Ctor ) = ∅. Therefore, we can decide this property by testing ADNat(Red∪Ctor) for emptiness. If the test (as for this example) succeeds, we are done. If it doesn’t, we get a very useful counterexample term, showing us where sufficient completeness fails.