
Tempus Fugit: How To Plug It

Alessandra Di Pierro
Dipartimento di Informatica, Università di Pisa, Italy

Chris Hankin, Igor Siveroni, Herbert Wiklicky
Department of Computing, Imperial College London, United Kingdom
Abstract
Secret or private information may be leaked to an external attacker through the timing behaviour of the system running the untrusted code. After introducing a formalisation of this situation in terms of a confinement property, we present an algorithm which transforms the system into one that is computationally equivalent to the given system but free of timing leaks.
Key words: Security Analysis, Timing Leaks, Program Transformation,
Probabilistic Processes
1 Introduction
This paper concerns the transformation of programs to remove covert channels.
Our focus is on timing channels, which are able to signal information through
the time at which an action occurs (a command is executed).
A system is vulnerable to a timing attack if there are alternative control-flow paths of observably different duration; a common refinement of this definition focuses only on those alternative paths whose selection is guarded by confidential information. We will show that the comparison of different paths is essentially a confinement problem. We have studied confinement in [1] using probabilistic bisimulation. Specifically, we showed that two processes are probabilistically confined if and only if they share a common (probabilistic) abstract interpretation (Proposition 39 of [1]). Our approach
to probabilistic abstract interpretation is based on Operator Algebra and thus
1 All authors are partly funded by the EPSRC project S77066A.
Preprint submitted to Elsevier Science
3 May 2006
abstractions are measurable by a norm; in particular, in op. cit., we showed how to measure the difference between two processes and thus introduced the notion of approximate confinement. In op. cit. this measure was given a statistical interpretation indicating how much work an attacker would have to perform in order to differentiate the processes.
In this paper we use our previous work to identify the need for transformation. We identify the alternative paths which fail to be bisimilar; these are then candidates for transformation. We present a transformation which establishes probabilistic bisimilarity whilst retaining input/output equivalence with the original system. This transformation involves the introduction of new "padded" control paths. This is similar to the approach of Agat [2]. However, in contrast to Agat, our work is not programming-language specific; instead it is based on probabilistic transition systems, which provide a very general setting. The transformed system produced by our approach is more efficient than that produced by Agat's algorithm, because our algorithm introduces fewer padding steps.
In the next section, we review some background material on probabilistic transition systems and their representation as linear operators. Section 3 reviews
the pertinent results from [1] and shows how to use these results to identify potential timing leaks. Section 4 and Section 5 present our algorithm for padding
and its correctness. The remaining sections present a series of examples and
comparisons to related work.
2 Probabilistic Processes
Probabilistic process algebras extend the well-established classical techniques for modelling and reasoning about concurrent processes with the ability to deal with non-functional aspects of process behaviour such as performance and reliability, or in general the quantitative counterpart of the classical properties representing the functional behaviour of a system.

Several models for probabilistic processes have been proposed in the literature in recent years, which differ from each other essentially in the "amount"
of nondeterminism that is allowed in the model. Replacing nondeterministic
branching by a probabilistic one leads to either the reactive or the generative
model introduced in [3], where the probability distribution which guides the
choice may depend (reactive) or not (generative) on the action labelling the
branches (or transitions). Other models allow for both nondeterministic and
probabilistic branching and lead to a myriad of variants depending on the
interplay between the two kinds of branching (cf. [4]).
In order to describe the operational behaviour of probabilistic processes we consider probabilistic transition systems (PTS's), that is, probabilistic extensions of the notion of labelled transition systems for classical process algebras.
Probabilistic transition systems essentially consist of a set of states and a set of
labels, and their behaviour can be informally described as follows: To a given
action of the environment the system responds by either refusing it or making
a transition to a new state according to a probability distribution. In this paper we consider only PTS’s with a finite number of states and actions, which
reflect the generative model of probability; they effectively correspond to a
particular kind of probabilistic processes, namely finite Markov chains [5].
2.1 Probabilistic Transition Systems
Formally, we can define a probabilistic transition system in all generality as in
[6]. We denote by Dist(X) the set of all probability distributions on the set X, that is, of all functions π : X → [0, 1] such that ∑_{x∈X} π(x) = 1.
Definition 1 A probabilistic transition system is a tuple (S, A, −→, π0), where:

• S is a non-empty, finite set of states,
• A is a non-empty, finite set of actions,
• −→ ⊆ S × A × Dist(S) is a transition relation, and
• π0 ∈ Dist(S) is an initial distribution on S.
For s ∈ S, α ∈ A and π ∈ Dist(S) we write s −α→ π for (s, α, π) ∈ −→. By s −p:α→ t we denote the transition to an individual state t with probability p = π(t). We denote the transition probability from a state s1 to a state s2 with label α by p(s1, α, s2). For a set of states C we write the accumulated transition probability from s1 to C as: p(s1, α, C) = ∑_{s∈C} p(s1, α, s).
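To make the definitions above concrete, here is a minimal sketch in Python of a (generative) PTS; the encoding of transitions as (s, α, t, p) tuples and all names are our own illustration, not part of the paper's formalism.

```python
# A sketch (not from the paper): a small generative PTS stored as a list of
# transitions (s, alpha, t, p) with p the probability of reaching t from s via alpha.
transitions = [
    ("s0", "a", "s1", 0.5),
    ("s0", "b", "s2", 0.5),
    ("s1", "b", "s3", 1.0),
]

def p(s, alpha, t, trans=transitions):
    """Transition probability p(s, alpha, t)."""
    return sum(q for (u, a, v, q) in trans if (u, a, v) == (s, alpha, t))

def p_set(s, alpha, C, trans=transitions):
    """Accumulated probability p(s, alpha, C) = sum over t in C of p(s, alpha, t)."""
    return sum(p(s, alpha, t, trans) for t in C)

# In the generative model a single distribution governs both the action and the
# successor, so the outgoing probabilities of a state sum to (at most) 1 over
# all actions and successors:
out = sum(q for (u, _, _, q) in transitions if u == "s0")
assert out == 1.0
```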
In the generative model the transition relation of a PTS (S, A, −→, π0 ) is a
partial function −→: S ,→ Dist(S × A); this means that the same probability
distribution is used to govern both the choice of the action and the (internal)
state transition. Looking at a PTS as a Markov chain, we can identify it with
the tree of the outcomes of the associated stochastic process; this can be seen
as an experiment which takes place in stages. At each stage the probability
for each possible outcome is known when the outcomes of all previous stages
are known, and the number of the possible outcomes at each stage is finite.
We can define the tree associated to a PTS inductively as follows.
Definition 2 Let S be a non-empty, finite set of states, and let A be a non-empty, finite set of actions. Then

• ({s}, A, ∅, {⟨s, 1⟩}) is a tree-PTS with root s ∈ S;
• if Ti = (Si, A, −→i, {⟨si, 1⟩}), i = 1, …, m, m < ∞, are tree-PTS's with roots si ∈ Si, Si ⊆ S for all i = 1, …, m, Si ∩ Sj = ∅ for i ≠ j, s ∈ S \ ∪i Si, and {⟨(si, αi), pi⟩ | si ∈ Si, αi ∈ A, i = 1, …, m} is a probability distribution, then the PTS constructed as ({s} ∪ ∪i Si, A, {s −pi:αi→ si} ∪ ∪i −→i, {⟨s, 1⟩}) is a tree-PTS with root s. We call the Ti the sub-trees of T.

When the transition probabilities form only a sub-probability distribution, that is a function π : S × A → [0, 1] with ∑_{s∈S,α∈A} π(s, α) ≤ 1, then we call the PTS a sub-PTS (tree-sub-PTS).
2.2 Linear Representation of Probabilistic Transition Systems
As in previous work – e.g. [7,1] – we recast the common relational presentation of PTS’s in terms of linear maps and operators in order to provide an
appropriate mathematical framework for the quantitative study of probabilistic processes.
Definition 3 Given a (sub-)PTS T = (S, A, −→, π0) and a fixed α ∈ A, the matrix representing the probabilistic transition relation −α→ ⊆ S × [0, 1] × S corresponding to α is given by

(Mα)st = p if s −p:α→ t, and (Mα)st = 0 otherwise,

where s, t ∈ S, and (M)st denotes the entry in row s and column t of the matrix M. We define the matrix representation M(T) of the (sub-)PTS T as the direct sum of the matrix representations of the transition relations −α→ for each α ∈ A:

M(T) = ⊕_{α∈A} Mα.
We recall that for a set {Mi}_{i=1}^k of ni × mi matrices, the direct sum of these matrices is defined as the (∑_{i=1}^k ni) × (∑_{i=1}^k mi) "diagonal" block matrix:

M = ⊕_i Mi = diag(M1, M2, …, Mk).

Given k square n × n matrices Mi and an m × n matrix K, we use the shorthand notation K(⊕_{i=1}^k Mi) = (⊕_{i=1}^k K)(⊕_{i=1}^k Mi), i.e. we apply K to each of the factors Mi.
For probabilistic (sub-probabilistic) relations we obtain a so-called stochastic
(sub-stochastic) matrix, i.e. a positive matrix where the entries in each row
sum up to one (are less than or equal to one).
Matrices are not just schemes for writing down probabilities but also a way to specify linear maps or operators between/on vector spaces. In our case, PTS's are represented as operators on the vector space V(S) = { (vs)s∈S | vs ∈ R }, i.e. the set of formal linear combinations of elements of S with real coefficients. This space contains all distributions on the state space: Dist(S) ⊂ V(S).
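As an illustration of Definition 3, the following sketch builds the label matrices Mα and their direct sum for a small assumed example; the helper direct_sum and the concrete numbers are ours, not the paper's.

```python
# A sketch (assumed example): the matrix representation of a PTS with states
# {0, 1, 2} and actions {a, b}, and its direct sum M(T) = M_a (+) M_b.
import numpy as np

n = 3
M_a = np.zeros((n, n))
M_b = np.zeros((n, n))
M_a[0, 1] = 0.5          # 0 --0.5:a--> 1
M_b[0, 2] = 0.5          # 0 --0.5:b--> 2
M_b[1, 2] = 1.0          # 1 --1.0:b--> 2

def direct_sum(*blocks):
    """diag(M1, ..., Mk): place the blocks along the diagonal, zeros elsewhere."""
    rows = sum(B.shape[0] for B in blocks)
    cols = sum(B.shape[1] for B in blocks)
    M = np.zeros((rows, cols))
    r = c = 0
    for B in blocks:
        M[r:r + B.shape[0], c:c + B.shape[1]] = B
        r += B.shape[0]
        c += B.shape[1]
    return M

M_T = direct_sum(M_a, M_b)            # a 6x6 block-diagonal matrix
assert M_T.shape == (2 * n, 2 * n)
# Summing the label blocks yields a (sub-)stochastic one-step matrix: rows for
# non-terminal states sum to 1, rows for terminal states to 0.
assert np.allclose((M_a + M_b).sum(axis=1), [1.0, 1.0, 0.0])
```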
3 Process Equivalences and Confinement
The problem of preventing a system from leaking private information to unauthorised users is essentially the problem of guaranteeing the confinement [8] of
the system against some specific attacks. Typically, after the introduction of
the notion of noninterference [9,10], most of the work on confinement has been
based on models which ultimately depend on some notion of process equivalence by identifying the absence of information flow between two processes
via the indistinguishability of their behaviours [11].
In [1] it is shown how various process equivalences can be approximated, and how probabilities can be used as numerical information for quantifying such an approximation. This provides us with a quantitative measure of the indistinguishability of process behaviours (according to the given semantics), that is, in a security setting, a measure of their propensity to leak information. In particular, a notion of confinement can be obtained by stating that two processes p and q are probabilistically confined iff they are probabilistically bisimilar.
The central aim of this paper is to present a transformation algorithm for
PTS’s which constructs bisimilar PTS’s while preserving the computational
effects of the original systems. This section is devoted to a formal definition of the two notions of process equivalence for PTS's at the basis of our treatment, namely probabilistic bisimulation and probabilistic observables.
3.1 Probabilistic Bisimulation
Probabilistic bisimilarity is the standard generalisation of process bisimilarity
to probabilistic transition systems and is due to [12, Def 4].
Definition 4 A probabilistic bisimulation is an equivalence relation ∼b on the states of a probabilistic transition system satisfying, for all α ∈ A:

p ∼b q and p −α→ π ⇒ q −α→ ϱ and π ∼b ϱ.

Note that in the case of the generative systems considered in this paper, π and ϱ are sub-probability distributions.
The notion of probabilistic bisimulation on PTS corresponds to the notion of
lumped process for finite Markov chains [5]. This can be seen by noting that a
probabilistic bisimulation equivalence ∼ on a PTS T = (S, A, →, π0 ) defines
a probabilistic abstract interpretation of T [1]. In fact, we have shown that
it is always possible to define a linear operator K from the space V(S) to
the space V(S/∼ ) which represents ∼. This so-called classification operator is
represented by the n × m matrix:

(K)ij = 1 if si ∈ Cj, and (K)ij = 0 otherwise,
with S/∼ = {C1 , C2 , . . . Cm } and n the cardinality of S. If M(T ) is the operator representation of T then K† M(T )K is the abstract operator induced by
K where K† is the so-called Moore-Penrose pseudo-inverse of K. For classification operators K we can construct K† as the row-normalised transpose of
K. Intuitively, K† M(T )K is an operator which abstracts the original system
T by encoding only the transitions between equivalence classes instead of the
ones between single states. K determines a partition on the state space of T
which is exactly the lumping of the states of T . This can be formally deduced
from the following theorem in [5].
Theorem 5 A necessary and sufficient condition for a Markov chain to be lumpable with respect to a partition {C1, C2, …, Cm} is that for every pair of sets Ci and Cj, the probability p_{kCj} = ∑_{t∈Cj} p_{kt} of moving in one step from state sk into set Cj has the same value for every sk ∈ Ci. These common values {p̃ij} form the transition matrix for the lumped chain.
We can re-formulate the probabilistic bisimilarity of two processes A and B in terms of linear operators as follows (cf. [1]):

Proposition 6 Given the operator representations M(A) and M(B) of two probabilistic processes A and B, then A and B are probabilistically bisimilar, that is A ∼b B, iff there exist classification matrices KA and KB such that

KA† M(A) KA = KB† M(B) KB.
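Proposition 6 can be checked mechanically on a toy pair of processes; in the following sketch the states, probabilities and helper names are our own assumptions. It computes K† as the row-normalised transpose, as described above, and compares the abstract operators.

```python
# A sketch of Proposition 6 on a toy example (not from the paper): process A is
# a single deterministic step, process B splits the same step 50/50.
import numpy as np

def pinv_classification(K):
    """Moore-Penrose pseudo-inverse of a 0/1 classification matrix,
    constructed as the row-normalised transpose of K."""
    Kt = K.T.astype(float)
    return Kt / Kt.sum(axis=1, keepdims=True)

M_A = np.array([[0., 1.],
                [0., 0.]])            # a0 --1--> a1
M_B = np.array([[0., .5, .5],
                [0., 0., 0.],
                [0., 0., 0.]])        # b0 --1/2--> b1, b0 --1/2--> b2

# Classes C1 = {a0} ~ {b0} and C2 = {a1} ~ {b1, b2}.
K_A = np.array([[1., 0.],
                [0., 1.]])
K_B = np.array([[1., 0.],
                [0., 1.],
                [0., 1.]])

abs_A = pinv_classification(K_A) @ M_A @ K_A
abs_B = pinv_classification(K_B) @ M_B @ K_B
assert np.allclose(abs_A, abs_B)      # same abstract operator: A ~b B
```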
Probabilistic bisimilarity of A and B can be expressed in terms of Markov chain lumpability as follows:

Proposition 7 Given the operator representations M(A) and M(B) of two probabilistic processes A and B, then A and B are probabilistically bisimilar iff there exists a lumping K of the process M(A) ⊕ M(B) of the form

K = ( KA )
    ( KB )

(i.e. KA stacked above KB) for some classification operators KA and KB for A and B such that KA and KB have full column rank.
For tree-PTS the above result essentially means that an equivalence relation on
the union of the states of two processes A and B is a probabilistic bisimulation
for A and B iff their roots belong to the same equivalence class.
Probabilistic bisimulation is the finest of all process equivalences as it identifies processes whose step-wise behaviour is exactly the same. From an external
viewpoint other behaviours can be observed in order to distinguish two processes. In particular, a potential attacker may be able to obtain experimentally
two kinds of information about a process by observing what final outcomes the
process produces, and by measuring how long the process is running. Both the
input/output behaviour and the timing behaviour of a process correspond to
some suitable abstractions of the concrete behaviour of the process in question.
In this paper we will concentrate on these observables and show how we can transform a process (PTS) in such a way that its input/output behaviour is preserved, while its timing behaviour changes so as to become indistinguishable from that of another given process.
3.2 Probabilistic Observables
In our PTS model, any execution of a process leads to a single trace, i.e. a
single execution path. Each time there is a probabilistic choice, one of the
possible branches is chosen with some probability. The combined effect of all
the choices made defines the probability of a particular path as the product of
probabilities along this computational path. For finite processes we only have
to consider finitely many paths of finite length and thus only need a basic
probability model. This is based on a finite set of events Ω, i.e. all possible
traces, such that all sets of traces have a well-defined probability attached, i.e.
are measurable, and the probability function P : P(Ω) → [0, 1] can be defined
P
via a distribution, i.e. a function π : Ω → [0, 1] such that P(X) = x∈X π(x)
7
for all X ⊆ Ω. This avoids the introduction of more complicated measuretheoretic structures on the set of possible paths.
The notion of observables we will consider is based on a trace semantics for
probabilistic processes whose intuition and formal definition are given below.
3.2.1 Trace Semantics
Formally, the trace semantics of a process T is defined via the notion of the probabilistic result of a computational path.

Definition 8 For a computational path π = s0 −p1:α1→ s1 −p2:α2→ … −pn:αn→ sn, we define its probabilistic result R(π) as the pair ⟨α1α2…αn, prob(π)⟩, where prob(π) = ∏i pi. The trace associated to π is t(π) = α1α2…αn.
Since different computational paths may have the same associated trace (cf.
Example 9), we need to sum up the probabilities associated to each such
computational path in order to appropriately define a probabilistic result.
Formally, this can be done by defining for a process T and trace t the set

[π]_{T,t} = {π | π a computational path for T and t(π) = t},

which collects all the computational paths with trace t. Then we define for each trace t the computational result R([π]_{T,t}) as the pair ⟨t, ∑_{π∈[π]_{T,t}} prob(π)⟩.

We can now define the trace semantics of T as the set

S(T) = {R([π]_{T,t}) | t = t(π) and π is a computational path for T }.
Example 9 Consider the following four simple processes with actions A = {a, b, e} (tree diagrams rendered textually): T1 performs a and then b, each with probability 1; T2 branches initially into two states, each branch labelled ½:a, and each of the two states then performs b with probability 1; T3 performs a and then either b followed by e, or e followed by b, each alternative with probability ½; T4 performs a and then either e followed by b, or b alone, each alternative with probability ½.
It is easy to see that the only possible trace for the first two processes is the sequence ab, while for process T3 we have the traces abe and aeb, and for process T4 we have aeb and ab.
We can thus describe the trace semantics of these four processes by:

S(T1) = {⟨ab, 1⟩}
S(T2) = {⟨ab, 1⟩}
S(T3) = {⟨abe, 0.5⟩, ⟨aeb, 0.5⟩}
S(T4) = {⟨aeb, 0.5⟩, ⟨ab, 0.5⟩}
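The trace semantics can be computed by enumerating root-to-leaf paths. The following sketch assumes a simple dictionary encoding of a tree-PTS; the encoding and the concrete tree, modelled loosely on T3 of Example 9, are our own.

```python
# A sketch (hypothetical encoding, not the paper's): compute the trace
# semantics S(T) of a tree-PTS given as {state: [(p, action, child), ...]}.
from collections import defaultdict

def trace_semantics(tree, root):
    """Return {trace: probability}, summing probabilities of paths that share a trace."""
    results = defaultdict(float)

    def walk(state, trace, prob):
        children = tree.get(state, [])
        if not children:                       # terminal state: record the path
            results[trace] += prob
            return
        for p, action, child in children:
            walk(child, trace + action, prob * p)

    walk(root, "", 1.0)
    return dict(results)

# A tree with the trace semantics of T3: a, then either b;e or e;b, each 1/2.
T3 = {"r": [(0.5, "a", "u"), (0.5, "a", "v")],
      "u": [(1.0, "b", "u1")], "u1": [(1.0, "e", "u2")],
      "v": [(1.0, "e", "v1")], "v1": [(1.0, "b", "v2")]}
assert trace_semantics(T3, "r") == {"abe": 0.5, "aeb": 0.5}
```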
3.2.2 I/O Observables
The notion of input/output observables, which we will refer to as I/O observables from now on, corresponds to the most abstract observation of a process behaviour, as it considers just the final result of an execution. In our setting, these final results are identified with probability distributions over the set of all possible abstract traces. Abstract traces are defined in terms of a function [[.]] which allows us to avoid specifying a concrete semantics for the labels, thus keeping our approach as general as possible. In fact, in a PTS we can think of labels as some kind of "instructions" which are performed by the process, e.g. a = "x := x-1", b = "x := 2*x", etc. The final result of an execution is thus the accumulated effect of executing these instructions. Depending on the specific application, the function [[.]] will be defined so as to abstract from the appropriate computational details of the concrete traces.
For our purposes, we can leave the concrete nature of [[.]] unspecified and only assume that there is a neutral instruction e, i.e. a label which represents something like a skip statement, such that for any trace t1 t2 … tn we have:

[[t1 t2 … tn]] ≡ [[t1 t2 … ti e ti+1 … tn]]

i.e. inserting e anywhere in the execution (including at the beginning or the end) does not change the computational result. We will identify traces if the only difference between them is some occurrences of this neutral label e; this introduces an equivalence relation ≡ on the set of all traces.
Clearly, different concrete traces might be identified by [[.]]; for example, if a =
“x := x+1” and b = “x := x-1” one would usually have [[ab]] = [[e]] = [[ba]].
Definition 10 Given a tree-PTS T , we define its I/O observables as
O(T ) = {R([π]T,[[t]] ) | t = t(π) and π is a computational path for T }.
Proposition 11 For any tree-PTS T , its I/O observables O(T ) define a probability distribution on the set of states S.
PROOF. By induction on the tree structure of T.

• If T = ({s}, A, ∅, {⟨s, 1⟩}) then O(T) is obviously a probability distribution on S.
• If T = ({s} ∪ ∪i Si, A, {s −pi:αi→ si} ∪ ∪i −→i, {⟨s, 1⟩}), then by the inductive hypothesis we have for all i = 1, …, m that O(Ti) is a probability distribution on Si. Therefore, O(T) = ∑_{i=1}^m pi · O(Ti) is a probability distribution on ∪i Si. 2
Based on this notion of I/O observables we can now define when two processes
are I/O equivalent.
Definition 12 Two PTS’s A and B are I/O equivalent, denoted by A ∼io B,
if they define the same probability distribution on the set of the equivalence
classes of abstract traces, i.e. iff
O(A) = O(B).
Example 13 If we consider again the four processes from Example 9, we get the following I/O observables:

O(T1) = {⟨[[ab]], 1⟩}
O(T2) = {⟨[[ab]], 1⟩}
O(T3) = {⟨[[abe]], 0.5⟩, ⟨[[aeb]], 0.5⟩} = {⟨[[ab]], 1⟩}
O(T4) = {⟨[[aeb]], 0.5⟩, ⟨[[ab]], 0.5⟩} = {⟨[[ab]], 1⟩}
i.e. all four processes are I/O equivalent.
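Under the simplifying assumption that [[.]] merely erases the neutral label e, the collapse observed in Example 13 can be reproduced as follows; the helper name and the encoding of trace semantics as dictionaries are our own.

```python
# A sketch (assumed abstraction, not the paper's general [[.]]): take the
# abstract trace to be the trace with every occurrence of the neutral label e
# erased, and sum the probabilities of traces that become identical.
from collections import defaultdict

def io_observables(trace_sem):
    """Collapse a {trace: probability} map modulo the neutral label e."""
    obs = defaultdict(float)
    for trace, prob in trace_sem.items():
        obs[trace.replace("e", "")] += prob
    return dict(obs)

# The trace semantics of T1, T3 and T4 from Example 9:
S_T1 = {"ab": 1.0}
S_T3 = {"abe": 0.5, "aeb": 0.5}
S_T4 = {"aeb": 0.5, "ab": 0.5}

# All three collapse to the same I/O observables, as in Example 13:
assert io_observables(S_T1) == io_observables(S_T3) == io_observables(S_T4) == {"ab": 1.0}
```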
3.3 Timing Behaviour
An external observer may have the ability to measure the time a process needs
to perform a given task. In order to do so she just needs to observe “execution
times”, while the concrete nature of the performed instruction is irrelevant for
her purposes. In our model, the timing behaviour of a process can therefore
be defined by abstracting from labels (instructions) and assuming that there
is a function |.| from labels to real numbers which measures the time it takes
to execute the instruction represented by the label.
One can think of various choices for the timing function. For example, one can assume (as in [13]) that there is only one special label t with |t| = 1 which indicates time progress, while all other labels take no execution time; in this case the time flow is measured by the number of these special labels occurring in a trace. Alternatively, one can assign a different duration to each label.

In our model we assume that all labels need a strictly positive time to be executed, i.e. |α| > 0 for all labels α, and that the execution time of all labels is the same. In particular, while in general one could have various neutral labels e with different execution times, e.g. |e0| = 0, |e1| = 1, …, we will assume that there is only one neutral element, denoted by √, which consumes exactly one time unit.
Formally we define the time semantics of our processes in terms of the notion of time bisimulation, that is a bisimulation relation on PTS's where transitions are labelled only by the action √.

Definition 14 Given a PTS T = (S, A, −→, π0), we define its depleted or ticked version T̂ as the PTS (S′, A′, −→′, π0′) with S′ = S, A′ = {√}, π0′ = π0, and −→′ defined by

(s, √, π) ∈ −→′ iff (s, α, π) ∈ −→ for some α ∈ A.
Time bisimilarity is defined as a probabilistic bisimilarity (cf. Definition 4) for
depleted versions of PTS.
Definition 15 Given two PTS’s A and B, we say that A and B are time
bisimilar, denoted by A ∼tb B, iff there is a probabilistic bisimulation on the
states of the ticked versions of A and B.
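The construction of Definition 14 amounts to relabelling every transition. A sketch, with the string "tick" standing in for √ and the tuple encoding of transitions assumed as before:

```python
# A sketch (hypothetical encoding): the ticked version of a PTS replaces every
# action label by the single tick action, keeping states and probabilities.
def ticked(transitions):
    """(s, alpha, t, p) -> (s, "tick", t, p): forget the labels, keep the probabilities."""
    return [(s, "tick", t, p) for (s, _, t, p) in transitions]

A = [("s0", "a", "s1", 0.5), ("s0", "b", "s2", 0.5), ("s1", "b", "s3", 1.0)]
A_hat = ticked(A)
assert A_hat == [("s0", "tick", "s1", 0.5), ("s0", "tick", "s2", 0.5),
                 ("s1", "tick", "s3", 1.0)]
```

Time bisimilarity of two processes can then be decided by computing a probabilistic bisimulation on their ticked versions, since only the probabilistic branching structure, and hence the timing, survives the relabelling.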
3.4 Timing Leaks
Our model for timing leaks essentially follows the noninterference-based approach to confinement recalled at the beginning of this section. In particular, we define timing attacks in terms of the ability to distinguish the timing behaviour of two processes, expressed via the notion of time bisimilarity.

In [1] we used the same idea to model probabilistic covert channels. The notion of process equivalence used in that paper is an approximate version of probabilistic bisimilarity, as the purpose there is to provide an estimate of the confinement of the system, rather than just checking whether it is confined or not. We aim to extend the current work in a similar direction (cf. Section 8). To this purpose, the linear representation of PTS's, i.e. their correspondence with processes which are Markov chains, is essential; in fact, as shown in [1], it provides the necessary quantitative framework for defining numerical estimates.
Definition 16 We say that two processes T1 and T2 are confined against timing attacks or time confined iff their associated PTS are time bisimilar.
We can re-formulate this definition of timing leaks in terms of lumping, as done in Proposition 6 for probabilistic bisimilarity. This will be the basis for the definition of the transformation technique we will present in Section 4.
Definition 17 Given the operator representation M(T) = ⊕_{α∈A} Mα(T) of a probabilistic process T, the linear representation of its ticked version is given by:

M̂(T) = ∑_{α∈A} Mα(T).
Definition 18 We say that A and B are time bisimilar, A ∼tb B, iff there exist classification matrices KA and KB such that

KA† M̂(A) KA = KB† M̂(B) KB.
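Definition 18 can be illustrated on a toy pair of processes which differ only in their labels; the concrete numbers and helper are our own assumptions. Summing the label blocks Mα forgets the actions, after which a common lumped operator witnesses time bisimilarity even though the processes are not probabilistically bisimilar.

```python
# A sketch (assumed example): A performs a single a-step; B chooses between an
# a-step and a b-step, each with probability 1/2. Their ticked operators are
# the sums of the per-label blocks M_alpha.
import numpy as np

M_hat_A = np.array([[0., 1.],
                    [0., 0.]])                 # sum over A's single label a
M_hat_B = np.array([[0., .5, .5],
                    [0., 0., 0.],
                    [0., 0., 0.]])             # M_a(B) + M_b(B)

def pinv_classification(K):
    """Pseudo-inverse of a 0/1 classification matrix: row-normalised transpose."""
    Kt = K.T.astype(float)
    return Kt / Kt.sum(axis=1, keepdims=True)

K_A = np.eye(2)
K_B = np.array([[1., 0.],
                [0., 1.],
                [0., 1.]])                     # lump B's two successors together

lhs = pinv_classification(K_A) @ M_hat_A @ K_A
rhs = pinv_classification(K_B) @ M_hat_B @ K_B
assert np.allclose(lhs, rhs)                   # A ~tb B
```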
In the following we present a technique for closing possible covert timing channels by transforming the system into an equivalent one which is free from
timing leaks. The transformation algorithm we present is applicable to finite
tree-PTS’s. It uses ideas from the optimal lumping algorithm in [14] and the
padding techniques from [2].
Intuitively, given a system formed by processes A and B, the algorithm is
based on the construction of a lumping for the Markov chain formed by the
direct sum of the two Markov chains representing the ticked versions of A
and B. By Proposition 7, the resulting lumped process will be a probabilistic
bisimulation for A and B iff the root nodes of A and B belong to the same
class (or they are essentially the same state in the lumped process). In this case
no transformation is needed as the two processes are confined against timing
attacks; otherwise the algorithm produces two new processes A0 and B 0 which
exhibit the same I/O behaviour as A and B but are also time confined. The
following diagram shows the effect of the transformation.
 A   ≁tb   B
∼io        ∼io
 A′  ∼tb   B′
4 Transforming Probabilistic Transition Systems
Checking whether two systems are bisimilar or not, known as the bisimulation problem in the literature, is important not only in concurrency theory but also in many other fields such as verification and model checking, where bisimulation is used to minimise the state space, and in computer security, where many non-interference properties can be specified in terms of bisimulation equivalence [15,1]. Various efficient algorithms have been proposed which solve this problem, the most important one being the algorithm proposed by Paige and Tarjan in [16]. This algorithm has been adapted in [17] for probabilistic processes; in particular, the Derisavi et al. algorithm constructs the optimal lumping quotient of a finite Markov chain and can then be used to check the probabilistic bisimilarity of two PTS's.
In this paper we take a transformational approach: we not only check whether two probabilistic processes are bisimilar, but, in the case where they are not, we transform them into two bisimilar ones. The transformation we propose is inspired by the Derisavi et al. algorithm but does not aim to construct an "optimal" partition in the sense of [16]. Rather, it aims to add missing states and adjust the transition probabilities where needed to make two given processes bisimilar. It also preserves the probabilistic observables of the original processes.
The algorithm we present consists essentially of two intertwined procedures: one constructs a lumping partition for the PTS union of two tree-PTS's; the other "fixes" the partition constructed by the lumping procedure when it violates the properties which guarantee that the partition is a probabilistic bisimulation for the given processes. Fixing corresponds to adding some missing states (padding) and adjusting the transition probabilities in order to achieve bisimilarity. Moreover, the transformation is performed in such a way that the original I/O semantics (i.e. the probabilistic observables) of the processes is preserved. The overall procedure is repeated until a partition is constructed which effectively makes the two processes probabilistically bisimilar.
Before embarking on a more detailed description of our algorithm, we introduce some necessary notions and definitions. In the rest of this paper we focus on finite tree-structured PTS's.

For a PTS (S, A, −→, π0) we will consider partitions P = {B1, …, Bn} of S; we will refer to the Bi's as the blocks of the partition P. Given a state s ∈ S and a partition P we define the cumulative probability of s wrt B ∈ P as p(s, B) = ∑{q | s −q→ t with t ∈ B}. We further define the backward image for a block B and a probability q as pre_{q,−→}(B) = {s | ∃B′ ⊆ B : p(s, B′) = q}; this corresponds to the set of all states from which B (i.e. a state in B) can be
1: procedure Lumping(T)
2:   Assume: T is a tree-PTS
3:   P ← {S}
4:   while ¬Stable(P, T) do
5:     choose B ∈ P
6:     P ← Splitting(B, P)
7:   end while
8: end procedure

Algorithm 1. Lumping Algorithm
reached with probability q. We will often omit the index −→ when it is clear
from the context.
For a tree-PTS T we will refer to Height(T ) as the maximal length of a path
in T from the root to a terminal state. The n layer cut-off of a tree-PTS T
is defined inductively as (i) the set of terminal nodes for n = 0, and (ii) for
n = i + 1 the set of nodes with only direct links (paths of length one) to nodes
in the i layer cut-off. The procedure CutOff(T, n) returns the n layer cut-off
of T which we also denote by T |n . The complement of the n layer cut-off is
sometimes called the n layer top and the top nodes in the n layer cut-off form
the n layer.
Auxiliary Procedures. An important feature of the transformation of
PTS’s we present in the following is the creation of new states. These will
be copies of existing states or of a distinguished additional “dummy state”
d. We will denote copies of a given state s by s′, s″, s‴, …. For copies of the dummy state we will usually omit the primes. For a tree-PTS T = (S, A, −→, {⟨r, 1⟩}) we define its copy as the tree-PTS T′ = (S′, A, −→′, {⟨r′, 1⟩}) with S′ = {s′ | s ∈ S} and s′ −p:α→′ t′ iff s −p:α→ t.
In the main algorithm we assume a routine CopyTree(T, s) which returns a copy of the sub-tree of a tree-PTS T which is rooted in state s (where s is a state in T). We will also use a routine DepleteTree(T, s) which takes a tree-PTS T and returns Ť, i.e. a copy of T where all states are replaced by copies of the dummy state d and all transitions are labelled by √.
Another auxiliary procedure is Linking(T1 , s, α, p, T2 ) whose effect is to introduce a (copy) of a tree-PTS T1 into an existing tree-PTS T2 at state s; the
linking transition will be labelled by α and the associated transition probability will be p. A related procedure called Joining(T1 , s, p, T2 ) introduces a
(copy) of a tree-PTS T1 into an existing tree-PTS T2 at state s by identifying
the root of T1 with s and by multiplying initial transitions with probability p.
1: procedure MaxLumping(T1, T2)
2:   Assume: T1 = (S1, A, −→, {⟨r1, 1⟩}) is a tree-structured PTS
3:   Assume: T2 = (S2, A, −→, {⟨r2, 1⟩}) is a tree-structured PTS
4:   n ← 0                                        . Base case
5:   P ← {S1 ∪ S2}                                . Partition
6:   S ← {S1 ∪ S2}                                . Splitters
7:   while n ≤ Height(T1 ⊕ T2) do
8:     TT ← CutOff(T1 ⊕ T2, n)                    . n layer cut-off
9:     P ← {B ∩ TT | B ∈ P}                       . Partition below current layer
10:    S ← P ∪ {(S1 ∪ S2) \ TT}                   . Splitters (below and top)
11:    while S ≠ ∅ do
12:      choose B ∈ S, S ← S \ B                  . Choose a splitter
13:      P ← Splitting(B, P)                      . Split (current layer)
14:      C ← {B ∈ P | B ∩ S1 = ∅ ∨ B ∩ S2 = ∅}    . Critical class(es)
15:      if C ≠ ∅ then
16:        return C                               . Found critical class(es)
17:      end if
18:    end while
19:    n ← n + 1                                  . Go to next level
20:  end while
21: end procedure

Algorithm 2. Maximal Lumping Algorithm
Lumping. As already mentioned, our algorithm is inspired by the Paige–Tarjan algorithm for constructing bisimulation equivalences [16], i.e. stable partitions, for a given transition system. The same ideas can be used for probabilistic transition relations −→; in this case we say that a partition P = {B1, …, Bn} of a set of states is stable with respect to −→ iff for all blocks Bi and Bj in P and probabilities q we have that preq(Bi) ∩ Bj = ∅ or Bj ⊆ preq(Bi).
The construction of the lumping partition is done via the refinement of an initial partition which identifies all states in one class, and proceeds as in the algorithm Lumping (cf. Algorithm 1). Starting with the initial partition P = {S}, the algorithm incrementally refines it by a splitting procedure. For a block Bi, the splitting procedure Splitting(Bi, P) refines the current partition P by replacing each block Bj ∈ P with prep(Bi) ∩ Bj ≠ ∅ for some probability p by the new blocks prep(Bi) ∩ Bj and Bj \ prep(Bi). The block Bi is called a splitter. We will also assume the definition of a procedure Stable(P, T) which returns true iff the partition P is stable for the PTS T.
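To make the splitting step concrete, here is a small Python sketch; the representation (a dict from states to lists of (probability, target) pairs, with labels omitted) is our simplification, not the paper's:

```python
def splitting(splitter, partition, pts):
    """Refine `partition` so that, inside each block, all states reach
    the block `splitter` with the same accumulated probability."""
    weight = {s: sum(p for p, t in trans if t in splitter)
              for s, trans in pts.items()}
    refined = []
    for block in partition:
        groups = {}
        for s in block:
            groups.setdefault(weight[s], set()).add(s)
        refined.extend(groups.values())
    return refined

def lumping(pts):
    """Iterate splitting until the partition is stable; splitting only
    refines, so an unchanged block count means stability."""
    partition, worklist = [set(pts)], [set(pts)]
    while worklist:
        b = worklist.pop()
        new = splitting(b, partition, pts)
        if len(new) > len(partition):
            partition, worklist = new, list(new)
    return partition

# two lumpable states: both reach the set of terminal states with probability 1
pts = {'s': [(0.5, 'a'), (0.5, 'b')], 't': [(1.0, 'c')],
       'a': [], 'b': [], 'c': []}
print(lumping(pts))
```

On this example the states s and t end up in one class, since both reach the class of terminal states with accumulated probability 1.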
In order to construct a ticked probabilistic bisimulation ∼tb for two tree-PTS's T1 and T2 we apply a slightly modified lumping procedure (cf. Algorithm 2) to the ticked version T̂ of the union PTS T = T1 ⊕ T2. The procedure MaxLumping(T1, T2) takes two tree-PTS's T1 = (S1, A, −→, {⟨r1, 1⟩}) and
1:  procedure Padding(T1, T2)
2:      C ← MaxLumping(T1, T2)                 ▷ identify critical classes
3:      repeat
4:          if C = {Ci}i with ∀Ci : Ci ∩ S1 = ∅ then
5:              D ← ({d}, {√}, ∅, {⟨d, 1⟩})
6:              T1 ← Linking(D, d, √, 1, T1)
7:          else if C = {Ci}i with ∀Ci : Ci ∩ S2 = ∅ then
8:              D ← ({d}, {√}, ∅, {⟨d, 1⟩})
9:              T2 ← Linking(D, d, √, 1, T2)
10:         else if C = {Ci}i with Cj ⊆ S1 ≠ ∅ ≠ Ck ⊆ S2 then
11:             choose s1 ∈ Ck
12:             choose s2 ∈ Cj
13:             FixIt(T1, s1, T2, s2)
14:         end if
15:         C ← MaxLumping(T1, T2)
16:     until C = ∅                             ▷ i.e. T1 ∼tb T2
17:     return T1, T2
18: end procedure

Algorithm 3. Padding Algorithm
T2 = (S2, A, −→, {⟨r2, 1⟩}) and tries to construct a probabilistic bisimulation for T̂1 and T̂2 by lumping T̂.
At each step the procedure constructs a partition P = {B1, B2, . . . , Bm} of the set S1 ∪ S2 and identifies a set of critical blocks C = {C1, . . . , Ck} ⊆ P. A critical block Cj is an equivalence class which has representatives in only one of the two PTS's, i.e. such that Cj ∩ S1 = ∅ or Cj ∩ S2 = ∅. The procedure works on "layers", i.e. at each iteration MaxLumping(T1, T2) partitions the nodes in layer n by choosing a splitter among the blocks in the n-layer cut-off; it moves on to the next layer only when all such blocks have been considered.
Padding. The padding procedure identifies the critical blocks by calling the procedure MaxLumping(T1, T2) and then transforms the two processes T1 and T2 in order to "fix" the transitions to the critical blocks. The transformation is performed by the procedure FixIt, which we will describe later. In the special case where the critical blocks contain no representative of one of the two processes, say T1, the fixing consists in padding T1 with a depleted sub-tree formed by its root only, which is a dummy state. This is linked to the root of T1 via a ticked transition with probability 1 by the Linking procedure. In the general (more symmetric) case, where some of the critical blocks contain only states from one process and some other critical blocks only states from the other process, the transformation is made by the FixIt procedure.
1:  procedure FixIt(T1, s1, T2, s2)
2:      for all blocks Bi do                   ▷ determine all possible changes
3:          Assume: s1 −p1i→ Bi and s2 −p2i→ Bi
4:          p0i ← min(p1i, p2i)
5:      end for
6:      Assume: Bk is such that p0k = p2k
7:      Assume: Bl is such that p0l = p1l
8:      rk ← p1k − p0k
9:      rl ← p2l − p0l
10:     choose t1k ∈ Bk ∩ S1 and the sub-tree T1k starting with s1 −q1k→ t1k
11:     choose t2k ∈ Bk ∩ S2 and the sub-tree T2k starting with s2 −q2k→ t2k
12:     choose t1l ∈ Bl ∩ S1 and the sub-tree T1l starting with s1 −q1l→ t1l
13:     choose t2l ∈ Bl ∩ S2 and the sub-tree T2l starting with s2 −q2l→ t2l
14:     ▷ make maximal possible changes of probabilities
15:     Assume: s1 −q1k→ t1k, s1 −q1l→ t1l, s2 −q2k→ t2k, and s2 −q2l→ t2l
16:     r1k ← min(q1k, rk)
17:     q1k ← q1k − r1k
18:     r2l ← min(q2l, rl)
19:     q2l ← q2l − r2l
20:     ▷ construct and link up "compensating" subtrees
21:     X¹kl, Y¹kl ← Padding(T1k, T̂1l)
22:     X²kl, Y²kl ← Padding(T̂2k, T2l)
23:     Joining(T1, s1, r1k, X¹kl)
24:     Joining(T2, s2, r2l, Y²kl)
25: end procedure

Algorithm 4. Fixing Algorithm
When among the critical blocks of the current partition there is at least one block Ck which contains only representatives of T1 and one block Cj which contains only representatives of T2, the fixing is more elaborate and involves both processes. In this case there must exist states s1 ∈ Ck and s2 ∈ Cj with different probabilities of reaching some other blocks Bi of the partition. The FixIt procedure selects two blocks Bk and Bl in the previous layer such that the probabilities p1k, p1l of the transitions from s1 to Bk and Bl and the probabilities p2k, p2l of the transitions from s2 to Bk and Bl satisfy p1k > p2k and p1l < p2l. This choice is always possible. In fact, the probabilities from s1 and from s2 to the other classes each sum up to one, i.e. Σi p1i = 1 = Σi p2i, where i ranges over all classes Bi reachable from s1 or s2. As s1 and s2 are in different (critical) blocks, the probabilities must differ for at least one class Bi, i.e. there exists at least one index k with p1k ≠ p2k; thus we have either p1k > p2k or p1k < p2k. Without loss of generality, let p1k > p2k, i.e. we have found the class Bk as required. We now have Σi≠k p1i + p1k = 1 = Σi≠k p2i + p2k and thus Σi≠k p1i < Σi≠k p2i. As the summation is over the same number of probabilities, i.e. positive numbers, there must be at least one summand p2l in Σi≠k p2i such that p2l > p1l.
After that, FixIt selects four states t1k ∈ Bk ∩ S1, t2k ∈ Bk ∩ S2, t1l ∈ Bl ∩ S1, and t2l ∈ Bl ∩ S2 with s1 −q1k→ t1k, s1 −q1l→ t1l, s2 −q2k→ t2k, and s2 −q2l→ t2l, and the corresponding sub-trees T1k, T1l, T2k and T2l which are rooted in t1k, t1l, t2k and t2l. Note that since Bk and Bl are in the CutOff(n − 1) they are not critical blocks, thus it is always possible to find such states. The probabilities q1l and q2k stay unchanged but q1k and q2l will be decreased by the maximal possible amount. Ideally, this decrement should be rk and rl, but if q1k and q2l are smaller than rk and rl we only reduce q1k and q2l to zero, i.e. effectively we remove the corresponding sub-trees.
The states s1 and s2 now have the same probabilities of reaching the classes Bl and Bk, i.e. they have become bisimilar. However, this transformation has two flaws: (i) the system is no longer a PTS as the probabilities from s1 and s2 no longer add up to one; (ii) the distributions on the terminal states reachable in T1k and T2l are reduced. In order to fix this we have to construct "compensating trees" which are bisimilar and which reproduce the missing probabilities for the outputs. These two trees are constructed by calling Padding on a copy of T1k and a depleted version of T1l, and on a copy of T2l and a depleted version of T2k, respectively. Finally, these "compensating trees" are linked to s1 and s2 with a √ label and the missing probabilities r1k and r2l.
This way we obtain again a proper tree-PTS with normalised probability distributions for s1 and s2. Furthermore, the "compensating trees" produce exactly the same output as T1k and T2l, thus the resulting PTS's have the same overall output as the original ones. Finally, by construction the compensating trees are bisimilar as T1k and T2k are bisimilar; the same holds for T1l and T2l.
5
Correctness of Padding
In order to show that Padding(T1, T2) indeed produces tree-PTS's T1′ and T2′ which are (i) I/O-equivalent, more precisely weakly trace equivalent, to T1 and T2 respectively and (ii) tick-bisimilar, i.e. T1′ ∼tb T2′, we first describe the effect of the various operations performed in Padding on the observables and on (non-)bisimilar states.

Given two states s1 and s2 in two tree-subPTS's T1 and T2 respectively and a bisimulation relation ∼ on the union T = T1 ⊕ T2, we define a new subPTS T^min_{s1,s2} as a common minimisation of the transition probabilities of s1 and s2 by changing the transition probabilities from each of the two states to any class Ck of the bisimulation relation ∼ and all labels α as follows:
p′(s1, α, Ck) = min(p(s1, α, Ck), p(s2, α, Ck))
p′(s2, α, Ck) = min(p(s1, α, Ck), p(s2, α, Ck))
p′(si, α, sj) = p(si, α, sj) for i ∉ {1, 2}

where p denotes the probabilities in T and p′ the probabilities in T^min_{s1,s2}. We denote by T1^min and T2^min the new subPTS's corresponding to T1 and T2 respectively.
As we require a change of only the (accumulated) probabilities from each state to all classes, this can be achieved by adjusting the state-to-state probabilities in a number of ways, i.e. the common minimisation is not unique.
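The common minimisation of the outgoing distributions of s1 and s2 can be sketched as follows, keyed by (label, class) pairs (again, the encoding is ours):

```python
def common_min(dist1, dist2):
    """Pointwise minimum of two outgoing distributions, indexed by
    (label, bisimulation class) pairs."""
    keys = set(dist1) | set(dist2)
    return {k: min(dist1.get(k, 0.0), dist2.get(k, 0.0)) for k in keys}

d1 = {('a', 'C1'): 0.7, ('b', 'C2'): 0.3}   # outgoing distribution of s1
d2 = {('a', 'C1'): 0.4, ('b', 'C2'): 0.6}   # outgoing distribution of s2
print(common_min(d1, d2))
```

Both states then reach every class with the same probability; the probability mass removed from each state (here 0.3 from s1 on ('a', 'C1') and 0.3 from s2 on ('b', 'C2')) is exactly what the compensating trees have to restore.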
We also define a new relation ∼′ on T^min_{s1,s2} as follows: si ∼′ sj if si ∼ sj and {i, j} ∩ {1, 2} = ∅, together with s1 ∼′ s2 and s2 ∼′ s1.
Lemma 19 Given a state s1 in T1 and a state s2 in T2 and a bisimulation relation ∼ on T = T1 ⊕ T2, the equivalence relation ∼′ on T^min_{s1,s2} is a bisimulation relation in which s1 and s2 are bisimilar.
PROOF. This follows directly from the definition of ∼′ and p′. □
Note that ∼′ is not necessarily the coarsest bisimulation relation or lumping on/of T^min_{s1,s2}. It could happen that there already exists a class of states which are bisimilar to s1 and s2 after minimisation.
Consider two tree-subPTS's T and S and a state t in T. The insertion of S into T at t with probability q and label λ is the tree-subPTS T ◁_t^{q:λ} S with

p′(ti, α, tj) = p(ti, α, tj) for ti, tj ∈ T and all labels α
p′(sk, α, sl) = p(sk, α, sl) for sk, sl ∈ S and all labels α
p′(t, λ, s0) = q

where s0 is the root state of S, p denotes the probabilities in T and S respectively, and p′ the probabilities in T ◁_t^{q:λ} S.
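The insertion operator can be sketched in the dict-of-transitions representation used above (state spaces assumed disjoint; the representation is our own):

```python
def insert(T, t, q, lam, S, s0):
    """Insert tree S (with root s0) into tree T at state t, adding a
    single new transition t --lam with probability q--> s0; all other
    transitions of T and S are kept unchanged."""
    out = {s: list(trans) for s, trans in T.items()}
    out.update({s: list(trans) for s, trans in S.items()})
    out[t] = out[t] + [(lam, q, s0)]
    return out

T = {'r': [('a', 1.0, 'u')], 'u': []}
S = {'s0': [('c', 1.0, 'v')], 'v': []}
print(insert(T, 'u', 0.5, 'tick', S, 's0'))
```

Note that the result is in general only a subPTS: the outgoing probabilities of t need not sum to one any more, which is exactly why insertions are used to restore the missing probability mass after a minimisation.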
Lemma 20 Given two pairs of bisimilar tree-subPTS's T1 ∼ T2 and S1 ∼ S2 and two bisimilar states t1 and t2 in T1 and T2 respectively, the common insertion of S1 and S2 into T1 and T2 at t1 and t2 with probability q and label λ results in bisimilar subPTS's, i.e.

T1 ◁_{t1}^{q:λ} S1 ∼ T2 ◁_{t2}^{q:λ} S2
PROOF. We define the bisimulation relation/lumping of T1 ◁_{t1}^{q:λ} S1 and T2 ◁_{t2}^{q:λ} S2 as follows: for all states in S1 ⊕ S2 and (T1 \ {t1}) ⊕ (T2 \ {t2}) we use the same relation as before. For t1 and t2 we exploit the fact that they are bisimilar with respect to all classes in T1 ⊕ T2; and with respect to the (potential) new state/class {s01, s02} formed by the roots of S1 and S2 respectively we have by definition the same probability q. □
Given a tree-subPTS T, we define a root extension of T as a tree-subPTS r ◁_{t0}^{1:ω} T where we add an additional state r and define the transition probabilities as:

p′(ti, α, tj) = p(ti, α, tj) for ti, tj ∈ T and all labels α
p′(r, ω, t0) = 1

where t0 is the root state of T, p denotes the probabilities in T, and p′ the probabilities in r ◁_{t0}^{1:ω} T. This can also be seen as a special kind of insertion, which explains the notation.
Lemma 21 Root extension does not change the I/O observables.
PROOF. All the paths πi from the root t0 to the leaves in T are replaced by paths of the form ωπi from r to the leaves in T′ = r ◁_{t0}^{1:ω} T, but as ωπi ≡ πi we have the same observables. □
Let T be a tree-subPTS and S a sub-tree-subPTS of T; then the deletion of S in T is denoted by T ⊖ S, i.e. this is the subPTS on the states of T \ S obtained by restriction of the transition relation to these states. The deletion of a sub-tree changes the observables as follows.
Lemma 22 Given a tree-subPTS T and S a sub-tree-subPTS of T such that the observables of T and S are O(T) and O(S) and the (unique) path from the root of T to the root of S is π0, the observables of T ⊖ S are given by:

O(T ⊖ S) = O(T) − π0 · O(S)

with "·" denoting the prefixing: π0 · {⟨πi, pi⟩}i = {⟨π0πi, pi⟩}i.
PROOF. By eliminating S we also eliminate the paths through S to the leaves. As the observables are distributions over paths, we simply have to eliminate the contribution of these paths through S from the (complete) observables of T. The contribution of these paths is not just their "S-stage" but has to be prefixed by the contribution until S, i.e. its root, is reached; this is given by the unique path π0. □
Insertion of sub-trees changes observables as follows.
Lemma 23 Given two tree-subPTS's T and S and a state t in T such that the observables of T and S are O(T) and O(S) and the (unique) path from the root of T to t is π0, the insertion of S into T with probability p and action α has the following observables:

O(T ◁_t^{p:α} S) = O(T) + p π0α · O(S)

i.e. we have to add the "p-weighted" paths in π0α · O(S) to O(T).

PROOF. Essentially the same argument as before: we obtain the same paths and thus observables as in T alone, but have to add the contribution of the paths going through S. The latter additionally have to be prefixed by the path from the root of T to t and the entry action α from t to the root of S. □
We can also describe the change of observables if we only partially delete the subtree S, i.e. if we change the entry probability into S from T (the probability with which the root of S can be reached from T \ S) from p to p′. We denote the resulting subPTS by T ⊖_p^{p′} S.
Lemma 24 Given a tree-subPTS T and S a sub-tree-subPTS of T such that the observables of T and S are O(T) and O(S) and the (unique) path from the root of T to the root of S is π0, the observables of T ⊖_p^{p′} S are given by:

O(T ⊖_p^{p′} S) = O(T) − (1 − p′/p)(π0 · O(S)).
PROOF. The argument is essentially the same as previously; however, each path reaching a leaf through S is not completely removed but only its impact on the observables is reduced. In T the probability of such a path is a product of the form ∏k pk · p · ∏l pl, where the pk indicate the computational steps in π0, the probability p is the original entry probability into S, and the pl are the probabilities in S. In T ⊖_p^{p′} S the probability of this path is changed to ∏k pk · p′ · ∏l pl. If we therefore remove the contribution of these paths in the original observables and add their contribution in the new subPTS we obtain the desired result:

O(T ⊖_p^{p′} S) = O(T) − π0 · O(S) + (p′/p) π0 · O(S) = O(T) − (1 − p′/p) π0 · O(S). □
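Lemma 24 can be sanity-checked numerically, representing observables as dicts from output paths to probabilities; note that, as in the proof of Lemma 22, the prefixing π0 · O(S) is taken here to also carry the probability of reaching the root of S (the concrete tree and numbers are ours):

```python
def prefix(path0, w0, obs):
    """Prefix every path of `obs` with path0 and weight it with w0,
    the probability with which the root of the sub-tree is reached."""
    return {path0 + pi: w0 * p for pi, p in obs.items()}

def minus(o1, o2):
    """Pointwise difference of two observables."""
    out = dict(o1)
    for k, v in o2.items():
        out[k] = out.get(k, 0.0) - v
    return out

# T: root --0.6:a--> root of S, root --0.4:b--> leaf;  O(S) = {'c': 1}
p, p_new = 0.6, 0.3
O_T, O_S = {'ac': 0.6, 'b': 0.4}, {'c': 1.0}

# Lemma 24: O(T, entry p -> p') = O(T) - (1 - p'/p)(pi0 . O(S))
scaled = {k: (1 - p_new / p) * v for k, v in prefix('a', p, O_S).items()}
adjusted = minus(O_T, scaled)

# direct computation with the new entry probability p'
direct = {'ac': p_new, 'b': 0.4}
assert all(abs(adjusted[k] - direct[k]) < 1e-9 for k in direct)
```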
Proposition 25 Given two tree-subPTS's T1 and T2 such that T1 ∼io T2, assume that the n-cutoffs are bisimilar, i.e. T1|n ∼ T2|n; then the common minimisation of T1|n+1 ⊕ T2|n+1 with respect to two non-bisimilar states s1 ∈ T1 and s2 ∈ T2 in layer n + 1 fulfils:

(1) T1^min|n ∼tb T2^min|n
(2) the number of non-bisimilar states in T1|n+1 ⊕ T2|n+1 is strictly reduced
(3) the observables change as follows:

O(T1′) = O(T1) − (1 − min(p1, p2)/p1)(π1 · O(s1△))
O(T2′) = O(T2) − (1 − min(p1, p2)/p2)(π2 · O(s2△))

where π1 and π2 are the paths from the roots in T1 and T2 to s1 and s2 respectively, and s1△ and s2△ denote the subtrees in T1 and T2 with roots s1 and s2 respectively.
PROOF. The bisimulation class a state s belongs to is determined in a finite tree-subPTS only by its successors, i.e. by s△. As we only change the transitions for the states s1 and s2 in layer n + 1, nothing changes for the states in the n-cutoffs; this shows (1). We do not introduce any new states in layer n + 1 but we force s1 and s2 to be bisimilar, cf. Lemma 19. Therefore the number of non-bisimilar states in layer n + 1 is strictly reduced (by two), and as all states in the n-cutoffs are bisimilar by assumption we have shown (2). The observables of T1 and T2 change as in (3) according to Lemma 24. □
The common minimisation makes two tree-PTS's more bisimilar, by strictly reducing the number of non-bisimilar states, but it also changes the observables. We can compensate for this by inserting two bisimilar sub-trees which compute the missing observables.
Proposition 26 Given two tree-subPTS's T1 and T2 such that T1 ∼io T2, assume that the n-cutoffs are bisimilar, i.e. T1|n ∼tb T2|n, and consider the common minimisation of T1|n+1 ⊕ T2|n+1 with respect to two non-bisimilar states s1 ∈ T1 and s2 ∈ T2 in layer n + 1; then the insertion of the padded versions of the depleted and original versions of s1△ and s2△ gives:

T1 ∼io (T1^min ◁_{s1}^{p1−min(p1,p2)} Padding(s1△, s2△^d))
T1^min ∼tb (T1^min ◁_{s1}^{p1−min(p1,p2)} Padding(s1△, s2△^d))
T2 ∼io (T2^min ◁_{s2}^{p2−min(p1,p2)} Padding(s1△^d, s2△))
T2^min ∼tb (T2^min ◁_{s2}^{p2−min(p1,p2)} Padding(s1△^d, s2△))
PROOF. Obviously Ti^min and (Ti^min ◁_{si}^{pi−min(p1,p2)} Padding(s1△, s2△)), for i = 1, 2, are bisimilar by Lemma 20; and the same is true if one of the sub-trees s1△ or s2△ is depleted. The observables of Ti were reduced by the common minimisation, according to Lemma 24 and Proposition 25, by reducing the probabilities of some paths through si△. Inserting the si△ again with the missing probability compensates for this, while the depleted versions si△^d do not influence the observables. □
We can now show the main theorem establishing that the padding is correct.
Theorem 27 Given two tree-PTS's T1 and T2, Padding(T1, T2) returns tree-PTS's T1′ and T2′ such that T1′ ∼tb T2′, T1′ ∼io T1, and T2′ ∼io T2.
PROOF. The proof is by induction on the height of T1 ⊕ T2. The algorithm Padding progresses through the trees T1 and T2 layer by layer, starting from the terminal states or leaves (base case). The sub-routine MaxLumping tries to identify non-bisimilar states/classes in T1 and T2 respectively. If a mismatch is found on layer n + 1 then the n-cutoffs are bisimilar, i.e. T1|n ∼tb T2|n (induction hypothesis).
Calling the sub-routine MaxLumping has three possible outcomes: (i) no mismatches are found, (ii) there is only one mismatched state/class on level n + 1, i.e. one of the two trees is higher, or (iii) there are two states/classes s1 and s2 in T1 and T2 respectively which are not bisimilar. If case (i) holds, then T1 and T2 are already bisimilar and the algorithm terminates. In case (ii) we need to extend the shorter tree, i.e. assuming w.l.o.g. that T1 is shorter, we need to construct a root extension r ◁_{t0}^{1:ω} T1. By Lemma 21 we have T1 ∼io r ◁_{t0}^{1:ω} T1, and as T1|n and T2|n are bisimilar we now also have that T1|n+1 ∼tb T2|n+1.
In the general case (iii) the transformation proceeds in two phases. In phase one we construct a common minimisation of T1 and T2 such that the n-cutoffs are still bisimilar and such that the number of non-bisimilar states in the (n + 1)-cutoff is strictly reduced, cf. Proposition 25. In phase two we insert subtrees which compensate for the missing observables (and also guarantee that the normalisation condition of proper PTS's is fulfilled); as the new subtrees are bisimilar we do not increase the number of non-bisimilar states, see Proposition 26. In other words: phase one strictly reduces the number of non-bisimilar states while phase two does not increase it; and phase one changes the observables but phase two compensates again. Together we therefore have a monotone decrease of the number of non-bisimilar states while the observables remain unchanged.
As we have only finitely many states in a layer we will eventually reach the situation that there are no non-bisimilar states in layer n + 1, or more generally in the (n + 1)-cutoff. The algorithm therefore terminates in each layer, and as we also have only finitely many layers the algorithm terminates with T1′ ∼tb T2′, T1′ ∼io T1, and T2′ ∼io T2. □
6
A Simple Example
Consider two processes A and B (with p ≠ q and p, q ∉ {0, 1}) defined as:

A:  1 −p:a→ 2,  1 −(1−p):b→ 3,  3 −1:c→ 4
B:  5 −q:a→ 6,  5 −(1−q):b→ 7,  7 −1:c→ 8
The aim is to construct two processes A′ and B′ such that A′ ∼tb B′ and A ∼io A′ and B ∼io B′. Note that A and B do not have the same I/O observables, i.e. A ≁io B, and that we also do not require this for the transformed processes.
If we call MaxLumping(A, B) we have as first splitter the set of all states, i.e.
S1 = {1, 2, . . . , 8}. The backward image of S1 , i.e. calling Splitting(S1 , {S1 }),
allows us to distinguish the terminal nodes from ‘internal’ nodes, i.e. we get
the partition K1 = {{2, 4, 6, 8}, {1, 3, 5, 7}}. Obviously, both processes have
representatives in each of the two blocks of this partition.
If we continue lumping using the splitter S2 = {2, 4, 6, 8}, i.e. the terminal nodes, we get a refined partition K2 = {{2, 4, 6, 8}, {3, 7}, {1}, {5}}. The
probability of reaching S2 from nodes 3 and 7 is the same (namely 1) while 1
and 5 reach the terminal nodes with different probabilities (namely p and q)
and thus form two separate blocks.
The partition K2 is now not properly 'balanced', i.e. the class {1} has no representatives in B and {5} has no representatives in A. MaxLumping(A, B) will thus report {1} and {5} as critical classes.
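These two splitting steps can be replayed in Python (labels are ignored and p = 0.7, q = 0.4 are chosen arbitrarily; the encoding is ours):

```python
p, q = 0.7, 0.4
pts = {1: [(p, 2), (1 - p, 3)], 3: [(1.0, 4)], 2: [], 4: [],   # process A
       5: [(q, 6), (1 - q, 7)], 7: [(1.0, 8)], 6: [], 8: []}   # process B

def split(splitter, partition):
    """Refine the partition by grouping, within each block, states with
    equal accumulated probability of reaching the splitter."""
    weight = {s: sum(pr for pr, t in pts[s] if t in splitter) for s in pts}
    refined = []
    for block in partition:
        groups = {}
        for s in block:
            groups.setdefault(weight[s], set()).add(s)
        refined.extend(groups.values())
    return refined

K1 = split(set(pts), [set(pts)])   # internal vs terminal nodes
K2 = split({2, 4, 6, 8}, K1)       # splits off {1} and {5}
A, B = {1, 2, 3, 4}, {5, 6, 7, 8}
critical = [b for b in K2 if not (b & A) or not (b & B)]
print(critical)
```

With p ≠ q the states 1 and 5 reach the terminal nodes with different probabilities, so they land in singleton classes with representatives from only one process each, the critical classes.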
In order to fix the 'imbalance' between A and B we have to do two things: (i) minimise/unify the probabilities of the transitions emitting from 1 and 5, and (ii) introduce subtrees X and Y to compensate for the missing outputs and to re-establish normalised probabilities.
• The first step thus is to transform A and B into the following form:

  A:  1 −min(p,q):a→ 2,  1 −min(1−p,1−q):b→ 3,  3 −1:c→ 4
  B:  5 −min(p,q):a→ 6,  5 −min(1−p,1−q):b→ 7,  7 −1:c→ 8

  These two transformed versions of A and B are now probabilistically tick-bisimilar, but unfortunately they have different outputs, i.e. they are not I/O-equivalent; moreover, they fail to fulfil the normalisation condition for PTS's and Markov chains.
• To compensate for the "missing" probability r = 1 − min(p, q) − min(1 − p, 1 − q) we have to construct subtrees X and Y which reproduce the missing output states and which are bisimilar. As we have a critical class in each process, namely Ck = {1} in A and Cj = {5} in B, we have to consider the third case (line 10) in Padding. As the critical classes are singletons we have to take s1 = 1 and s2 = 5 when we call FixIt(A, 1, B, 5).
  There are now, for each of the states s1 = 1 and s2 = 5, two blocks they can make a move to, namely Bk = {3, 7} and Bl = {2, 4, 6, 8}, but with different probabilities. We now have to choose in FixIt (lines 10 to 13) representative states for these two blocks in each of the two processes A and B. In our example these states are uniquely determined: t1k = 3, t1l = 2, t2k = 7, and t2l = 6. Furthermore, we have to consider the following sub-trees of A and B which are rooted in these representative states:
  T1k:  ◦ −1:b→ 3 −1:c→ 4        T2k:  ◦ −1:b→ 7 −1:c→ 8
  T1l:  ◦ −1:a→ 2                T2l:  ◦ −1:a→ 6
  We assume w.l.o.g. p > q, i.e. min(p, q) = q and min(1 − p, 1 − q) = 1 − p (the other case is symmetric), and thus r1k = r2l = p − q. The compensation tree X¹kl for A thus must fulfil X¹kl ∼io T1l, and for Y²kl in B we have Y²kl ∼io T2k, and furthermore X¹kl ∼tb Y²kl. In order to construct X¹kl and Y²kl we thus just have to call (recursively) Padding(T1k, T̂1l) and Padding(T̂2k, T2l). We therefore have to compare two pairs of trees in order to obtain the padded versions (calling Padding recursively):
  [Diagram: the two pairs of trees compared by the recursive calls to Padding, and the resulting compensation trees X¹kl and Y²kl, built from copies of the dummy state ◦, ticked (√, dotted) transitions, and primed copies of the original states.]
  A dotted transition denotes a ticked or depleted transition: it means that no computation takes place during this transition (i.e. the state does not change), only time passes. Empty nodes ◦ denote copies of the dummy state: they can be seen as place-holders that inherit the state from the previous node. Obviously these two sub-trees X¹kl and Y²kl fulfil the needed requirements regarding bisimilarity and I/O equivalence.
• Finally we have to introduce these compensatory sub-trees into the transformed versions of A and B to obtain (with r = 1 − min(p, q) − min(1 − p, 1 − q)):
  [Diagram: the transformed processes A′ and B′. Each root keeps its minimised transitions min(p,q):a and min(1−p,1−q):b (followed by 1:c where present) and in addition carries its compensating sub-tree with probability r.]
7
Comparisons and Related Work

7.1
Comparison with Agat's transformation
In [2] Agat proposes a transformation technique in the context of a type system
with security types. The type system, besides detecting illegal information
flow, transforms conditional statements that branch on high-security data into
new statements where both conditional branches have been made bisimilar,
thus making the new statement immune to timing attacks and the associated
high-security data safe. The conditional statement is transformed by adding
a ‘dummy copy’ of each branch to its counterpart such that, when the new
branches are compared against each other, the original code is matched against
its dummy copy. If we adapt this technique to PTS’s and apply it to the
processes A and B considered before, we get the transformed processes shown
below:
[Diagram: the processes A and B after (an adaptation of) Agat's transformation: each final state of A is followed by a dummy, ticked (√) copy of B, and a dummy, ticked copy of A is put in front of B, using copies of the dummy state ◦ throughout.]
According to Agat’s transformation, a ’dummy execution’ of B should follow
A and a ’dummy execution’ of A should precede B. Given that A has two
final states, two dummy copies of B are appended to process A. Similarly,
a dummy copy of A is put before B. However, in order to preserve the tree
structure of the process, a new copy of B is made so it can be joined to one
of the final states of the dummy copy of A.
Clearly the two transformed trees are bisimilar and the new processes are I/O-equivalent to their original versions. It is also clear that this transformation not only creates a greater number of states than our proposed solution but, more importantly, generates processes with longer execution time.
7.2
Related Literature
The work by Agat [2] is certainly the most closely related to ours; in fact we are not aware of other approaches to closing timing leaks which exploit program transformation techniques. Apart from the basic differences between our transformation algorithm and Agat's transformation type system explained in the previous section, our approach is in some sense more abstract than [2] in that it does not address a specific language; rather, it is in principle adaptable to any probabilistic language whose semantics can be expressed in terms of our PTS model.
Timing attacks are prominently studied in security protocol analysis, and more generally in the analysis of distributed programs specifically designed to achieve secure communication over inherently insecure shared media such as the Internet. In this setting, timing channels represent a serious threat, as shown for example in [18], where a timing attack is mounted on RSA encryption; in [19], where it is shown how, by measuring the time required by certain operations, a web site can learn information about a user's past activities; or in [20], where an attack is described on Abadi's private authentication protocol [21].
The various approaches which have been proposed in the literature for the timing analysis of security and cryptographic protocols mainly exploit languages and tools based on formal methods (timed automata, model checkers and process algebras). For example, [22] develops formal models and a timed-automata based analysis for Felten and Schneider's web privacy attack mentioned above. A process algebraic approach is adopted in [13] and [23], where the problem of timing information leakage is modelled as a non-interference property capturing time-dependent information flows. A different approach is the one adopted in [24], where static analysis techniques are used to verify communication protocols against timing attacks.
8
Conclusion and Further Work
In this paper we have investigated possible countermeasures against timing attacks in a process algebraic setting. The aim is to transform systems such that their timing behaviour becomes indistinguishable, i.e. they become bisimilar, while preserving their computational content, i.e. their I/O behaviour. Our particular focus in this work was on finite, tree-like, probabilistic systems and probabilistic bisimulation.
Simple obfuscation methods have been considered before, e.g. by Agat [2], which may lead to a substantial increase in the (average) running time of the transformed processes. Our approach attempts to minimise this additional running-time overhead introduced by the transformation. The padding algorithm achieves this aim by patching time leaks 'on demand' whenever the lumping algorithm, which aims to establish the bisimilarity of two processes, encounters a critical situation. Further work will be directed towards improving the padding algorithm further. At various stages of the algorithm we make nondeterministic choices (of certain states or classes); it will be important to investigate strategies to achieve optimal choices.
We also plan to investigate probabilistic and conditional padding. The current algorithm introduces time-leak fixes whenever they appear to be necessary. The resulting processes are therefore perfectly bisimilar. However, one could also think of applying patches randomly, or only if the expected time leak exceeds a certain threshold. One would then obtain approximate or ε-bisimilar systems, in the sense of our previous work [7,1], but could expect a smaller increase in the (average) running time and/or the size of the system or program considered. Optimising the trade-off between vulnerability to timing attacks, measured quantitatively by ε, and the additional costs (running time, system size etc.) requires a better understanding of the relation between these different factors.
References
[1] A. Di Pierro, C. Hankin, H. Wiklicky, Measuring the confinement of
probabilistic systems, Theoretical Computer Science 340 (1) (2005) 3–56.
[2] J. Agat, Transforming out timing leaks, in: POPL ’00: Proceedings of the
27th ACM SIGPLAN-SIGACT Symposium on Principles of Programming
Languages, ACM Press, New York, NY, USA, 2000, pp. 40–53.
[3] R. van Glabbeek, S. Smolka, B. Steffen, Reactive, generative and stratified
models of probabilistic processes, Information and Computation 121 (1995)
59–80.
[4] R. Segala, N. Lynch, Probabilistic simulations for probabilistic processes, in:
Proceedings of CONCUR 94, Vol. 836 of Lecture Notes in Computer Science,
Springer Verlag, 1994, pp. 481–496.
[5] J. G. Kemeny, J. L. Snell, Finite Markov Chains, D. Van Nostrand Company,
1960.
[6] B. Jonsson, W. Yi, K. Larsen, Probabilistic Extensions of Process Algebras,
Elsevier Science, Amsterdam, 2001, Ch. 11, pp. 685–710, see [25].
[7] A. Di Pierro, C. Hankin, H. Wiklicky, Quantitative relations and approximate
process equivalences, in: D. Lugiez (Ed.), Proceedings of CONCUR’03, Vol. 2761
of Lecture Notes in Computer Science, Springer Verlag, 2003, pp. 508–522.
[8] B. Lampson, A Note on the Confinement Problem, Communications of the
ACM 16 (10) (1973) 613–615.
[9] J. Goguen, J. Meseguer, Security Policies and Security Models, in: IEEE
Symposium on Security and Privacy, IEEE Computer Society Press, 1982, pp.
11–20.
[10] J. Gray, III, Towards a mathematical foundation for information flow security,
in: Proceedings of the 1991 Symposium on Research in Security and Privacy,
IEEE, Oakland, California, 1991, pp. 21–34.
[11] P. Ryan, S. Schneider, Process algebra and non-interference, Journal of
Computer Security 9 (1/2) (2001) 75–103, special Issue on CSFW-12.
[12] K. Larsen, A. Skou, Bisimulation through probabilistic testing, Information and
Computation 94 (1991) 1–28.
[13] R. Focardi, R. Gorrieri, F. Martinelli, Real-time information flow analysis, IEEE
Journal on Selected Areas in Communications 21 (1) (2003) 20–35.
[14] S. Derisavi, H. Hermanns, W. H. Sanders, Optimal state-space lumping in
Markov chains, Information Processing Letters 87 (6) (2003) 309–315.
[15] R. Focardi, R. Gorrieri, Classification of security properties (Part I: Information
Flow), in: Foundations of Security Analysis and Design, Tutorial Lectures, Vol.
2171 of Lecture Notes in Computer Science, Springer Verlag, 2001, pp. 331–396.
[16] R. Paige, R. Tarjan, Three partition refinement algorithms, SIAM Journal of
Computation 16 (6) (1987) 973–989.
[17] S. Derisavi, H. Hermanns, W. H. Sanders, Optimal state-space lumping in
Markov chains, Information Processing Letters 87 (6) (2003) 309–315.
[18] P. Kocher, Cryptanalysis of Diffie-Hellman, RSA, DSS, and other cryptosystems
using timing attacks, in: D. Coppersmith (Ed.), Advances in Cryptology,
CRYPTO ’95, Vol. 963 of Lecture Notes in Computer Science, Springer-Verlag,
1995, pp. 171–183.
[19] E. W. Felten, M. A. Schneider, Timing attacks on web privacy, in: Proceedings
of CCS’00, ACM, Athens, Greece, 2000, pp. 25–32.
[20] R. J. Corin, S. Etalle, P. H. Hartel, A. Mader, Timed analysis of security
protocols, Journal of Computer Security.
[21] M. Abadi, Private authentication, in: R. Dingledine, P. F. Syverson (Eds.), 2nd
Int. Workshop on Privacy Enhancing Technologies, Vol. 2482 of Lecture Notes
in Computer Science, Springer Verlag, 2002, pp. 27–40.
[22] R. Gorrieri, R. Lanotte, A. Maggiolo-Schettini, F. Martinelli, S. Tini, E. Tronci,
Automated analysis of timed security: A case study on web privacy, Journal of
Information Security 2 (2004) 168–186.
[23] R. Gorrieri, F. Martinelli, A simple framework for real-time cryptographic
protocol analysis with composition proof rules, Science of Computer
Programming 50 (2004) 23–49.
[24] M. Buchholtz, S. Gilmore, J. Hillston, F. Nielson, Securing staticallyverified communications protocols against timing attacks, in: 1st International
Workshop on Practical Applications of Stochastic Modelling (PASM 2004), Vol.
128, 2005, pp. 123–143.
[25] J. Bergstra, A. Ponse, S. Smolka (Eds.), Handbook of Process Algebra, Elsevier
Science, Amsterdam, 2001.