Tempus Fugit: How To Plug It

Alessandra Di Pierro (Dipartimento di Informatica, Università di Pisa, Italy)
Chris Hankin, Igor Siveroni, Herbert Wiklicky (Department of Computing, Imperial College London, United Kingdom) 1

1 All authors are partly funded by the EPSRC project S77066A.

Preprint submitted to Elsevier Science, 3 May 2006

Abstract

Secret or private information may be leaked to an external attacker through the timing behaviour of the system running the untrusted code. After introducing a formalisation of this situation in terms of a confinement property, we present an algorithm which transforms the system into one that is computationally equivalent to the given system but free of timing leaks.

Key words: Security Analysis, Timing Leaks, Program Transformation, Probabilistic Processes

1 Introduction

This paper concerns the transformation of programs to remove covert channels. Our focus is on timing channels, which can signal information through the time at which an action occurs (a command is executed). A system is vulnerable to a timing attack if there are alternative control flow paths of observably different duration; a common refinement of this definition focuses only on those alternative paths whose selection is guarded by confidential information. We will show that the comparison of different paths is essentially a confinement problem. We have studied confinement in [1] using probabilistic bisimulation. Specifically, we showed that two processes are probabilistically confined if and only if they share a common (probabilistic) abstract interpretation (Proposition 39 of [1]). Our approach to probabilistic abstract interpretation is based on Operator Algebra and thus abstractions are measurable by a norm; in particular, in op. cit., we showed how to measure the difference between two processes and thus introduced the notion of approximate confinement. In op. cit.
this measure was given a statistical interpretation which indicated how much work an attacker would have to perform in order to differentiate the processes. In this paper we use our previous work to identify the need for transformation. We identify the alternative paths which fail to be bisimilar. These are then candidates for transformation. We present a transformation which establishes probabilistic bisimilarity whilst retaining input/output equivalence with the original system. This transformation involves the introduction of new "padded" control paths. This is similar to the approach of Agat [2]. However, in contrast to Agat, our work is not programming-language specific; instead it is based on probabilistic transition systems, which provide a very general setting. The transformed system produced by our approach is more efficient than that which would be produced by Agat's algorithm, because our algorithm introduces fewer padding steps.

In the next section, we review some background material on probabilistic transition systems and their representation as linear operators. Section 3 reviews the pertinent results from [1] and shows how to use these results to identify potential timing leaks. Section 4 and Section 5 present our algorithm for padding and its correctness. The remaining sections present a series of examples and comparisons to related work.

2 Probabilistic Processes

Probabilistic process algebras extend the well-established classical techniques for modelling and reasoning about concurrent processes with the ability to deal with non-functional aspects of process behaviour, such as performance and reliability, or in general the quantitative counterpart of the classical properties representing the functional behaviour of a system. Several models for probabilistic processes have been proposed in the literature in recent years, which differ from each other essentially in the "amount" of nondeterminism that is allowed in the model.
Replacing nondeterministic branching by a probabilistic one leads to either the reactive or the generative model introduced in [3], where the probability distribution which guides the choice may depend (reactive) or not (generative) on the action labelling the branches (or transitions). Other models allow for both nondeterministic and probabilistic branching and lead to a myriad of variants depending on the interplay between the two kinds of branching (cf. [4]).

In order to describe the operational behaviour of probabilistic processes we consider probabilistic transition systems (PTS's), that is, probabilistic extensions of the notion of labelled transition systems for classical process algebras. Probabilistic transition systems essentially consist of a set of states and a set of labels, and their behaviour can be informally described as follows: to a given action of the environment the system responds by either refusing it or making a transition to a new state according to a probability distribution. In this paper we consider only PTS's with a finite number of states and actions, which reflect the generative model of probability; they effectively correspond to a particular kind of probabilistic processes, namely finite Markov chains [5].

2.1 Probabilistic Transition Systems

Formally, we can define a probabilistic transition system in all generality as in [6]. We denote by Dist(X) the set of all probability distributions on the set X, that is, of all functions π : X → [0, 1] such that ∑_{x∈X} π(x) = 1.

Definition 1 A probabilistic transition system is a tuple (S, A, −→, π0), where:

• S is a non-empty, finite set of states,
• A is a non-empty, finite set of actions,
• −→ ⊆ S × A × Dist(S) is a transition relation, and
• π0 ∈ Dist(S) is an initial distribution on S.

For s ∈ S, α ∈ A and π ∈ Dist(S) we write s −α→ π for (s, α, π) ∈ −→. By s −p:α→ t we denote the transition to the individual state t with probability p = π(t).
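As a concrete illustration of Definition 1 in the generative reading used below, a finite PTS can be encoded as a plain mapping; the encoding and all state and action names here are our own sketch, not taken from the paper:

```python
# A minimal sketch of a generative PTS: each state maps to a list of
# (probability, action, successor) triples, so that one distribution
# governs both the choice of action and the state transition.
pts = {
    "s0": [(0.5, "a", "s1"), (0.5, "a", "s2")],
    "s1": [(1.0, "b", "s3")],
    "s2": [(1.0, "b", "s3")],
    "s3": [],  # terminal state: no outgoing transitions
}

def is_generative(pts, tol=1e-9):
    # In the generative model the outgoing probabilities of every
    # non-terminal state must form a probability distribution, i.e. sum to 1.
    return all(
        not succs or abs(sum(p for p, _, _ in succs) - 1.0) <= tol
        for succs in pts.values()
    )
```

A state whose outgoing probabilities sum to strictly less than one would instead correspond to the sub-probability (subPTS) case discussed below.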
We denote the transition probability from a state s1 to a state s2 with label α by p(s1, α, s2). For a set of states C we write the accumulated transition probability from s1 to C as p(s1, α, C) = ∑_{s∈C} p(s1, α, s).

In the generative model the transition relation of a PTS (S, A, −→, π0) is a partial function −→ : S ⇀ Dist(S × A); this means that the same probability distribution is used to govern both the choice of the action and the (internal) state transition.

Looking at a PTS as a Markov chain, we can identify it with the tree of the outcomes of the associated stochastic process; this can be seen as an experiment which takes place in stages. At each stage the probability for each possible outcome is known when the outcomes of all previous stages are known, and the number of the possible outcomes at each stage is finite. We can define the tree associated to a PTS inductively as follows.

Definition 2 Let S be a non-empty, finite set of states, and let A be a non-empty, finite set of actions. Then

• ({s}, A, ∅, {⟨s, 1⟩}) is a tree-PTS with root s ∈ S;
• if Ti = (Si, A, −→i, {⟨si, 1⟩}), i = 1, …, m, m < ∞, are tree-PTS's with roots si ∈ Si, Si ⊆ S for all i = 1, …, m, Si ∩ Sj = ∅ for i ≠ j, s ∈ S \ ∪i Si, and {⟨(si, αi), pi⟩ | si ∈ Si, αi ∈ A, i = 1, …, m} is a probability distribution, then the PTS constructed as ({s} ∪ ∪i Si, A, {s −pi:αi→ si} ∪ ∪i −→i, {⟨s, 1⟩}) is a tree-PTS with root s. We call the Ti the sub-trees of T.

When the transition probabilities form a sub-probability distribution, that is, a function π : S × A → [0, 1] with ∑_x π(x) ≤ 1, then we call the PTS a subPTS (tree-subPTS).

2.2 Linear Representation of Probabilistic Transition Systems

As in previous work – e.g. [7,1] – we recast the common relational presentation of PTS's in terms of linear maps and operators in order to provide an appropriate mathematical framework for the quantitative study of probabilistic processes.
Definition 3 Given a (sub)PTS T = (S, A, −→, π0) and a fixed α ∈ A, the matrix representing the probabilistic transition relation −α→ ⊆ S × [0, 1] × S corresponding to α is given by

(M_α)_st = p if s −p:α→ t, and (M_α)_st = 0 otherwise,

where s, t ∈ S, and (M)_st denotes the entry in column s and row t in the matrix M. We define the matrix representation M(T) of the (sub)PTS as the direct sum of the matrix representations of the transition relations −α→ for each α ∈ A:

M(T) = ⊕_{α∈A} M_α.

We recall that for a set {Mi}_{i=1}^k of ni × mi matrices, the direct sum of these matrices is defined by the (∑_{i=1}^k ni) × (∑_{i=1}^k mi) "diagonal" matrix

M = ⊕_i Mi = diag(M1, M2, …, Mk).

Given k square n × n matrices Mi and an m × n matrix K, we use the shorthand notation K(⊕_{i=1}^k Mi) = (⊕_{i=1}^k K)(⊕_{i=1}^k Mi), i.e. we apply K to each of the factors Mi.

For probabilistic (sub-probabilistic) relations we obtain a so-called stochastic (sub-stochastic) matrix, i.e. a positive matrix where the entries in each row sum up to one (are less than or equal to one). Matrices are not just schemes for writing down probabilities but also a way to specify linear maps or operators between/on vector spaces. In our case, PTS's are represented as operators on the vector space V(S) = {(vs)_{s∈S} | vs ∈ R}, i.e. the set of formal linear combinations of elements of S with real coefficients. This space contains all distributions on the state space: Dist(S) ⊂ V(S).

3 Process Equivalences and Confinement

The problem of preventing a system from leaking private information to unauthorised users is essentially the problem of guaranteeing the confinement [8] of the system against some specific attacks.
Typically, after the introduction of the notion of noninterference [9,10], most of the work on confinement has been based on models which ultimately depend on some notion of process equivalence, identifying the absence of information flow between two processes with the indistinguishability of their behaviours [11]. In [1] it is shown how various process equivalences can be approximated, and how probabilities can be used as numerical information for quantifying such an approximation. This provides us with a quantitative measure of the indistinguishability of process behaviours (according to the given semantics), that is, in a security setting, a measure of their propensity to leak information. In particular, a notion of confinement can be obtained by stating that two processes p and q are probabilistically confined iff they are probabilistically bisimilar.

The central aim of this paper is to present a transformation algorithm for PTS's which constructs bisimilar PTS's while preserving the computational effects of the original systems. This section is devoted to a formal definition of the two notions of process equivalence for PTS's at the base of our treatment, namely probabilistic bisimulation and probabilistic observables.

3.1 Probabilistic Bisimulation

Probabilistic bisimilarity is the standard generalisation of process bisimilarity to probabilistic transition systems and is due to [12, Def 4].

Definition 4 A probabilistic bisimulation is an equivalence relation ∼b on the states of a probabilistic transition system satisfying, for all α ∈ A:

p ∼b q and p −α→ π ⇒ q −α→ ρ and π ∼b ρ.

Note that in the case of the generative systems considered in this paper, π and ρ are sub-probability distributions. The notion of probabilistic bisimulation on PTS corresponds to the notion of lumped process for finite Markov chains [5].
This can be seen by noting that a probabilistic bisimulation equivalence ∼ on a PTS T = (S, A, →, π0) defines a probabilistic abstract interpretation of T [1]. In fact, we have shown that it is always possible to define a linear operator K from the space V(S) to the space V(S/∼) which represents ∼. This so-called classification operator is represented by the n × m matrix

(K)_ij = 1 if s_i ∈ C_j, and (K)_ij = 0 otherwise,

with S/∼ = {C1, C2, …, Cm} and n the cardinality of S. If M(T) is the operator representation of T then K†M(T)K is the abstract operator induced by K, where K† is the so-called Moore-Penrose pseudo-inverse of K. For classification operators K we can construct K† as the row-normalised transpose of K. Intuitively, K†M(T)K is an operator which abstracts the original system T by encoding only the transitions between equivalence classes instead of those between single states. K determines a partition on the state space of T which is exactly the lumping of the states of T. This can be formally deduced from the following theorem in [5].

Theorem 5 A necessary and sufficient condition for a Markov chain to be lumpable with respect to a partition {C1, C2, …, Cm} is that for every pair of sets Ci and Cj, the probability p_kCj = ∑_{t∈Cj} p_kt of moving in one step from state s_k into set C_j have the same value for every s_k ∈ C_i. These common values {p̃_ij} form the transition matrix for the lumped chain.

We can re-formulate the probabilistic bisimilarity of two processes A and B in terms of linear operators as follows, cf. [1]:

Proposition 6 Given the operator representations M(A) and M(B) of two probabilistic processes A and B, then A and B are probabilistically bisimilar, that is A ∼b B, iff there exist classification matrices K_A and K_B such that K_A†M(A)K_A = K_B†M(B)K_B.
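A small numerical sketch may help fix the intuition behind the classification operator and the induced abstract operator K†M(T)K; the 3-state chain, its lumping partition, and all function names below are our own illustration (using exact rational arithmetic):

```python
from fractions import Fraction

def classification(n, classes):
    # classes: list of lists of state indices; K is n x m with (K)ij = 1 iff si in Cj
    m = len(classes)
    K = [[Fraction(0)] * m for _ in range(n)]
    for j, C in enumerate(classes):
        for i in C:
            K[i][j] = Fraction(1)
    return K

def pseudo_inverse(K):
    # Moore-Penrose pseudo-inverse of a classification operator:
    # the row-normalised transpose of K (each row of K-dagger sums to 1).
    m, n = len(K[0]), len(K)
    Kt = [[K[i][j] for i in range(n)] for j in range(m)]
    return [[v / sum(row) for v in row] for row in Kt]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# A chain that is lumpable wrt the partition {{s0}, {s1, s2}}: every state
# moves into the class {s1, s2} with probability 1.
M = [[Fraction(0), Fraction(1, 2), Fraction(1, 2)],
     [Fraction(0), Fraction(0), Fraction(1)],
     [Fraction(0), Fraction(1), Fraction(0)]]
K = classification(3, [[0], [1, 2]])
lumped = matmul(matmul(pseudo_inverse(K), M), K)  # the abstract operator
```

Here `lumped` is the 2 × 2 transition matrix of the lumped chain, in agreement with Theorem 5: both classes move into the second class with probability 1.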
Probabilistic bisimilarity of A and B can be expressed in terms of lumpability of Markov chains as follows:

Proposition 7 Given the operator representations M(A) and M(B) of two probabilistic processes A and B, then A and B are probabilistically bisimilar iff there exists a lumping K of the process M(A) ⊕ M(B) of the form

K = (K_A)
    (K_B)

i.e. K_A stacked on top of K_B, for some classification operators K_A and K_B for A and B such that K_A and K_B have full column rank.

For tree-PTS the above result essentially means that an equivalence relation on the union of the states of two processes A and B is a probabilistic bisimulation for A and B iff their roots belong to the same equivalence class.

Probabilistic bisimulation is the finest of all process equivalences, as it identifies processes whose step-wise behaviour is exactly the same. From an external viewpoint, other behaviours can be observed in order to distinguish two processes. In particular, a potential attacker may be able to obtain experimentally two kinds of information about a process: by observing what final outcomes the process produces, and by measuring how long the process runs. Both the input/output behaviour and the timing behaviour of a process correspond to suitable abstractions of the concrete behaviour of the process in question. In this paper we will concentrate on these observables and show how we can transform a process (PTS) in such a way that its input/output behaviour is preserved, while its timing behaviour changes so as to become indistinguishable from another given process.

3.2 Probabilistic Observables

In our PTS model, any execution of a process leads to a single trace, i.e. a single execution path. Each time there is a probabilistic choice, one of the possible branches is chosen with some probability. The combined effect of all the choices made defines the probability of a particular path as the product of the probabilities along this computational path.
For finite processes we only have to consider finitely many paths of finite length and thus only need a basic probability model. This is based on a finite set of events Ω, i.e. all possible traces, such that all sets of traces have a well-defined probability attached, i.e. are measurable, and the probability function P : P(Ω) → [0, 1] can be defined via a distribution, i.e. a function π : Ω → [0, 1] such that P(X) = ∑_{x∈X} π(x) for all X ⊆ Ω. This avoids the introduction of more complicated measure-theoretic structures on the set of possible paths.

The notion of observables we will consider is based on a trace semantics for probabilistic processes, whose intuition and formal definition are given below.

3.2.1 Trace Semantics

Formally, the trace semantics of a process T is defined via the notion of the probabilistic result of a computational path.

Definition 8 For a computational path π = s0 −p1:α1→ s1 −p2:α2→ … −pn:αn→ sn, we define its probabilistic result R(π) as the pair ⟨α1α2…αn, prob(π)⟩, where prob(π) = ∏_i p_i. The trace associated to π is t(π) = α1α2…αn.

Since different computational paths may have the same associated trace (cf. Example 9), we need to sum up the probabilities associated to each such computational path in order to appropriately define a probabilistic result. Formally, this can be done by defining, for a process T and trace t, the set

[π]_{T,t} = {π | π a computational path for T and t(π) = t},

which collects all the computational paths with trace t. Then we define for each trace t the computational result R([π]_{T,t}) as the pair ⟨t, ∑_{π∈[π]_{T,t}} prob(π)⟩. We can now define the trace semantics of T as the set

S(T) = {R([π]_{T,t}) | t = t(π) and π is a computational path for T}.
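For finite tree-PTS's this trace semantics can be computed by exhaustive path enumeration. The sketch below is our own encoding (the example tree is chosen so as to have the trace semantics of T3 in Example 9 below); it multiplies the probabilities along each path and sums over paths sharing the same trace:

```python
# Sketch: compute S(T) for a finite tree-PTS given as a dict mapping each
# state to its (probability, action, successor) triples.
def trace_semantics(pts, root):
    results = {}
    def walk(state, trace, prob):
        succs = pts.get(state, [])
        if not succs:  # terminal state: record the completed trace
            results[trace] = results.get(trace, 0.0) + prob
            return
        for p, a, t in succs:
            walk(t, trace + a, prob * p)  # prob(pi) is the product of the p_i
    walk(root, "", 1.0)
    return results

# A tree performing a, then either b.e or e.b, each with probability 1/2.
t3 = {"r": [(1.0, "a", "u")],
      "u": [(0.5, "b", "v"), (0.5, "e", "w")],
      "v": [(1.0, "e", "x")], "w": [(1.0, "b", "y")],
      "x": [], "y": []}
```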
Example 9 Consider four simple processes T1, T2, T3 and T4 with actions A = {a, b, e}. (The tree diagrams are not reproduced here; the branching structure can be read off from the trace semantics below.) It is easy to see that the possible traces for the first two processes are given by the sequence ab, while for process T3 we have the traces abe and aeb, and for process T4 we have aeb and ab. We can thus describe the trace semantics of these four processes by:

S(T1) = {⟨ab, 1⟩}
S(T2) = {⟨ab, 1⟩}
S(T3) = {⟨abe, 0.5⟩, ⟨aeb, 0.5⟩}
S(T4) = {⟨aeb, 0.5⟩, ⟨ab, 0.5⟩}

3.2.2 I/O Observables

The notion of input/output observables, which we will refer to as I/O observables from now on, corresponds to the most abstract observation of a process behaviour, as it just considers the final result of an execution. In our setting, these final results are identified with probability distributions over the set of all possible abstract traces. Abstract traces are defined in terms of a function [[.]] which allows us to avoid specifying a concrete semantics for the labels, thus keeping our approach as general as possible. In fact, in a PTS we can think of labels as some kind of "instructions" which are performed by the process, e.g. a = "x := x-1", b = "x := 2*x", etc. The final result of an execution is thus the accumulated effect of executing these instructions. Depending on the specific application, the function [[.]] will be defined so as to abstract from the appropriate computational details of the concrete traces. For our purposes, we can leave the concrete nature of [[.]] unspecified and only assume that there is a neutral instruction e, i.e. a label which represents something like a skip statement, such that for any trace t1t2…tn we have

[[t1 t2 … tn]] ≡ [[t1 t2 … ti e ti+1 … tn]],

i.e.
putting e anywhere in the execution (including at the end or the beginning) does not change the computational result. We will identify traces if the only difference between them is some occurrences of this neutral label e; this introduces an equivalence relation ≡ on the set of all traces. Clearly, different concrete traces might be identified by [[.]]; for example, if a = "x := x+1" and b = "x := x-1" one would usually have [[ab]] = [[e]] = [[ba]].

Definition 10 Given a tree-PTS T, we define its I/O observables as O(T) = {R([π]_{T,[[t]]}) | t = t(π) and π is a computational path for T}.

Proposition 11 For any tree-PTS T, its I/O observables O(T) define a probability distribution on the set of states S.

PROOF. By induction on the tree structure of T.

• If T = ({s}, A, ∅, {⟨s, 1⟩}) then O(T) is obviously a probability distribution on S.
• If T = ({s} ∪ ∪i Si, A, {s −pi:αi→ si} ∪ ∪i −→i, {⟨s, 1⟩}), then by the inductive hypothesis we have for all i = 1, …, m that O(Ti) is a probability distribution on Si. Therefore, O(T) = ∑_{i=1}^m pi · O(Ti) is a probability distribution on ∪i Si. 2

Based on this notion of I/O observables we can now define when two processes are I/O equivalent.

Definition 12 Two PTS's A and B are I/O equivalent, denoted by A ∼io B, iff they define the same probability distribution on the set of the equivalence classes of abstract traces, i.e. iff O(A) = O(B).

Example 13 If we consider again the four processes from Example 9, we get the following I/O observables:

O(T1) = {⟨[[ab]], 1⟩}
O(T2) = {⟨[[ab]], 1⟩}
O(T3) = {⟨[[abe]], 0.5⟩, ⟨[[aeb]], 0.5⟩} = {⟨[[ab]], 1⟩}
O(T4) = {⟨[[aeb]], 0.5⟩, ⟨[[ab]], 0.5⟩} = {⟨[[ab]], 1⟩}

i.e. all four processes are I/O equivalent.

3.3 Timing Behaviour

An external observer may have the ability to measure the time a process needs to perform a given task.
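Under the simplifying assumptions that labels are single characters and that [[.]] does nothing beyond erasing the neutral label e, the step from trace semantics to I/O observables can be sketched as follows (the encoding and function name are our own):

```python
# Sketch: abstract a trace semantics (trace -> probability) into I/O
# observables by deleting every occurrence of the neutral label "e" and
# summing the probabilities of traces that become identified.
def io_observables(sem):
    obs = {}
    for trace, p in sem.items():
        key = trace.replace("e", "")  # [[t]]: drop the neutral label
        obs[key] = obs.get(key, 0.0) + p
    return obs
```

Applied to the trace semantics of T3 and T4 from Example 9, both collapse to the single abstract trace ab with probability 1, reproducing the I/O equivalence stated in Example 13.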
In order to do so she just needs to observe "execution times", while the concrete nature of the performed instruction is irrelevant for her purposes. In our model, the timing behaviour of a process can therefore be defined by abstracting from labels (instructions) and assuming that there is a function |.| from labels to real numbers which measures the time it takes to execute the instruction represented by the label.

One can think of various choices for the timing function. For example, one can assume (as in [13]) that there is only one special label t with |t| = 1 which indicates time progress, while all other labels take no execution time; in this case the time flow is measured by the number of these special labels occurring in a trace. Alternatively, one can assign a different duration to each label. In our model we assume that all labels need a strictly positive time to be executed, i.e. |α| > 0 for all labels, and that the execution time of all labels is the same. In particular, while in general one could have various neutral labels e with different execution times, e.g. |e0| = 0, |e1| = 1, …, we will assume that there is only one neutral element, denoted by √, which consumes exactly one time unit.

Formally, we define the time semantics of our processes in terms of the notion of time bisimulation, that is, a bisimulation relation on PTS's where transitions are labelled only by the action √.

Definition 14 Given a PTS T = (S, A, −→, π0), we define its depleted or ticked version T̂ as the PTS (S′, A′, −→′, π0′) with S′ = S, A′ = {√}, π0′ = π0, and −→′ defined by (s, √, π) ∈ −→′ iff (s, α, π) ∈ −→ for some α ∈ A.

Time bisimilarity is defined as probabilistic bisimilarity (cf. Definition 4) of the depleted versions of PTS's.

Definition 15 Given two PTS's A and B, we say that A and B are time bisimilar, denoted by A ∼tb B, iff there is a probabilistic bisimulation on the states of the ticked versions of A and B.
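In the dictionary encoding used in the earlier sketches (our own, not the paper's), constructing the ticked version of Definition 14 amounts to relabelling every transition while keeping states and probabilities unchanged:

```python
# Sketch: the depleted/ticked version replaces every action label by the
# single tick symbol, here written "√"; only the timing of steps survives.
def ticked(pts):
    return {s: [(p, "√", t) for p, _, t in succs] for s, succs in pts.items()}
```

Two processes can then be compared for time bisimilarity by comparing their ticked versions, exactly as ordinary probabilistic bisimilarity is checked.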
3.4 Timing Leaks

Our model for timing leaks essentially follows the noninterference-based approach to confinement recalled at the beginning of this section. In particular, we define timing attacks in terms of the ability to distinguish the timing behaviour of two processes, expressed via the notion of time bisimilarity. In [1] we used the same idea to model probabilistic covert channels. The notion of process equivalence used in that paper is an approximate version of probabilistic bisimilarity, as the purpose there is to provide an estimate of the confinement of the system, rather than just checking whether it is confined or not. We aim to extend the current work in a similar direction (cf. Section 8). To this purpose, the linear representation of PTS's, i.e. their correspondence with processes which are Markov chains, is essential; in fact, as shown in [1], it provides the necessary quantitative framework for defining numerical estimates.

Definition 16 We say that two processes T1 and T2 are confined against timing attacks, or time confined, iff their associated PTS's are time bisimilar.

We can re-formulate this definition of timing leaks in terms of lumping, as done in Proposition 6 for probabilistic bisimilarity. This will be the basis for the definition of the transformation technique we present in Section 4.

Definition 17 Given the operator representation M(T) = ⊕_{α∈A} M_α(T) of a probabilistic process T, the linear representation of its ticked version is given by:

M̂(T) = ∑_{α∈A} M_α(T)

Definition 18 We say that A and B are time bisimilar, A ∼tb B, iff there exist classification matrices K_A and K_B such that

K_A† M̂(A) K_A = K_B† M̂(B) K_B.

In the following we present a technique for closing possible covert timing channels by transforming the system into an equivalent one which is free from timing leaks. The transformation algorithm we present is applicable to finite tree-PTS's.
It uses ideas from the optimal lumping algorithm in [14] and the padding techniques from [2]. Intuitively, given a system formed by processes A and B, the algorithm is based on the construction of a lumping for the Markov chain formed by the direct sum of the two Markov chains representing the ticked versions of A and B. By Proposition 7, the resulting lumped process will be a probabilistic bisimulation for A and B iff the root nodes of A and B belong to the same class (or they are essentially the same state in the lumped process). In this case no transformation is needed, as the two processes are confined against timing attacks; otherwise the algorithm produces two new processes A′ and B′ which exhibit the same I/O behaviour as A and B but are also time confined. The following diagram shows the effect of the transformation:

 A   ≁tb  B
∼io       ∼io
 A′  ∼tb  B′

4 Transforming Probabilistic Transition Systems

Checking whether two systems are bisimilar or not, aka the bisimulation problem in the literature, is important not only in concurrency theory but also in many other fields, such as verification and model checking, where bisimulation is used to minimise the state space, and in computer security, where many non-interference properties can be specified in terms of bisimulation equivalence [15,1]. Various efficient algorithms have been proposed for this problem, the most important one being the algorithm proposed by Paige and Tarjan in [16]. This algorithm has been adapted in [17] for probabilistic processes; in particular, the Derisavi et al. algorithm constructs the optimal lumping quotient of a finite Markov chain and can then be used to check the probabilistic bisimilarity of two PTS's. In this paper we take a transformational approach: we not only check whether two probabilistic processes are bisimilar or not, but in the case where the processes are not bisimilar we transform them into two bisimilar ones.
The transformation we propose is inspired by the Derisavi et al. algorithm but does not aim to construct an "optimal" partition in the sense of [16]. It rather aims to add missing states and adjust the transition probabilities where needed to make two given processes bisimilar. It also preserves the probabilistic observables of the original processes.

The algorithm we present consists essentially of two intertwined procedures: one constructs a lumping partition for the PTS union of two tree-PTS's; the other "fixes" the partition constructed by the lumping procedure if some of the properties are violated which guarantee that the partition is a probabilistic bisimulation for the given processes. Fixing corresponds to adding some missing states (padding) and adjusting the transition probabilities in order to achieve bisimilarity. Moreover, the transformation is performed in such a way that the original I/O semantics (i.e. the probabilistic observables) of the processes is preserved. The overall procedure is repeated until a partition is constructed which effectively makes the two processes probabilistically bisimilar.

Before embarking on a more detailed description of our algorithm, we introduce some necessary notions and definitions. In the rest of this paper we focus on finite tree-structured PTS's. For a PTS (S, A, −→, π0) we will consider partitions P = {B1, …, Bn} of S; we will refer to the Bi's as the blocks of the partition P. Given a state s ∈ S and a partition P we define the cumulative probability of s wrt B ∈ P as p(s, B) = ∑{q | s −q→ t with t ∈ B}. We further define the backward image for a block B and a probability q as pre_{q,−→}(B) = {s | ∃B′ ⊆ B : p(s, B′) = q}; this corresponds to the set of all states from which B (i.e. a state in B) can be reached with probability q. We will often omit the index −→ when it is clear from the context.

1: procedure Lumping(T)
2:   Assume: T is a tree-PTS
3:   P ← {S}
4:   while ¬Stable(P, T) do
5:     choose B ∈ P
6:     P ← Splitting(B, P)
7:   end while
8: end procedure

Algorithm 1. Lumping Algorithm

For a tree-PTS T we will refer to Height(T) as the maximal length of a path in T from the root to a terminal state. The n layer cut-off of a tree-PTS T is defined inductively as (i) the set of terminal nodes for n = 0, and (ii) for n = i + 1 the set of nodes with only direct links (paths of length one) to nodes in the i layer cut-off. The procedure CutOff(T, n) returns the n layer cut-off of T, which we also denote by T|n. The complement of the n layer cut-off is sometimes called the n layer top, and the top nodes in the n layer cut-off form the n layer.

Auxiliary Procedures. An important feature of the transformation of PTS's we present in the following is the creation of new states. These will be copies of existing states or of a distinguished additional "dummy state" d. We will denote copies of a given state s by s′, s″, s‴, …. For copies of the dummy state we will usually omit the primes. For a tree-PTS T = (S, A, −→, {⟨r, 1⟩}) we define its copy as the tree-PTS T′ = (S′, A, −→′, {⟨r′, 1⟩}) with S′ = {s′ | s ∈ S} and s′ −p:α→′ t′ iff s −p:α→ t. In the main algorithm we assume a routine CopyTree(T, s) which returns a copy of the sub-tree of a tree-PTS T which is rooted in state s (where s is a state in T). We will also use a routine DepleteTree(T, s) which takes a tree-PTS T and returns Ť, i.e. a copy of T where all states are replaced by copies of the dummy state d and all transitions are labelled by √. Another auxiliary procedure is Linking(T1, s, α, p, T2), whose effect is to introduce a (copy of a) tree-PTS T1 into an existing tree-PTS T2 at state s; the linking transition will be labelled by α and the associated transition probability will be p.
A related procedure called Joining(T1, s, p, T2) introduces a (copy of a) tree-PTS T1 into an existing tree-PTS T2 at state s by identifying the root of T1 with s and by multiplying the initial transitions with the probability p.

1: procedure MaxLumping(T1, T2)
2:   Assume: T1 = (S1, A, −→, {⟨r1, 1⟩}) is a tree-structured PTS
3:   Assume: T2 = (S2, A, −→, {⟨r2, 1⟩}) is a tree-structured PTS
4:   n ← 0                                  ▷ Base case
5:   P ← {S1 ∪ S2}                          ▷ Partition
6:   S ← {S1 ∪ S2}                          ▷ Splitters
7:   while n ≤ Height(T1 ⊕ T2) do
8:     TT ← CutOff(T1 ⊕ T2, n)              ▷ n layer cut-off
9:     P ← {B ∩ TT | B ∈ P}                 ▷ Partition below current layer
10:    S ← P ∪ {(S1 ∪ S2) \ TT}             ▷ Splitters (below and top)
11:    while S ≠ ∅ do
12:      choose B ∈ S, S ← S \ B            ▷ Choose a splitter
13:      P ← Splitting(B, P)                ▷ Split (current layer)
14:      C ← {B ∈ P | B ∩ S1 = ∅ ∨ B ∩ S2 = ∅}   ▷ Critical class(es)
15:      if C ≠ ∅ then
16:        return C                         ▷ Found critical class(es)
17:      end if
18:    end while
19:    n ← n + 1                            ▷ Go to next level
20:  end while
21: end procedure

Algorithm 2. Maximal Lumping Algorithm

Lumping. As already mentioned, our algorithm is inspired by the Paige-Tarjan algorithm for constructing bisimulation equivalences [16], i.e. stable partitions, for a given transition system. The same ideas can be used for probabilistic transition relations −p→; in this case we say that a partition P = {B1, …, Bn} of a set of states is stable with respect to −p→ iff for all blocks Bi and Bj in P and every probability q we have that pre_q(Bi) ∩ Bj = ∅ or Bj ⊆ pre_q(Bi). The construction of the lumping partition is done via the refinement of an initial partition which identifies all states in one class, and proceeds as in the procedure Lumping (cf. Algorithm 1). Starting with the initial partition P = {S}, the algorithm incrementally refines it by a splitting procedure.
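One refinement step of this kind can be sketched as follows. This is our own simplified version, not the paper's Splitting procedure: it splits every block by grouping its states according to their cumulative probability into the splitter (transition labels are omitted, as in the ticked view):

```python
# Sketch: refine a partition (list of sets of states) against a splitter
# block. States that reach the splitter with different cumulative
# probabilities must end up in different blocks of a stable partition.
def split(partition, splitter, trans):
    # trans: list of (source, probability, target) triples
    def cum(s):
        return sum(p for src, p, t in trans if src == s and t in splitter)
    refined = []
    for block in partition:
        groups = {}
        for s in block:
            groups.setdefault(cum(s), set()).add(s)
        refined.extend(groups.values())
    return refined

# "a" reaches {"x"} with probability 0.5 but "b" with probability 1.0,
# so the block {"a", "b"} is split; {"x", "y"} is left alone.
example_trans = [("a", 0.5, "x"), ("a", 0.5, "y"), ("b", 1.0, "x")]
refined = split([{"a", "b"}, {"x", "y"}], {"x"}, example_trans)
```

Iterating such steps until no splitter changes the partition yields a stable partition in the sense defined above.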
For a block Bi the splitting procedure Splitting(Bi, P) refines the current partition P by replacing each block Bj ∈ P with pre_p(Bi) ∩ Bj ≠ ∅ for some probability p by the new blocks pre_p(Bi) ∩ Bj and Bj \ pre_p(Bi). The block Bi is called a splitter. We will also assume the definition of a procedure Stable(P, T) which returns true iff the partition P is stable for the PTS T.

In order to construct a ticked probabilistic bisimulation ∼tb for two tree-PTS's T1 and T2 we apply a slightly modified lumping procedure (cf Algorithm 2) to the ticked version T̂ of the union PTS T = T1 ⊕ T2. The procedure MaxLumping(T1, T2) takes two tree-PTS's T1 = (S1, A, −→, {⟨r1, 1⟩}) and T2 = (S2, A, −→, {⟨r2, 1⟩}) and tries to construct a probabilistic bisimulation for T̂1 and T̂2 by lumping T̂. At each step the procedure constructs a partition P = {B1, B2, . . . , Bm} of the set S1 ∪ S2 and identifies a set of critical blocks C = {C1, . . . , Ck} ⊆ P. A critical block Cj is an equivalence class which has representatives in only one of the two PTS's, i.e. such that Cj ∩ S1 = ∅ or Cj ∩ S2 = ∅. The procedure works on "layers": at each iteration MaxLumping(T1, T2) partitions the nodes in layer n by choosing a splitter among the blocks in the n layer cut-off; it moves on to the next layer only when all such blocks have been considered.

1: procedure Padding(T1, T2)
2:   C ← MaxLumping(T1, T2)                       ▷ identify critical classes
3:   repeat
4:     if C = {Ci}i with Ci ∩ S1 = ∅ for all Ci then
5:       D ← ({d}, {√}, ∅, {⟨d, 1⟩})
6:       T1 ← Linking(D, d, √, 1, T1)
7:     else if C = {Ci}i with Ci ∩ S2 = ∅ for all Ci then
8:       D ← ({d}, {√}, ∅, {⟨d, 1⟩})
9:       T2 ← Linking(D, d, √, 1, T2)
10:    else if C = {Ci}i contains some Ck ⊆ S1 and Cj ⊆ S2 with Ck ≠ ∅ ≠ Cj then
11:      choose s1 ∈ Ck
12:      choose s2 ∈ Cj
13:      FixIt(T1, s1, T2, s2)
14:    end if
15:    C ← MaxLumping(T1, T2)
16:  until C = ∅                                  ▷ i.e. T1 ∼tb T2
17:  return T1, T2
18: end procedure

Algorithm 3. Padding Algorithm

Padding.
The padding procedure identifies the critical blocks by calling the procedure MaxLumping(T1, T2) and then transforms the two processes T1 and T2 in order to "fix" the transitions to the critical blocks. The transformation is performed by the procedure FixIt which we describe below. In the special case where the critical blocks contain no representative of one of the two processes, say T1, the fixing consists in padding T1 with a depleted sub-tree formed by its root only, which is a dummy state. This is linked to the root of T1 via a ticked transition with probability 1 by the Linking procedure. In the general (more symmetric) case, where some of the critical blocks contain only states from one process and some other critical blocks only states from the other process, the transformation is made by the FixIt procedure.

1: procedure FixIt(T1, s1, T2, s2)
2:   for all blocks Bi do                         ▷ determine all possible changes
3:     Assume: s1 −p1i→ Bi and s2 −p2i→ Bi
4:     p0i ← min(p1i, p2i)
5:   end for
6:   Assume: Bk is such that p0k = p2k
7:   Assume: Bl is such that p0l = p1l
8:   rk ← p1k − p0k
9:   rl ← p2l − p0l
10:  choose t1k ∈ Bk ∩ S1 and let T1k be the sub-tree starting with s1 −1→ t1k
11:  choose t2k ∈ Bk ∩ S2 and let T2k be the sub-tree starting with s2 −1→ t2k
12:  choose t1l ∈ Bl ∩ S1 and let T1l be the sub-tree starting with s1 −1→ t1l
13:  choose t2l ∈ Bl ∩ S2 and let T2l be the sub-tree starting with s2 −1→ t2l
14:                                               ▷ make maximal possible changes of probabilities
15:  Assume: s1 −q1k→ t1k, s1 −q1l→ t1l, s2 −q2k→ t2k, and s2 −q2l→ t2l
16:  r1k ← min(q1k, rk)
17:  q1k ← q1k − r1k
18:  r2l ← min(q2l, rl)
19:  q2l ← q2l − r2l
20:                                               ▷ construct and link up "compensating" sub-trees
21:  X¹kl, Y¹kl ← Padding(T1k, T̂1l)
22:  X²kl, Y²kl ← Padding(T̂2k, T2l)
23:  Joining(T1, s1, r1k, X¹kl)
24:  Joining(T2, s2, r2l, Y²kl)
25: end procedure

Algorithm 4. Fixing Algorithm

When among the critical blocks of the current partition there is at least one block Ck which contains only representatives of T1 and one block Cj which contains only representatives of T2, the fixing is more elaborate and involves both processes. In this case there must exist states s1 ∈ Ck and s2 ∈ Cj with different probabilities of reaching some other blocks Bi of the partition. The FixIt procedure selects two blocks Bk and Bl in the previous layer such that the probabilities p1k, p1l of the transitions from s1 to Bk and Bl and the probabilities p2k, p2l of the transitions from s2 to Bk and Bl satisfy p1k > p2k and p1l < p2l. This choice is always possible. In fact, the probabilities from s1 and from s2 to the other classes each sum up to one, i.e. Σi p1i = 1 = Σi p2i, where i ranges over all classes Bi reachable from s1 or s2. As s1 and s2 are in different (critical) blocks, the probabilities must differ for at least one class Bi, i.e. there exists at least one index k with p1k ≠ p2k; thus we have either p1k > p2k or p1k < p2k. Without loss of generality, let p1k > p2k, i.e. we have found the class Bk as required. We now have Σ_{i≠k} p1i + p1k = 1 = Σ_{i≠k} p2i + p2k and thus Σ_{i≠k} p1i < Σ_{i≠k} p2i. As the summation is over the same number of probabilities, i.e. positive numbers, there must be at least one summand in Σ_{i≠k} p2i, say p2l, with p2l > p1l. After that, FixIt selects four states t1k ∈ Bk ∩ S1, t2k ∈ Bk ∩ S2, t1l ∈ Bl ∩ S1, and t2l ∈ Bl ∩ S2 with s1 −q1k→ t1k, s1 −q1l→ t1l, s2 −q2k→ t2k, and s2 −q2l→ t2l, together with the corresponding sub-trees T1k, T1l, T2k and T2l which are rooted in t1k, t1l, t2k and t2l. Note that since Bk and Bl are in the (n−1) layer cut-off they are not critical blocks, thus it is always possible to find such states. The probabilities q1l and q2k stay unchanged, but q1k and q2l will be decreased by the maximal possible amount.
Ideally, this decrement should be rk and rl, but if q1k and q2l are smaller than rk and rl we can only reduce q1k and q2l to zero, i.e. effectively we remove the corresponding sub-trees. The states s1 and s2 now have the same probabilities of reaching the classes Bl and Bk, i.e. they have become bisimilar. However, this transformation has two flaws: (i) the system is no longer a PTS, as the probabilities from s1 and s2 no longer add up to one; (ii) the distributions on the terminal states reachable in T1k and T2l are reduced. In order to fix this we have to construct "compensating trees" which are bisimilar and which reproduce the missing probabilities for the outputs. These two trees are constructed by calling Padding on a copy of T1k and a depleted version of T1l, and on a copy of T2l and a depleted version of T2k, respectively. Finally, these "compensating trees" are linked to s1 and s2 with a √ label and the missing probabilities r1k and r2l. This way we obtain again a proper tree-PTS with normalised probability distributions for s1 and s2. Furthermore, the "compensating trees" produce exactly the same output as T1k and T2l, thus the resulting PTS's have the same overall output as the original ones. Finally, by construction the compensating trees are bisimilar, as T1k and T2k are bisimilar; the same holds for T1l and T2l.

5 Correctness of Padding

In order to show that Padding(T1, T2) indeed produces PTS's T1′ and T2′ which are (i) I/O-equivalent, more precisely weak trace equivalent, to T1 and T2 respectively, and (ii) tick-bisimilar, i.e. T1′ ∼tb T2′, we first describe the effect of the various operations performed in Padding on the observables and on (non-)bisimilar states.
Given two states s1 and s2 in two tree-subPTS's T1 and T2 respectively, and a bisimulation relation ∼ on the union T = T1 ⊕ T2, we define a new subPTS T^min_{s1,s2} as a common minimisation of the transition probabilities of s1 and s2 by changing the transition probabilities from each of the two states to any class Ck of the bisimulation relation ∼, for all labels α, as follows:

p′(s1, α, Ck) = min(p(s1, α, Ck), p(s2, α, Ck))
p′(s2, α, Ck) = min(p(s1, α, Ck), p(s2, α, Ck))
p′(si, α, sj) = p(si, α, sj) for i ∉ {1, 2}

where p denotes the probabilities in T and p′ the probabilities in T^min_{s1,s2}. We denote by T1^min and T2^min the new subPTS's corresponding to T1 and T2 respectively. As we only require a change of the (accumulated) probabilities from each state to all classes, it is possible to achieve this by adjusting the state-to-state probabilities in a number of ways, i.e. the common minimisation is not unique.

We also define a new relation ∼′ on T^min_{s1,s2} as follows: si ∼′ sj if si ∼ sj and {i, j} ∩ {1, 2} = ∅, and furthermore s1 ∼′ s2 and s2 ∼′ s1.

Lemma 19 Given a state s1 in T1, a state s2 in T2, and a bisimulation relation ∼ on T = T1 ⊕ T2, the equivalence relation ∼′ on T^min_{s1,s2} is a bisimulation relation in which s1 and s2 are bisimilar.

PROOF. This follows directly from the definition of ∼′ and p′. 2

Note that ∼′ is not necessarily the coarsest bisimulation relation or lumping on/of T^min_{s1,s2}: it could happen that there already exists a class of states which are bisimilar to s1 and s2 after minimisation.

Consider two tree-subPTS's T and S and a state t in T.
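On the level of accumulated class probabilities, the common minimisation can be sketched as follows; the data layout (a dictionary from state/class pairs to probabilities) is an illustrative assumption of ours.

```python
# Sketch of the common minimisation: for two states s1, s2, replace their
# probability of entering each bisimulation class C by the minimum of the
# two; all other probabilities stay unchanged.
def common_minimisation(p, s1, s2, classes):
    """p: dict (state, class) -> prob; returns the new probabilities p'."""
    q = dict(p)
    for C in classes:
        m = min(p.get((s1, C), 0.0), p.get((s2, C), 0.0))
        q[(s1, C)] = m
        q[(s2, C)] = m
    return q

p = {("s1", "Ck"): 0.7, ("s1", "Cl"): 0.3,
     ("s2", "Ck"): 0.4, ("s2", "Cl"): 0.6}
q = common_minimisation(p, "s1", "s2", ["Ck", "Cl"])
assert q[("s1", "Ck")] == q[("s2", "Ck")] == 0.4
assert q[("s1", "Cl")] == q[("s2", "Cl")] == 0.3
```

After this step s1 and s2 agree on every class probability, which is exactly why ∼′ of Lemma 19 is a bisimulation; the price is that the probabilities of s1 and s2 no longer sum to one, which the insertion below repairs.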
We define the insertion of S into T at t with probability q and label λ as the tree-subPTS T C_t^{q:λ} S with

p′(ti, α, tj) = p(ti, α, tj) for ti, tj ∈ T and all labels α
p′(sk, α, sl) = p(sk, α, sl) for sk, sl ∈ S and all labels α
p′(t, λ, s0) = q where s0 is the root state of S

where p denotes the probabilities in T and S respectively, and p′ the probabilities in T C_t^{q:λ} S.

Lemma 20 Given two pairs of bisimilar tree-subPTS's T1 ∼ T2 and S1 ∼ S2, and two bisimilar states t1 and t2 in T1 and T2 respectively, the common insertion of S1 and S2 into T1 and T2 at t1 and t2 with probability q and label λ results in bisimilar subPTS's, i.e.

T1 C_{t1}^{q:λ} S1 ∼ T2 C_{t2}^{q:λ} S2

PROOF. We define the bisimulation relation/lumping of T1 C_{t1}^{q:λ} S1 and T2 C_{t2}^{q:λ} S2 as follows: for all states in S1 ⊕ S2 and (T1 \ {t1}) ⊕ (T2 \ {t2}) we use the same relation as before. For t1 and t2 we exploit the fact that they are bisimilar with respect to all classes in T1 ⊕ T2, and with respect to the (potential) new class formed by the roots of S1 and S2 we have by definition the same probability q. 2

Given a tree-subPTS T, we define a root extension of T as a tree-subPTS r C_{t0}^{1:ω} T where we add an additional state r and define the transition probabilities as:

p′(ti, α, tj) = p(ti, α, tj) for ti, tj ∈ T and all labels α
p′(r, ω, t0) = 1 where t0 is the root state of T

where p denotes the probabilities in T and p′ the probabilities in r C_{t0}^{1:ω} T. This can also be seen as a special kind of insertion, which explains the notation.

Lemma 21 Root extension does not change the I/O observables.

PROOF. All the paths πi from the root t0 to the leaves in T are replaced by paths of the form ωπi from r to the leaves in T′ = r C_{t0}^{1:ω} T, but as ωπi ≡ πi we have the same observables. 2

Let T be a tree-subPTS and S a sub-tree-subPTS of T; then the deletion of S in T is denoted by T ⊖ S, i.e.
this is the subPTS on the states of T \ S obtained by restricting the transition relation to these states. The deletion of a sub-tree changes the observables as follows.

Lemma 22 Given a tree-subPTS T and a sub-tree-subPTS S of T such that the observables of T and S are O(T) and O(S), and the (unique) path from the root of T to the root of S is π0, then the observables of T ⊖ S are given by:

O(T ⊖ S) = O(T) − π0 · O(S)

with "·" denoting prefixing: π0 · {⟨πi, pi⟩}i = {⟨π0πi, pi⟩}i.

PROOF. By eliminating S we also eliminate the paths through S to leaves. As the observables are distributions over paths, we simply have to eliminate the contribution of these paths through S from the (complete) observables of T. The contribution of these paths is not just their "S-stage" but has to be prefixed by the contribution until S, i.e. its root, is reached; this is given by the unique path π0. 2

Insertion of sub-trees changes the observables as follows.

Lemma 23 Given two tree-subPTS's T and S and a state t in T such that the observables of T and S are O(T) and O(S), and the (unique) path from the root of T to t is π0, then the insertion of S into T at t with probability p and action α has the following observables:

O(T C_t^{p:α} S) = O(T) + p(π0α · O(S))

i.e. we have to add the "p-weighted" paths in π0α · O(S) to O(T).

PROOF. Essentially the same argument as before: we obtain the same paths, and thus observables, as in T alone, but have to add the contribution of the paths going through S. The latter have additionally to be prefixed by the path from the root of T to t and the entry action α from t to the root of S. 2

We can also describe the change of observables if we only partially delete the sub-tree S, i.e. if we change the entry probability into S from T (the probability with which the root of S can be reached from T \ S) from p to p′. We denote the resulting subPTS by T ⊖_p^{p′} S.
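The effect of such a partial deletion on the probability of an individual path can be checked numerically; the values below are toy values of our own, not from the paper.

```python
# Numeric check: scaling the entry probability into a sub-tree S from p to
# p' removes exactly a (1 - p'/p) fraction of the probability of every
# path through S, since that probability is a product pi_0-part * p * S-part.
p, p_new = 0.6, 0.2
path_prefix = 0.5          # probability of the path pi_0 reaching S's root
path_inside = 0.8          # probability of some path inside S
before = path_prefix * p * path_inside
after = path_prefix * p_new * path_inside
removed = before - after
assert abs(removed - (1 - p_new / p) * before) < 1e-9
```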
Lemma 24 Given a tree-subPTS T and a sub-tree-subPTS S of T such that the observables of T and S are O(T) and O(S), and the (unique) path from the root of T to the root of S is π0, then the observables of T ⊖_p^{p′} S are given by:

O(T ⊖_p^{p′} S) = O(T) − (1 − p′/p)(π0 · O(S)).

PROOF. The argument is essentially the same as previously; however, each path reaching a leaf through S is not completely removed, only its impact on the observables is reduced. In T the probability of such a path is a product of the form Πk pk · p · Πl pl, where the pk indicate the computational steps in π0, the probability p is the original entry probability into S, and the pl are the probabilities in S. In T ⊖_p^{p′} S the probability of this path is changed to Πk pk · p′ · Πl pl. If we therefore remove the contribution of these paths from the original observables and add their contribution in the new subPTS, we obtain the desired result:

O(T ⊖_p^{p′} S) = O(T) − π0 · O(S) + (p′/p)(π0 · O(S)) = O(T) − (1 − p′/p)(π0 · O(S)). 2

Proposition 25 Given two tree-subPTS's T1 and T2 such that T1 ∼io T2, and assume that the n-cutoffs are bisimilar, i.e. T1|n ∼ T2|n. Then the common minimisation of T1|n+1 ⊕ T2|n+1 with respect to two non-bisimilar states s1 ∈ T1 and s2 ∈ T2 in layer n + 1 fulfils:

(1) T1min|n ∼tb T2min|n
(2) the number of non-bisimilar states in T1|n+1 ⊕ T2|n+1 is strictly reduced
(3) the observables change as follows:

O(T1′) = O(T1) − (1 − min(p1, p2)/p1)(π1 · O(s1▽))
O(T2′) = O(T2) − (1 − min(p1, p2)/p2)(π2 · O(s2▽))

where π1 and π2 are the paths from the roots in T1 and T2 to s1 and s2 respectively, and s1▽ and s2▽ denote the sub-trees in T1 and T2 rooted at s1 and s2 respectively.

PROOF. The bisimulation class a state s belongs to is determined, in a finite tree-subPTS, only by its successors, i.e. by s▽. As we only change the transitions for the states s1 and s2 in layer n + 1, nothing changes for the states in the n-cutoffs; this shows (1).
We do not introduce any new states in layer n + 1, but we force s1 and s2 to become bisimilar, cf. Lemma 19. Therefore the number of non-bisimilar states in layer n + 1 is strictly reduced (by two), and as all states in the n-cutoffs are bisimilar by assumption we have shown (2). The observables of T1 and T2 change as in (3) according to Lemma 24. 2

The common minimisation makes two tree-PTS's more bisimilar (by strictly reducing the number of non-bisimilar states) but it also changes the observables. We can compensate for this by inserting two bisimilar sub-trees which compute the missing observables.

Proposition 26 Given two tree-subPTS's T1 and T2 such that T1 ∼io T2, assume that the n-cutoffs are bisimilar, i.e. T1|n ∼tb T2|n, and consider the common minimisation of T1|n+1 ⊕ T2|n+1 with respect to two non-bisimilar states s1 ∈ T1 and s2 ∈ T2 in layer n + 1. Then the insertion of the padded versions of the depleted and original versions of s1▽ and s2▽ gives:

T1 ∼io (T1min C_{s1}^{p1−min(p1,p2)} Padding(s1▽, š2▽))
T1min ∼tb (T1min C_{s1}^{p1−min(p1,p2)} Padding(s1▽, š2▽))
T2 ∼io (T2min C_{s2}^{p2−min(p1,p2)} Padding(š1▽, s2▽))
T2min ∼tb (T2min C_{s2}^{p2−min(p1,p2)} Padding(š1▽, s2▽))

where š1▽ and š2▽ denote the depleted versions of s1▽ and s2▽.

PROOF. Obviously Timin and (Timin C_{si}^{pi−min(p1,p2)} Padding(s1▽, s2▽)), for i = 1, 2, are bisimilar by Lemma 20; and the same is true if one of the sub-trees s1▽ or s2▽ is depleted. The observables of Ti were reduced by the common minimisation, according to Lemma 24 and Proposition 25, by reducing the probabilities of some paths through si▽. Inserting si▽ again with the missing probability compensates for this, while the depleted versions ši▽ do not influence the observables. 2

We can now show the main theorem establishing that the padding is correct.

Theorem 27 Given two tree-PTS's T1 and T2, Padding(T1, T2) returns tree-PTS's T1′ and T2′ such that T1′ ∼tb T2′, T1′ ∼io T1, and T2′ ∼io T2.

PROOF. The proof is by induction on the height of T1 ⊕ T2.
The algorithm Padding progresses through the trees T1 and T2 layer by layer, starting from the terminal states or leaves (base case). The sub-routine MaxLumping tries to identify non-bisimilar states/classes in T1 and T2 respectively. If a mismatch is found on layer n + 1 then the n-cutoffs are bisimilar, i.e. T1|n ∼tb T2|n (induction hypothesis). Calling the sub-routine MaxLumping has three possible outcomes: (i) no mismatches are found; (ii) there is only one mismatched state/class on level n + 1, i.e. one of the two trees is higher; or (iii) there are two states/classes s1 and s2 in T1 and T2 respectively which are not bisimilar.

If case (i) holds, then T1 and T2 are already bisimilar and the algorithm terminates. In case (ii) we need to extend the shorter tree: assuming w.l.o.g. that T1 is shorter, we construct a root extension r C_{t0}^{1:ω} T1. By Lemma 21 we have T1 ∼io r C_{t0}^{1:ω} T1, and as T1|n and T2|n are bisimilar we now also have T1|n+1 ∼tb T2|n+1. In the general case (iii) the transformation proceeds in two phases. In phase one we construct a common minimisation of T1 and T2 such that the n-cutoffs are still bisimilar and the number of non-bisimilar states in the (n+1)-cutoff is strictly reduced, cf. Proposition 25. In phase two we insert sub-trees which compensate for the missing observables (and also guarantee that the normalisation condition of proper PTS's is fulfilled); as the new sub-trees are bisimilar we do not increase the number of non-bisimilar states, see Proposition 26. In other words: phase one strictly reduces the number of non-bisimilar states while phase two does not increase it; and phase one changes the observables but phase two compensates for this again. Together we therefore have a monotone decrease of the number of non-bisimilar states while the observables remain unchanged.
As we have only finitely many states in a layer, we eventually reach the situation where there are no non-bisimilar states in layer n + 1, or more generally in the (n+1)-cutoff. The algorithm therefore terminates in each layer, and as we also have only finitely many layers the algorithm terminates with T1′ ∼tb T2′, T1′ ∼io T1, and T2′ ∼io T2. 2

6 A Simple Example

Consider two processes A and B (with p ≠ q and p, q ∉ {0, 1}) defined by the following transitions:

A:  1 −p:a→ 2,   1 −(1−p):b→ 3,   3 −1:c→ 4
B:  5 −q:a→ 6,   5 −(1−q):b→ 7,   7 −1:c→ 8

The aim is to construct two processes A′ and B′ such that A′ ∼tb B′ and A ∼io A′ and B ∼io B′. Note that A and B do not have the same I/O observables, i.e. A ≁io B, and that we also do not require this for the transformed processes.

If we call MaxLumping(A, B) we have as first splitter the set of all states, i.e. S1 = {1, 2, . . . , 8}. The backward image of S1, i.e. calling Splitting(S1, {S1}), allows us to distinguish the terminal nodes from 'internal' nodes, i.e. we get the partition K1 = {{2, 4, 6, 8}, {1, 3, 5, 7}}. Obviously, both processes have representatives in each of the two blocks of this partition. If we continue lumping using the splitter S2 = {2, 4, 6, 8}, i.e. the terminal nodes, we get a refined partition K2 = {{2, 4, 6, 8}, {3, 7}, {1}, {5}}. The probability of reaching S2 from nodes 3 and 7 is the same (namely 1), while 1 and 5 reach the terminal nodes with different probabilities (namely p and q) and thus form two separate blocks. The partition K2 is now not properly balanced: the class {1} has no representatives in B and {5} has no representatives in A. MaxLumping(A, B) will thus report {1} and {5} as critical classes.
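These refinement steps can be replayed concretely. The following sketch uses illustrative values p = 0.7 and q = 0.3 and a representation of our own choosing; it reproduces K1 and K2 and exhibits the critical classes.

```python
# Replaying the example's refinement: split blocks by each state's total
# probability of moving into the splitter set B.
p, q = 0.7, 0.3
trans = {1: [(p, 2), (1 - p, 3)], 3: [(1.0, 4)],
         5: [(q, 6), (1 - q, 7)], 7: [(1.0, 8)]}

def refine(P, B):
    out = []
    for block in P:
        groups = {}
        for s in block:
            w = sum(pr for pr, t in trans.get(s, []) if t in B)
            groups.setdefault(w, set()).add(s)
        out.extend(groups.values())
    return out

K1 = refine([{1, 2, 3, 4, 5, 6, 7, 8}], {1, 2, 3, 4, 5, 6, 7, 8})
K2 = refine(K1, {2, 4, 6, 8})
assert {1, 3, 5, 7} in K1 and {2, 4, 6, 8} in K1
assert {1} in K2 and {5} in K2   # the critical classes
assert {3, 7} in K2              # 3 and 7 stay lumped together
```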
In order to fix the 'imbalance' between A and B we have to do two things: (i) minimise/unify the probabilities of the transitions emitting from 1 and 5, and (ii) introduce sub-trees X and Y to compensate for the missing outputs and to re-establish normalised probabilities.

• The first step is thus to transform A and B into the following form:

1 −min(p,q):a→ 2,   1 −min(1−p,1−q):b→ 3,   3 −1:c→ 4
5 −min(p,q):a→ 6,   5 −min(1−p,1−q):b→ 7,   7 −1:c→ 8

These two transformed versions of A and B are now probabilistically tick-bisimilar, but unfortunately they have different outputs, i.e. they are not I/O-equivalent; moreover, they fail to fulfil the normalisation condition for PTS's and Markov chains.

• To compensate for the "missing" probability r = 1 − min(p, q) − min(1−p, 1−q) we have to construct sub-trees X and Y which reproduce the missing output states and which are bisimilar. As we have a critical class in each process, namely Ck = {1} in A and Cj = {5} in B, we have to consider the third case (line 10) in Padding. As the critical classes are singletons we have to take s1 = 1 and s2 = 5 when we call FixIt(A, 1, B, 5). There are now, for each of the states s1 = 1 and s2 = 5, two blocks they can make a move to, namely Bk = {2, 4, 6, 8} and Bl = {3, 7}, but with different probabilities. We now have to choose in FixIt (lines 10 to 13) representative states for these two blocks in each of the two processes A and B. In our example these states are uniquely determined: t1k = 2, t1l = 3, t2k = 6, and t2l = 7. Furthermore, we have to consider the following sub-trees of A and B which are rooted in these representative states (each entered via a transition with probability 1):

T1k = ◦ −1:a→ •2
T1l = ◦ −1:b→ •3 −1:c→ •4
T2k = ◦ −1:a→ •6
T2l = ◦ −1:b→ •7 −1:c→ •8

We assume w.l.o.g. p > q, i.e. min(p, q) = q and min(1−p, 1−q) = 1−p (the other case is symmetric), and thus r1k = r2l = p − q.
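For such illustrative values (p = 0.7 > q = 0.3, our own choice) the "missing" probability r can be checked directly:

```python
# With p > q the probability mass removed by the common minimisation is
# r = 1 - min(p, q) - min(1-p, 1-q), which simplifies to p - q.
p, q = 0.7, 0.3
r = 1 - min(p, q) - min(1 - p, 1 - q)
assert abs(r - (p - q)) < 1e-9
```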
The compensation tree X¹kl for A thus must fulfil X¹kl ∼io T1k, for Y²kl in B we must have Y²kl ∼io T2l, and furthermore X¹kl ∼tb Y²kl. In order to construct X¹kl and Y²kl we thus just have to call (recursively) Padding(T1k, T̂1l) and Padding(T̂2k, T2l). We therefore have to compare twice two trees in order to obtain a padded version (calling Padding recursively):

◦ −1:a→ •2′  and  ◦ −1:√→ ◦ −1:√→ ◦   give   X¹kl = ◦ −1:a→ •2′ −1:√→ ◦

◦ −1:√→ ◦  and  ◦ −1:b→ •7′ −1:c→ •8′   give   Y²kl = ◦ −1:b→ •7′ −1:c→ •8′

A transition labelled √ (drawn dotted in the original diagrams) denotes a ticked or depleted transition: no computation takes place during this transition (i.e. the state does not change), only time passes. Empty nodes ◦ denote copies of the dummy state: they can be seen as place-holders that inherit the state from the previous node. Obviously these two sub-trees X¹kl and Y²kl fulfil the needed requirements regarding bisimilarity and I/O equivalence.

• Finally we have to introduce these compensatory sub-trees into the transformed versions of A and B to obtain (with r = 1 − min(p, q) − min(1−p, 1−q)):

A′:  1 −min(p,q):a→ 2,   1 −min(1−p,1−q):b→ 3 −1:c→ 4,   1 −r:a→ •2′ −1:√→ ◦
B′:  5 −min(p,q):a→ 6,   5 −min(1−p,1−q):b→ 7 −1:c→ 8,   5 −r:b→ •7′ −1:c→ •8′

7 Comparisons and Related Work

7.1 Comparison with Agat's transformation

In [2] Agat proposes a transformation technique in the context of a type system with security types. The type system, besides detecting illegal information flow, transforms conditional statements that branch on high-security data into new statements in which both conditional branches have been made bisimilar, thus making the new statement immune to timing attacks and the associated high-security data safe. A conditional statement is transformed by adding a 'dummy copy' of each branch to its counterpart such that, when the new branches are compared against each other, the original code is matched against its dummy copy.
If we adapt this technique to PTS's and apply it to the processes A and B considered before, we obtain transformed processes of the following form. Process A is followed by a dummy (ticked) copy of B attached to each of its two final states:

1 −p:a→ 2,   2 −q:√→ ◦,   2 −(1−q):√→ ◦ −1:√→ ◦
1 −(1−p):b→ 3 −1:c→ 4,   4 −q:√→ ◦,   4 −(1−q):√→ ◦ −1:√→ ◦

and process B is preceded by a dummy copy of A, with a copy of B attached to each of the final states of the dummy copy:

◦ −p:√→ ◦,   ◦ −q:a→ 6,   ◦ −(1−q):b→ 7 −1:c→ 8
◦ −(1−p):√→ ◦ −1:√→ ◦,   ◦ −q:a→ 6′,   ◦ −(1−q):b→ 7′ −1:c→ 8′

According to Agat's transformation, a 'dummy execution' of B should follow A, and a 'dummy execution' of A should precede B. Given that A has two final states, two dummy copies of B are appended to process A. Similarly, a dummy copy of A is put before B. However, in order to preserve the tree structure of the process, a new copy of B is made so that it can be joined to one of the final states of the dummy copy of A. Clearly the two transformed trees are bisimilar and the new processes are I/O-equivalent to their original versions. It is also clear that this transformation not only creates a greater number of states than our proposed solution but, more importantly, generates processes with longer execution times.

7.2 Related Literature

The work by Agat [2] is certainly the most closely related to ours; in fact, we are not aware of other approaches to closing timing leaks which exploit program transformation techniques. Apart from the basic differences between our transformation algorithm and Agat's transformation type system explained in the previous section, our approach is in some sense more abstract than [2] in that it does not address a specific language; rather, it is in principle adaptable to any probabilistic language whose semantics can be expressed in terms of our PTS model.

Timing attacks are prominently studied in security protocol analysis, and in general in the analysis of distributed programs specifically designed to achieve secure communication over inherently insecure shared media such as the Internet.
In this setting, timing channels represent a serious threat, as shown for example in [18], where a timing attack is mounted on RSA encryption; in [19], where it is shown how, by measuring the time required by certain operations, a web site can learn information about the user's past activities; or in [20], where an attack is described on Abadi's private authentication protocol [21]. The various approaches which have been proposed in the literature for the time analysis of security and cryptographic protocols mainly exploit languages and tools based on formal methods (timed automata, model checkers and process algebras). For example, [22] develops formal models and a timed-automata-based analysis of Felten and Schneider's web privacy attack mentioned above. A process algebraic approach is adopted in [13] and [23], where the problem of timing information leakage is modelled as a non-interference property capturing some time-dependent information flows. A different approach is the one adopted in [24], where static analysis techniques are used to verify communication protocols against timing attacks.

8 Conclusion and Further Work

In this paper we have investigated possible countermeasures against timing attacks in a process algebraic setting. The aim is to transform systems such that their timing behaviour becomes indistinguishable (i.e. they become bisimilar) while preserving their computational content (i.e. their I/O behaviour). Our particular focus in this work was on finite, tree-like, probabilistic systems and probabilistic bisimulation. Simple obfuscation methods have been considered before, e.g. by Agat [2], which may lead to a substantial increase in the (average) running time of the transformed processes. Our approach attempts to minimise this additional running-time overhead introduced by the transformation.
The padding algorithm achieves this aim by patching time leaks 'on demand', whenever the lumping algorithm which aims to establish the bisimilarity of two processes encounters a critical situation.

Further work will be directed towards improving the padding algorithm further. At various stages of the algorithm we make nondeterministic choices (of certain states or classes); it will be important to investigate strategies for achieving optimal choices. We also plan to investigate probabilistic and conditional padding. The current algorithm introduces time leak fixes whenever they appear to be necessary. The resulting processes are therefore perfectly bisimilar. However, one could also think of applying patches randomly, or only if the expected time leak exceeds a certain threshold. One would then obtain approximate or ε-bisimilar systems, in the sense of our previous work [7,1], but could expect a smaller increase of the (average) running time and/or the size of the system or program considered. Optimising the trade-off between vulnerability to timing attacks (measured quantitatively by ε) and the additional costs (running time, system size, etc.) requires a better understanding of the relation between these different factors.

References

[1] A. Di Pierro, C. Hankin, H. Wiklicky, Measuring the confinement of probabilistic systems, Theoretical Computer Science 340 (1) (2005) 3–56.

[2] J. Agat, Transforming out timing leaks, in: POPL '00: Proceedings of the 27th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, ACM Press, New York, NY, USA, 2000, pp. 40–53.

[3] R. van Glabbeek, S. Smolka, B. Steffen, Reactive, generative and stratified models of probabilistic processes, Information and Computation 121 (1995) 59–80.

[4] R. Segala, N. Lynch, Probabilistic simulations for probabilistic processes, in: Proceedings of CONCUR '94, Vol. 836 of Lecture Notes in Computer Science, Springer Verlag, 1994, pp. 481–496.

[5] J. G. Kemeny, J. L.
Snell, Finite Markov Chains, D. Van Nostrand Company, 1960.

[6] B. Jonsson, W. Yi, K. Larsen, Probabilistic Extensions of Process Algebras, Elsevier Science, Amsterdam, 2001, Ch. 11, pp. 685–710; see [25].

[7] A. Di Pierro, C. Hankin, H. Wiklicky, Quantitative relations and approximate process equivalences, in: D. Lugiez (Ed.), Proceedings of CONCUR '03, Vol. 2761 of Lecture Notes in Computer Science, Springer Verlag, 2003, pp. 508–522.

[8] B. Lampson, A note on the confinement problem, Communications of the ACM 16 (10) (1973) 613–615.

[9] J. Goguen, J. Meseguer, Security policies and security models, in: IEEE Symposium on Security and Privacy, IEEE Computer Society Press, 1982, pp. 11–20.

[10] J. Gray, III, Towards a mathematical foundation for information flow security, in: Proceedings of the 1991 Symposium on Research in Security and Privacy, IEEE, Oakland, California, 1991, pp. 21–34.

[11] P. Ryan, S. Schneider, Process algebra and non-interference, Journal of Computer Security 9 (1/2) (2001) 75–103; special issue on CSFW-12.

[12] K. Larsen, A. Skou, Bisimulation through probabilistic testing, Information and Computation 94 (1991) 1–28.

[13] R. Focardi, R. Gorrieri, F. Martinelli, Real-time information flow analysis, IEEE Journal on Selected Areas in Communications 21 (1) (2003) 20–35.

[14] S. Derisavi, H. Hermanns, W. H. Sanders, Optimal state-space lumping in Markov chains, Information Processing Letters 87 (6) (2003) 309–315.

[15] R. Focardi, R. Gorrieri, Classification of security properties (Part I: Information flow), in: Foundations of Security Analysis and Design, Tutorial Lectures, Vol. 2171 of Lecture Notes in Computer Science, Springer Verlag, 2001, pp. 331–396.

[16] R. Paige, R. Tarjan, Three partition refinement algorithms, SIAM Journal on Computing 16 (6) (1987) 973–989.

[17] S. Derisavi, H. Hermanns, W. H. Sanders, Optimal state-space lumping in Markov chains, Information Processing Letters 87 (6) (2003) 309–315.

[18] P.
Kocher, Cryptanalysis of Diffie-Hellman, RSA, DSS, and other cryptosystems using timing attacks, in: D. Coppersmith (Ed.), Advances in Cryptology, CRYPTO '95, Vol. 963 of Lecture Notes in Computer Science, Springer Verlag, 1995, pp. 171–183.

[19] E. W. Felten, M. A. Schneider, Timing attacks on web privacy, in: Proceedings of CCS '00, ACM, Athens, Greece, 2000, pp. 25–32.

[20] R. J. Corin, S. Etalle, P. H. Hartel, A. Mader, Timed analysis of security protocols, Journal of Computer Security.

[21] M. Abadi, Private authentication, in: R. Dingledine, P. F. Syverson (Eds.), 2nd Int. Workshop on Privacy Enhancing Technologies, Vol. 2482 of Lecture Notes in Computer Science, Springer Verlag, 2002, pp. 27–40.

[22] R. Gorrieri, R. Lanotte, A. Maggiolo-Schettini, F. Martinelli, S. Tini, E. Tronci, Automated analysis of timed security: A case study on web privacy, Journal of Information Security 2 (2004) 168–186.

[23] R. Gorrieri, F. Martinelli, A simple framework for real-time cryptographic protocol analysis with composition proof rules, Science of Computer Programming 50 (2004) 23–49.

[24] M. Buchholtz, S. Gilmore, J. Hillston, F. Nielson, Securing statically-verified communications protocols against timing attacks, in: 1st International Workshop on Practical Applications of Stochastic Modelling (PASM 2004), Vol. 128, 2005, pp. 123–143.

[25] J. Bergstra, A. Ponse, S. Smolka (Eds.), Handbook of Process Algebra, Elsevier Science, Amsterdam, 2001.