Unifying Registration and Segmentation for Multi-sensor Images

Boris Flach¹, Eeri Kask¹, Dmitrij Schlesinger¹, and Andriy Skulish²

¹ Dresden University of Technology, [email protected]
² IRTC ITS Cybernetics Center, Kiev

Abstract. We propose a method for unifying registration and segmentation of multi-modal images, assuming that the hidden scene model is a Gibbs probability distribution.

1 Introduction

Obtaining a description of a complex scene from multi-modal images of that scene (i.e. multi-sensor fusion) is a common task which arises, for instance, in medical diagnosis or in aerial surveying. Usually this task is split into problems that are interesting in their own right: registration of the multi-modal images, supervised or unsupervised learning of texture model parameters, and segmentation of the (registered) multi-modal images with respect to the known texture models. The solution of the whole is then approached by solving these subtasks independently.

The aim of our paper is to show that the problem can be solved in a unified fashion, i.e. without splitting it into subtasks. The reasoning is as follows: prior models are needed to solve each subtask, but in fact all of them rely on a single model – the prior model of the scene itself.

Let us discuss this in more depth, considering first the registration task. Usually it is not possible (or not desirable) to use artificial markers, so we have to rely either on fiducial points or on some statistical similarity measure. The first approach is usually prohibitive due to noise and the lack of precise models for such fiducial points. The second approach requires at least a statistical prior model of the scene (e.g. a probabilistic model for tissue segments in tomography).

Common registration models are quite simple today. Often it is assumed that all voxels with a particular intensity value in one modality (say MRT) represent the same tissue type, so that the values of the corresponding voxels of the second modality (say PET) should also be similar to each other. If the conditional probabilities for the intensities in each modality given the tissue class of the voxel are Gaussians, this leads to a simple similarity measure for registration (see [7,4]). Of course such a model is an oversimplification. Another way is to use the so-called "normalized mutual information" (see e.g. [8]). Again, behind the "scene" is a very simple model of the scene: the probability distribution for segment labels (e.g. tissue labels) is assumed to be voxelwise independent (see [3]).
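As a side note, the following is a minimal sketch of how such a normalized-mutual-information criterion is typically computed from a joint intensity histogram, in the spirit of [8]. It is our own illustration for orientation only – the function name, the bin count and the histogram estimator are assumptions, not part of the method proposed below.

```python
import numpy as np

def normalized_mutual_information(x, y, bins=32):
    """NMI(x, y) = (H(X) + H(Y)) / H(X, Y), estimated from the joint
    intensity histogram of two equally shaped, already overlaid images."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()        # joint intensity distribution
    px = pxy.sum(axis=1)             # marginal of x
    py = pxy.sum(axis=0)             # marginal of y

    def entropy(p):
        p = p[p > 0]                 # ignore empty bins
        return -np.sum(p * np.log(p))

    return (entropy(px) + entropy(py)) / entropy(pxy.ravel())
```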
This model too is too simple if we have in mind that we want to use the same prior model for registration and segmentation. We argue that it is necessary to use at least a Gibbs probability distribution in the prior model of possible scene segmentations, i.e. to assume that neighboring voxels are more likely to have the same label class than different label classes.

A second important point is that we cannot assume to know all parameters of the prior model. Therefore it is necessary to include some kind of unsupervised parameter learning into the simultaneous search for the best registration and segmentation. In general this means, for instance, unsupervised learning of texture model parameters given some intermediate transform and some intermediate segmentation. Using the texture models obtained this way, we can in turn improve the transform and the segmentation. Repeating this iteratively until a fixpoint is reached, we obtain the registration and the segmentation simultaneously.

2 The Model

Let us assume for simplicity a graph G = (R, E) which is a two- or three-dimensional lattice with a set of vertices R and a set of edges E. We assume furthermore that we have two modalities (images) of the scene and describe them by mappings x : R → F and y : R → F which associate intensity values with the vertices of the graph. Let y additionally be transformed by some (unknown) rigid body transform T : R → R. The third (unknown) mapping z : R → K describes the segmentation of the scene, where K is the set of possible segment labels. Posing a statistical model, we consider the tuples (x, y, z) as elementary events and choose the simplest nontrivial model of a probability distribution, i.e. a Gibbs distribution given by

$$p(x, y, z; T) = \frac{1}{Z} \prod_{(r,r') \in E} g\bigl(z(r), z(r')\bigr) \prod_{r \in R} q\bigl(x(r), z(r)\bigr)\, h\bigl(y(Tr), z(r)\bigr), \qquad (1)$$

where T, q, h are not known in advance.

Let us discuss the assumptions in this formula in more detail. The a priori probability distribution p(z) for possible segmentations is given by

$$p(z) = \frac{1}{Z} \prod_{(r,r') \in E} g\bigl(z(r), z(r')\bigr),$$

i.e. a product over all edges of the lattice (Z is a normalizing constant). The function g : K × K → ℝ is chosen in advance according to the Potts model:

$$g(k, k') = \begin{cases} a & \text{if } k = k', \\ b & \text{otherwise,} \end{cases}$$

where a, b > 0 and a > b, expressing the expectation that neighboring voxels tend to have the same label class. We assume furthermore that the intensity values in the two modalities are conditionally independent given the segment label. The functions q, h : F × K → ℝ represent the conditional probabilities for the intensities in the modalities given the segment label. That is why we can assume, without loss of generality, that

$$\sum_f q(f, k) = 1, \qquad \sum_f h(f, k) = 1 \qquad \forall k \in K$$

holds. Of course the texture model used here is very simple, because we assume that the voxel intensities are generated independently.

Suppose for a moment that q, h and T are known. Then the segmentation task is a Bayes decision which depends on the chosen cost function for misclassification. If we choose simply C(z, z*) = 1 − δ_{zz*}, where δ is the Kronecker symbol and z* is the (unknown) "ground truth", then the minimal average cost is obtained by the maximum a posteriori decision. On the other hand, this cost function is not very useful in structural recognition, because the cost of a misclassification does not depend on the number of incorrectly classified vertices. A better choice of cost function is

$$C(z, z^*) = \sum_{r} c\bigl(z(r), z^*(r)\bigr) \qquad (2)$$

with some reasonable local function c : K × K → ℝ. It is easy to see that the Bayes decision for such a cost function is obtained as follows: first it is necessary to calculate the marginal probabilities

$$p\bigl(z(r) = k \mid x, y; T\bigr) \sim \sum_{z:\, z(r) = k} p(x, y, z; T) \qquad (3)$$

for each vertex r ∈ R and each label k ∈ K. Using these probabilities, the optimal segmentation is found by independent decisions for each vertex (see [5]). If the local cost function is as simple as c(k, k*) = 1 − δ_{kk*}, the optimal decisions are

$$z^o(r) = \arg\max_k\, p\bigl(z(r) = k \mid x, y; T\bigr).$$

Returning to our problem, the question is whether it is possible to learn the conditional probabilities q and h (unsupervised), to find T, and to segment the registered images using the cost function (2). To achieve this it is necessary to solve the problem

$$\ln \sum_z p(x, y, z; T) \to \max_{q, h, T} \qquad (4)$$

and then to calculate the probabilities (3). Both tasks are very hard computational problems.
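To fix ideas, here is a minimal sketch of the model and of the per-vertex decision rule just derived, written for a 2D lattice with 4-neighborhood. The function names, the array layouts and the 4-neighborhood are our assumptions; the sketch evaluates the unnormalized logarithm of (1) and applies the decision z^o(r) for c(k, k*) = 1 − δ.

```python
import numpy as np

def log_p_unnormalized(x, y_warped, z, log_q, log_h, a=3.0, b=1.0):
    """Unnormalized log of the Gibbs distribution (1) on a 2D lattice.

    x, y_warped  : integer intensity images (y already resampled under T)
    z            : label image with values in {0, ..., |K|-1}
    log_q, log_h : (|F|, |K|) arrays of log conditional probabilities
    a, b         : Potts parameters, g(k, k') = a if k == k' else b
    """
    value = log_q[x, z].sum() + log_h[y_warped, z].sum()   # unary terms
    # pairwise Potts terms over horizontal and vertical lattice edges
    for same in (z[:, 1:] == z[:, :-1], z[1:, :] == z[:-1, :]):
        value += np.where(same, np.log(a), np.log(b)).sum()
    return value

def optimal_segmentation(marginals):
    """Bayes decision for the additive cost (2) with c(k, k*) = 1 - delta:
    choose, independently at each vertex, the label of maximal marginal (3)."""
    return marginals.argmax(axis=-1)    # marginals: (H, W, |K|) array
```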
3 Approximate Solution

To maximize (4) we use the EM algorithm [6,1] and maximize iteratively

$$\sum_z \alpha(z) \ln p(x, y, z; T) \;-\; \sum_z \alpha(z) \ln \frac{p(x, y, z; T)}{\sum_{z'} p(x, y, z'; T)},$$

which coincides with (4) if α(z) ≥ 0 and Σ_z α(z) = 1. Each iteration consists of two steps. In the first – usually called the E-step – we choose α so that the second term reaches its maximum (with respect to p(x, y, z; T)) at the given actual p^{(n)}(x, y, z; T):

$$\alpha^{(n)}(z) = p^{(n)}(z \mid x, y; T).$$

In the second step – usually called the M-step – the first term is maximized with respect to p(x, y, z; T) for the fixed α obtained in the E-step:

$$\sum_z \alpha^{(n)}(z) \ln p(x, y, z; T) \to \max_{q, h, T}.$$

Let us consider the M-step in detail. We substitute (1) for p. The function g and the normalizing constant Z are fixed, so we are interested only in those terms which depend on q and h. For those depending on q we obtain

$$\sum_z p^{(n)}(z \mid x, y; T) \sum_r \ln q\bigl(x(r), z(r)\bigr) = \sum_r \sum_k p^{(n)}\bigl(z(r) = k \mid x, y; T\bigr) \ln q\bigl(x(r), k\bigr),$$

and analogously for h:

$$\sum_r \sum_k p^{(n)}\bigl(z(r) = k \mid x, y; T\bigr) \ln h\bigl(y(Tr), k\bigr).$$

Therefore the problem of calculating the marginal probabilities p(z(r)=k | x, y; T) given q, h and T must be solved not only for the final segmentation but in each iteration of the EM algorithm as well. Once these probabilities p^{(n)}(z(r)=k | x, y; T) are known, the subsequent evaluation of the functions q and h and of the transform T for the next iteration is straightforward: denoting X_f = {r ∈ R | x(r) = f}, the function q^{(n+1)} is obtained as

$$q^{(n+1)}(f, k) = \frac{1}{n(k)} \sum_{r \in X_f} p^{(n)}\bigl(z(r) = k \mid x, y; T\bigr),$$

where n(k) is a normalizing constant. The function h^{(n+1)} is obtained in the same way, but here we also have to vary T. We use a small neighborhood centered at the actual T^{(n)}, i.e. small translations and small rotations. Given q^{(n+1)}, h^{(n+1)} and T^{(n+1)}, we start the next iteration step. This is repeated until a fixpoint is reached.

The remaining problem is therefore to calculate p(z(r)=k | x, y; T) given the functions q and h and the transform T. Actually we do not know how to solve this problem in a closed fashion, except for simple graphs like trees and partial m-trees. Nevertheless it is possible to estimate these probabilities using a Gibbs sampler (see e.g. [2]): cycling through the vertices of the lattice many times and choosing in each step the label z(r) at the vertex r according to the conditional probability distribution

$$p\bigl(z(r) = k \mid z(R \setminus r), x, y; T\bigr) \sim q\bigl(x(r), k\bigr)\, h\bigl(y(Tr), k\bigr) \prod_{r':\, (r, r') \in E} g\bigl(k, z(r')\bigr), \qquad (5)$$

we observe the label k at each vertex r with probability p(z(r)=k | x, y; T). To be more precise, this holds in the limit of infinitely many sampling steps if all conditional probabilities in (5) are strictly positive. Using the sampler with finitely many sampling steps, we estimate the probabilities p(z(r)=k | x, y; T) approximately. Hence we are able to perform the EM algorithm and to obtain the final segmentation.
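The following sketch estimates these marginals by Gibbs sampling according to (5), again on a 2D lattice with 4-neighborhood. The sweep count, the burn-in length and all names are our assumptions; a practical implementation would be considerably more efficient.

```python
import numpy as np

def estimate_marginals(x, y_warped, q, h, a=3.0, b=1.0,
                       n_sweeps=120, n_burnin=20, seed=0):
    """Estimate p(z(r)=k | x, y; T) by sampling each z(r) from (5).

    q, h : (|F|, |K|) conditional intensity probability tables
    Returns an (H, W, |K|) array of observed label frequencies.
    """
    rng = np.random.default_rng(seed)
    H, W = x.shape
    n_labels = q.shape[1]
    z = rng.integers(0, n_labels, size=(H, W))     # initial labeling
    counts = np.zeros((H, W, n_labels))
    rows, cols = np.arange(H)[:, None], np.arange(W)[None, :]

    for sweep in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                # unnormalized conditional (5): unary factors times the
                # Potts factors g(k, z(r')) over the 4-neighborhood
                p = q[x[i, j], :] * h[y_warped[i, j], :]
                for ni, nj in ((i-1, j), (i+1, j), (i, j-1), (i, j+1)):
                    if 0 <= ni < H and 0 <= nj < W:
                        factor = np.full(n_labels, b)
                        factor[z[ni, nj]] = a
                        p = p * factor
                z[i, j] = rng.choice(n_labels, p=p / p.sum())
        if sweep >= n_burnin:                      # discard burn-in sweeps
            counts[rows, cols, z] += 1

    return counts / (n_sweeps - n_burnin)
```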
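Given such estimated marginals, the closed-form M-step updates read as below: m_step_q realizes the formula for q^(n+1), and applying the same routine to the warped image yields h^(n+1) for each candidate transform, among which the maximizer of the M-step term is kept. The warp routine and the candidate set are assumed to be supplied by the surrounding code.

```python
import numpy as np

def m_step_q(image, marginals, n_intensities):
    """Closed-form update q^(n+1)(f, k): sum the marginals over
    X_f = {r : image(r) = f} and normalize so that sum_f q(f, k) = 1."""
    n_labels = marginals.shape[-1]
    q = np.zeros((n_intensities, n_labels))
    for f in range(n_intensities):
        q[f] = marginals[image == f].sum(axis=0)
    return q / (q.sum(axis=0, keepdims=True) + 1e-12)

def m_step_h_and_T(y, marginals, n_intensities, candidates, warp):
    """Update h and T jointly: for each candidate transform near the
    current T (small translations and rotations), compute the matching
    h and keep the pair maximizing the M-step term for h."""
    best_value, best_h, best_T = -np.inf, None, None
    for T in candidates:
        yT = warp(y, T)                       # resample y under T
        h = m_step_q(yT, marginals, n_intensities)
        value = np.sum(marginals * np.log(h[yT] + 1e-12))
        if value > best_value:
            best_value, best_h, best_T = value, h, T
    return best_h, best_T
```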
4 Implementation and Experiments

The main iteration of the algorithm loops over all possible transforms T' around the "current" T (±1 pixel translations, elementary rotations) while searching for the overall "best" registration transform T. During this process one should consider the following (a sketch of the resulting outer loop follows this list):

(1) As the EM algorithm used for maximizing p(x, y) converges to a local maximum, one cannot take random values as initialisation of q(f, k) and h(f, k) for subsequent trials of T – the different intermediate optima reached would then not be comparable – but has to reuse the q(f, k) and h(f, k) (as well as the z-label field) computed in the previous iteration over T. (Initially the q(f, k) and h(f, k) are uniform distributions and the z-field is sampled according to the a priori Gibbs model.)

(2) This reduces the number of steps the EM routine has to execute for all but the first trial of T' in subsequent cycles.

(3) As T' may now have a better "quality" than the T of the previous iteration, the quality of the current T has to be recalculated as well, to prevent it from being neglected as a possible choice in this step.
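The sketch below shows this outer loop under our own naming conventions; run_em and score are placeholders for a few warm-started EM iterations and for the registration quality (e.g. the approximated log-likelihood (4)), and neighbors(T) enumerates the ±1 pixel translations and elementary rotations around T. All of these helpers are assumed, not part of the paper's text.

```python
def register_and_segment(x, y, q, h, T, neighbors, warp, run_em, score):
    """Outer loop: try the current T and its small neighborhood, warm-
    starting q and h (and, inside run_em, the z-field) from the previous
    iteration, and keep the best candidate until a fixpoint is reached."""
    while True:
        best = None
        # point (3): the current T is re-scored with the updated q, h,
        # so that it is not neglected as a possible choice
        for T_cand in [T] + list(neighbors(T)):
            y_warped = warp(y, T_cand)
            q_c, h_c = run_em(x, y_warped, q, h)   # warm start, points (1)-(2)
            value = score(x, y_warped, q_c, h_c)
            if best is None or value > best[0]:
                best = (value, q_c, h_c, T_cand)
        _, q, h, T_new = best
        if T_new == T:            # fixpoint: no neighbor improves on T
            return q, h, T
        T = T_new
```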
First, the theoretical model discussed in the previous sections was tested with an artificial image pair as in Fig. 1(a) and 1(b). In 1(c) the result of the registration and final labeling is shown. The statistical model used in generating the input pair can be recognised in 1(d), the result of the EM process applied to 1(a) and 1(b): certain pairs of the q(f, k)-distributions shown there are equal, and the same holds for pairs of the h(f, k)-distributions. This confirms the fact that there are pairs of label regions in the resulting segmentation 1(c) which correspond to the same texture parameters in 1(a) and 1(b) respectively.

Fig. 1. Artificial test image pair 1(a) and 1(b). The registration and the final labeling are shown in 1(c). In 1(d) the probability distributions q(f, k) and h(f, k) for each label colour k are shown.

Next, the model was applied to the MRT and PET images of Fig. 2(a) and 2(b). In 2(c) the registration and the final labeling with four labels is shown. One label region corresponds to the scene background; accordingly, its q(f, k) and h(f, k) in 2(d) overlap. A further label corresponds to mostly disjunct regions of the MRT and PET images in the areas around and including the cranium. For another label, both q(f, k) and h(f, k) largely overlap, according to most of the interior of the brain structure in both images. The remaining label shows the effect of the Potts model parameters suppressing thin geometrical elements, so that the PET image prevails in the fine-grained structural areas of the MRT image. As the label set in this experiment was restricted to four labels in advance, more detail about the structure in the input images could not be extracted.

Fig. 2. MRT and PET image pair 2(a), 2(b) of a human brain as input. The registration and the final labeling (with four colours) are shown in 2(c). Finally, 2(d) shows q(f, k) and h(f, k).

In Fig. 3 labelings are shown using five labels in 3(a) and six labels in 3(b). Figure 3(c) shows a result with six labels and slightly different Potts parameters (a/b = 3, compared to a/b = 9 in all other experiments), which is less strict in the segmentation of region boundaries, particularly compared to 3(b).

Fig. 3. The images of Fig. 2 labeled with five colours in 3(a) and six colours in 3(b). In 3(c) six colours with slightly different Potts parameters than in the other experiments.

5 Conclusions and Acknowledgements

We have shown how to unify the registration and segmentation of multi-modal images, assuming a Gibbs probability distribution for the hidden scene description. This includes unsupervised learning of at least simple texture models. We obtain an approximate solution mainly because we do not know how to calculate marginal probabilities of Gibbs distributions in a closed fashion. Though this problem is NP-complete in general, it is therefore intriguing to search for solvable subclasses. A second interesting question is how to learn (unsupervised) the Gibbs distribution of the scene model in order to overcome the limitations of the Potts model.

At this point we are indebted to mention the great opportunity we have had to work with Professor M.I. Schlesinger from Kiev; this always fruitful collaboration has led us to a lot of invaluable knowledge. One of us (B.F.) was supported during his work on this paper as a fellow of the MIRACLE Centre of Excellence (EC grant No. ICA1-CT-2000-70002) and wishes to thank all colleagues at the Center for Machine Perception, Prague, and especially Professor V. Hlaváč for interesting discussions and convenient conditions.

References

1. A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.
2. Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721–741, 1984.
3. Alexis Roche, Grégoire Malandain, and Nicholas Ayache. Unifying maximum likelihood approaches in medical image registration. Technical Report 3741, INRIA, 1999.
4. Alexis Roche, Grégoire Malandain, Xavier Pennec, and Nicholas Ayache. Multimodal image registration by maximization of the correlation ratio. Technical Report 3378, INRIA, 1998.
5. M. I. Schlesinger and V. Hlaváč. Ten Lectures on Statistical and Structural Pattern Recognition. Kluwer Academic Publishers, Dordrecht, 2002.
6. Michail I. Schlesinger. Connection between unsupervised and supervised learning in pattern recognition. Kibernetika, 2:81–88, 1968. In Russian.
7. Milan Sonka and J. Michael Fitzpatrick, editors. Handbook of Medical Imaging, volume 2. SPIE Press, 2000.
8. C. Studholme, D. L. G. Hill, and D. J. Hawkes. An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognition, 32(1):71–86, 1999.