Unifying Registration and Segmentation for Multi-sensor Images

Boris Flach¹, Eeri Kask¹, Dmitrij Schlesinger¹, and Andriy Skulish²

¹ Dresden University of Technology, [email protected]
² IRTC ITS Cybernetics Center, Kiev

Abstract. We propose a method for unifying registration and segmentation of multi-modal images, assuming that the hidden scene model is a Gibbs probability distribution.

1 Introduction

Obtaining a description of a complex scene from multi-modal images of that scene (i.e. multi-sensor fusion) is a common task which arises, for instance, in medical diagnosis or in aerial surveying. Usually this task is split into problems that are interesting in their own right: registration of the multi-modal images, supervised or unsupervised learning of texture model parameters, and segmentation of the (registered) multi-modal images with respect to the known texture models. The solution of the whole is then approached by solving these subtasks independently.

The aim of our paper is to show that the problem can be solved in a unified fashion, i.e. without splitting it into subtasks. The reasoning is as follows: prior models are needed to solve each subtask, but in fact all of them rely on a single model – the prior model of the scene itself.

Let us discuss this in more depth, considering first the registration task. Usually it is not possible (or not desirable) to use artificial markers, so we have to rely either on fiducial points or on some statistical similarity measure. The first approach is usually prohibitive due to noise and the lack of precise models for such fiducial points. The second approach requires at least a statistical prior model of the scene (e.g. a probabilistic model for tissue segments in tomography).

Common registration models are quite simple today. Often it is assumed that all voxels with a particular intensity value in one modality (say MRT) represent the same tissue type, so that the values of the corresponding voxels of the second modality (say PET) should also be similar to each other. If the conditional probabilities for the intensities in each modality given the tissue class of the voxel are Gaussians, this leads to a simple similarity measure for registration (see [7,4]). Of course such a model is an oversimplification. Another way is to use the so-called "normalized mutual information" (see e.g. [8]). Again, behind the "scene" is a very simple model of the scene: the probability distribution for segment labels (e.g. tissue labels) is assumed to be voxelwise independent (see [3]).
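As a side note, the following is a minimal sketch of how such a normalized-mutual-information criterion is typically computed from a joint intensity histogram, in the spirit of [8]. It is our own illustration for orientation only – the function name, the bin count and the histogram estimator are assumptions, not part of the method proposed below.

```python
import numpy as np

def normalized_mutual_information(x, y, bins=32):
    """NMI(x, y) = (H(X) + H(Y)) / H(X, Y), estimated from the joint
    intensity histogram of two equally shaped, already overlaid images."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()        # joint intensity distribution
    px = pxy.sum(axis=1)             # marginal of x
    py = pxy.sum(axis=0)             # marginal of y

    def entropy(p):
        p = p[p > 0]                 # ignore empty bins
        return -np.sum(p * np.log(p))

    return (entropy(px) + entropy(py)) / entropy(pxy.ravel())
```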
This model too is too simple if we have in mind that we want to use the same prior model for registration and segmentation. We argue that it is necessary to use at least a Gibbs probability distribution in the prior model of possible scene segmentations, i.e. to assume that neighboring voxels are more likely to have the same label class than different label classes.

A second important point is that we cannot assume to know all parameters of the prior model. Therefore it is necessary to include some kind of unsupervised parameter learning into the simultaneous search for the best registration and segmentation. In general this means, for instance, unsupervised learning of texture model parameters given some intermediate transform and some intermediate segmentation. Using the texture models obtained this way, we can in turn improve the transform and the segmentation. Repeating this iteratively until a fixpoint is reached, we obtain the registration and the segmentation simultaneously.

2 The Model

Let us assume for simplicity a graph G = (R, E) which is a two- or three-dimensional lattice with a set of vertices R and a set of edges E. We assume furthermore that we have two modalities (images) of the scene and describe them by mappings x : R → F and y : R → F which associate intensity values with the vertices of the graph. Let y additionally be transformed by some (unknown) rigid body transform T : R → R. The third (unknown) mapping z : R → K describes the segmentation of the scene, where K is the set of possible segment labels. Posing a statistical model, we consider the tuples (x, y, z) as elementary events and choose the simplest nontrivial model of a probability distribution, i.e. a Gibbs distribution given by

$$p(x, y, z; T) = \frac{1}{Z} \prod_{(r,r') \in E} g\bigl(z(r), z(r')\bigr) \prod_{r \in R} q\bigl(x(r), z(r)\bigr)\, h\bigl(y(Tr), z(r)\bigr), \qquad (1)$$

where T, q, h are not known in advance.

Let us discuss the assumptions in this formula in more detail. The a priori probability distribution p(z) for possible segmentations is given by

$$p(z) = \frac{1}{Z} \prod_{(r,r') \in E} g\bigl(z(r), z(r')\bigr),$$

i.e. a product over all edges of the lattice (Z is a normalizing constant). The function g : K × K → ℝ is chosen in advance according to the Potts model:

$$g(k, k') = \begin{cases} a & \text{if } k = k', \\ b & \text{otherwise,} \end{cases}$$

where a, b > 0 and a > b, expressing the expectation that neighboring voxels tend to have the same label class. We assume furthermore that the intensity values in the two modalities are conditionally independent given the segment label. The functions q, h : F × K → ℝ represent the conditional probabilities for the intensities in the modalities given the segment label. That is why we can assume, without loss of generality, that

$$\sum_f q(f, k) = 1, \qquad \sum_f h(f, k) = 1 \qquad \forall k \in K$$

holds. Of course the texture model used here is very simple, because we assume that the voxel intensities are generated independently.

Suppose for a moment that q, h and T are known. Then the segmentation task is a Bayes decision which depends on the chosen cost function for misclassification. If we choose simply C(z, z*) = 1 − δ_{zz*}, where δ is the Kronecker symbol and z* is the (unknown) "ground truth", then the minimal average cost is obtained by the maximum a posteriori decision. On the other hand, this cost function is not very useful in structural recognition, because the cost of a misclassification does not depend on the number of incorrectly classified vertices. A better choice of cost function is

$$C(z, z^*) = \sum_{r} c\bigl(z(r), z^*(r)\bigr) \qquad (2)$$

with some reasonable local function c : K × K → ℝ. It is easy to see that the Bayes decision for such a cost function is obtained as follows: first it is necessary to calculate the marginal probabilities

$$p\bigl(z(r) = k \mid x, y; T\bigr) \sim \sum_{z:\, z(r) = k} p(x, y, z; T) \qquad (3)$$

for each vertex r ∈ R and each label k ∈ K. Using these probabilities, the optimal segmentation is found by independent decisions for each vertex (see [5]). If the local cost function is as simple as c(k, k*) = 1 − δ_{kk*}, the optimal decisions are

$$z^o(r) = \arg\max_k\, p\bigl(z(r) = k \mid x, y; T\bigr).$$

Returning to our problem, the question is whether it is possible to learn the conditional probabilities q and h (unsupervised), to find T, and to segment the registered images using the cost function (2). To achieve this it is necessary to solve the problem

$$\ln \sum_z p(x, y, z; T) \to \max_{q, h, T} \qquad (4)$$

and then to calculate the probabilities (3). Both tasks are very hard computational problems.
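To fix ideas, here is a minimal sketch of the model and of the per-vertex decision rule just derived, written for a 2D lattice with 4-neighborhood. The function names, the array layouts and the 4-neighborhood are our assumptions; the sketch evaluates the unnormalized logarithm of (1) and applies the decision z^o(r) for c(k, k*) = 1 − δ.

```python
import numpy as np

def log_p_unnormalized(x, y_warped, z, log_q, log_h, a=3.0, b=1.0):
    """Unnormalized log of the Gibbs distribution (1) on a 2D lattice.

    x, y_warped  : integer intensity images (y already resampled under T)
    z            : label image with values in {0, ..., |K|-1}
    log_q, log_h : (|F|, |K|) arrays of log conditional probabilities
    a, b         : Potts parameters, g(k, k') = a if k == k' else b
    """
    value = log_q[x, z].sum() + log_h[y_warped, z].sum()   # unary terms
    # pairwise Potts terms over horizontal and vertical lattice edges
    for same in (z[:, 1:] == z[:, :-1], z[1:, :] == z[:-1, :]):
        value += np.where(same, np.log(a), np.log(b)).sum()
    return value

def optimal_segmentation(marginals):
    """Bayes decision for the additive cost (2) with c(k, k*) = 1 - delta:
    choose, independently at each vertex, the label of maximal marginal (3)."""
    return marginals.argmax(axis=-1)    # marginals: (H, W, |K|) array
```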
3 Approximate Solution

To maximize (4) we use the EM algorithm [6,1] and maximize iteratively

$$\sum_z \alpha(z) \ln p(x, y, z; T) \;-\; \sum_z \alpha(z) \ln \frac{p(x, y, z; T)}{\sum_{z'} p(x, y, z'; T)},$$

which coincides with (4) if α(z) ≥ 0 and Σ_z α(z) = 1. Each iteration consists of two steps. In the first – usually called the E-step – we choose α so that the second term reaches its maximum (with respect to p(x, y, z; T)) at the given actual p^{(n)}(x, y, z; T):

$$\alpha^{(n)}(z) = p^{(n)}(z \mid x, y; T).$$

In the second step – usually called the M-step – the first term is maximized with respect to p(x, y, z; T) for the fixed α obtained in the E-step:

$$\sum_z \alpha^{(n)}(z) \ln p(x, y, z; T) \to \max_{q, h, T}.$$

Let us consider the M-step in detail. We substitute (1) for p. The function g and the normalizing constant Z are fixed, so we are interested only in those terms which depend on q and h. For those depending on q we obtain

$$\sum_z p^{(n)}(z \mid x, y; T) \sum_r \ln q\bigl(x(r), z(r)\bigr) = \sum_r \sum_k p^{(n)}\bigl(z(r) = k \mid x, y; T\bigr) \ln q\bigl(x(r), k\bigr),$$

and analogously for h:

$$\sum_r \sum_k p^{(n)}\bigl(z(r) = k \mid x, y; T\bigr) \ln h\bigl(y(Tr), k\bigr).$$

Therefore the problem of calculating the marginal probabilities p(z(r)=k | x, y; T) given q, h and T must be solved not only for the final segmentation but in each iteration of the EM algorithm as well. Once these probabilities p^{(n)}(z(r)=k | x, y; T) are known, the subsequent evaluation of the functions q and h and of the transform T for the next iteration is straightforward: denoting X_f = {r ∈ R | x(r) = f}, the function q^{(n+1)} is obtained as

$$q^{(n+1)}(f, k) = \frac{1}{n(k)} \sum_{r \in X_f} p^{(n)}\bigl(z(r) = k \mid x, y; T\bigr),$$

where n(k) is a normalizing constant. The function h^{(n+1)} is obtained in the same way, but here we also have to vary T. We use a small neighborhood centered at the actual T^{(n)}, i.e. small translations and small rotations. Given q^{(n+1)}, h^{(n+1)} and T^{(n+1)}, we start the next iteration step. This is repeated until a fixpoint is reached.

The remaining problem is therefore to calculate p(z(r)=k | x, y; T) given the functions q and h and the transform T. Actually we do not know how to solve this problem in a closed fashion, except for simple graphs like trees and partial m-trees. Nevertheless it is possible to estimate these probabilities using a Gibbs sampler (see e.g. [2]): cycling through the vertices of the lattice many times and choosing in each step the label z(r) at the vertex r according to the conditional probability distribution

$$p\bigl(z(r) = k \mid z(R \setminus r), x, y; T\bigr) \sim q\bigl(x(r), k\bigr)\, h\bigl(y(Tr), k\bigr) \prod_{r':\, (r, r') \in E} g\bigl(k, z(r')\bigr), \qquad (5)$$

we observe the label k at each vertex r with probability p(z(r)=k | x, y; T). To be more precise, this holds in the limit of infinitely many sampling steps if all conditional probabilities in (5) are strictly positive. Using the sampler with finitely many sampling steps, we estimate the probabilities p(z(r)=k | x, y; T) approximately. Hence we are able to perform the EM algorithm and to obtain the final segmentation.
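The following sketch estimates these marginals by Gibbs sampling according to (5), again on a 2D lattice with 4-neighborhood. The sweep count, the burn-in length and all names are our assumptions; a practical implementation would be considerably more efficient.

```python
import numpy as np

def estimate_marginals(x, y_warped, q, h, a=3.0, b=1.0,
                       n_sweeps=120, n_burnin=20, seed=0):
    """Estimate p(z(r)=k | x, y; T) by sampling each z(r) from (5).

    q, h : (|F|, |K|) conditional intensity probability tables
    Returns an (H, W, |K|) array of observed label frequencies.
    """
    rng = np.random.default_rng(seed)
    H, W = x.shape
    n_labels = q.shape[1]
    z = rng.integers(0, n_labels, size=(H, W))     # initial labeling
    counts = np.zeros((H, W, n_labels))
    rows, cols = np.arange(H)[:, None], np.arange(W)[None, :]

    for sweep in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                # unnormalized conditional (5): unary factors times the
                # Potts factors g(k, z(r')) over the 4-neighborhood
                p = q[x[i, j], :] * h[y_warped[i, j], :]
                for ni, nj in ((i-1, j), (i+1, j), (i, j-1), (i, j+1)):
                    if 0 <= ni < H and 0 <= nj < W:
                        factor = np.full(n_labels, b)
                        factor[z[ni, nj]] = a
                        p = p * factor
                z[i, j] = rng.choice(n_labels, p=p / p.sum())
        if sweep >= n_burnin:                      # discard burn-in sweeps
            counts[rows, cols, z] += 1

    return counts / (n_sweeps - n_burnin)
```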
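Given such estimated marginals, the closed-form M-step updates read as below: m_step_q realizes the formula for q^(n+1), and applying the same routine to the warped image yields h^(n+1) for each candidate transform, among which the maximizer of the M-step term is kept. The warp routine and the candidate set are assumed to be supplied by the surrounding code.

```python
import numpy as np

def m_step_q(image, marginals, n_intensities):
    """Closed-form update q^(n+1)(f, k): sum the marginals over
    X_f = {r : image(r) = f} and normalize so that sum_f q(f, k) = 1."""
    n_labels = marginals.shape[-1]
    q = np.zeros((n_intensities, n_labels))
    for f in range(n_intensities):
        q[f] = marginals[image == f].sum(axis=0)
    return q / (q.sum(axis=0, keepdims=True) + 1e-12)

def m_step_h_and_T(y, marginals, n_intensities, candidates, warp):
    """Update h and T jointly: for each candidate transform near the
    current T (small translations and rotations), compute the matching
    h and keep the pair maximizing the M-step term for h."""
    best_value, best_h, best_T = -np.inf, None, None
    for T in candidates:
        yT = warp(y, T)                       # resample y under T
        h = m_step_q(yT, marginals, n_intensities)
        value = np.sum(marginals * np.log(h[yT] + 1e-12))
        if value > best_value:
            best_value, best_h, best_T = value, h, T
    return best_h, best_T
```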
4 Implementation and Experiments

The main iteration of the algorithm loops over all possible transforms T' around the "current" T (±1 pixel translations, elementary rotations) while searching for the overall "best" registration transform T. During this process one should consider the following (a sketch of the resulting outer loop follows this list):

(1) As the EM algorithm used for maximizing p(x, y) converges to a local maximum, one cannot take random values as initialisation of q(f, k) and h(f, k) for subsequent trials of T – the different intermediate optima reached would then not be comparable – but has to reuse the q(f, k) and h(f, k) (as well as the z-label field) computed in the previous iteration over T. (Initially the q(f, k) and h(f, k) are uniform distributions and the z-field is sampled according to the a priori Gibbs model.)

(2) This reduces the number of steps the EM routine has to execute for all but the first trial of T' in subsequent cycles.

(3) As T' may now have a better "quality" than the T of the previous iteration, the quality of the current T has to be recalculated as well, to prevent it from being neglected as a possible choice in this step.
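The sketch below shows this outer loop under our own naming conventions; run_em and score are placeholders for a few warm-started EM iterations and for the registration quality (e.g. the approximated log-likelihood (4)), and neighbors(T) enumerates the ±1 pixel translations and elementary rotations around T. All of these helpers are assumed, not part of the paper's text.

```python
def register_and_segment(x, y, q, h, T, neighbors, warp, run_em, score):
    """Outer loop: try the current T and its small neighborhood, warm-
    starting q and h (and, inside run_em, the z-field) from the previous
    iteration, and keep the best candidate until a fixpoint is reached."""
    while True:
        best = None
        # point (3): the current T is re-scored with the updated q, h,
        # so that it is not neglected as a possible choice
        for T_cand in [T] + list(neighbors(T)):
            y_warped = warp(y, T_cand)
            q_c, h_c = run_em(x, y_warped, q, h)   # warm start, points (1)-(2)
            value = score(x, y_warped, q_c, h_c)
            if best is None or value > best[0]:
                best = (value, q_c, h_c, T_cand)
        _, q, h, T_new = best
        if T_new == T:            # fixpoint: no neighbor improves on T
            return q, h, T
        T = T_new
```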
First, the theoretical model discussed in the previous sections was tested with an artificial image pair as in Fig. 1(a) and 1(b). In 1(c) the result of the registration and final labeling is shown. The statistical model used in generating the input pair can be recognised in 1(d), the result of the EM process applied to 1(a) and 1(b): certain pairs of the q(f, k)-distributions shown there are equal, and the same holds for pairs of the h(f, k)-distributions. This confirms the fact that there are pairs of label regions in the resulting segmentation 1(c) which correspond to the same texture parameters in 1(a) and 1(b) respectively.

Fig. 1. Artificial test image pair 1(a) and 1(b). The registration and the final labeling are shown in 1(c). In 1(d) the probability distributions q(f, k) and h(f, k) for each label colour k are shown.

Next, the model was applied to the MRT and PET images of Fig. 2(a) and 2(b). In 2(c) the registration and the final labeling with four labels is shown. One label region corresponds to the scene background; accordingly, its q(f, k) and h(f, k) in 2(d) overlap. A further label corresponds to mostly disjunct regions of the MRT and PET images in the areas around and including the cranium. For another label, both q(f, k) and h(f, k) largely overlap, according to most of the interior of the brain structure in both images. The remaining label shows the effect of the Potts model parameters suppressing thin geometrical elements, so that the PET image prevails in the fine-grained structural areas of the MRT image. As the label set in this experiment was restricted to four labels in advance, more detail about the structure in the input images could not be extracted.

Fig. 2. MRT and PET image pair 2(a), 2(b) of a human brain as input. The registration and the final labeling (with four colours) are shown in 2(c). Finally, 2(d) shows q(f, k) and h(f, k).

In Fig. 3 labelings are shown using five labels in 3(a) and six labels in 3(b). Figure 3(c) shows a result with six labels and slightly different Potts parameters (a/b = 3, compared to a/b = 9 in all other experiments), which is less strict in the segmentation of region boundaries, particularly compared to 3(b).

Fig. 3. The images of Fig. 2 labeled with five colours in 3(a) and six colours in 3(b). In 3(c) six colours with slightly different Potts parameters than in the other experiments.

5 Conclusions and Acknowledgements

We have shown how to unify the registration and segmentation of multi-modal images, assuming a Gibbs probability distribution for the hidden scene description. This includes unsupervised learning of at least simple texture models. We obtain an approximate solution mainly because we do not know how to calculate marginal probabilities of Gibbs distributions in a closed fashion. Though this problem is NP-complete in general, it is therefore intriguing to search for solvable subclasses. A second interesting question is how to learn (unsupervised) the Gibbs distribution of the scene model in order to overcome the limitations of the Potts model.

At this point we are indebted to mention the great opportunity we have had to work with Professor M.I. Schlesinger from Kiev; this always fruitful collaboration has led us to a lot of invaluable knowledge. One of us (B.F.) was supported during his work on this paper as a fellow of the MIRACLE Centre of Excellence (EC grant No. ICA1-CT-2000-70002) and wishes to thank all colleagues at the Center for Machine Perception, Prague, and especially Professor V. Hlaváč for interesting discussions and convenient conditions.

References

1. A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.
2. Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721–741, 1984.
3. Alexis Roche, Grégoire Malandain, and Nicholas Ayache. Unifying maximum likelihood approaches in medical image registration. Technical Report 3741, INRIA, 1999.
4. Alexis Roche, Grégoire Malandain, Xavier Pennec, and Nicholas Ayache. Multimodal image registration by maximization of the correlation ratio. Technical Report 3378, INRIA, 1998.
5. M. I. Schlesinger and V. Hlaváč. Ten Lectures on Statistical and Structural Pattern Recognition. Kluwer Academic Publishers, Dordrecht, 2002.
6. Michail I. Schlesinger. Connection between unsupervised and supervised learning in pattern recognition. Kibernetika, 2:81–88, 1968. In Russian.
7. Milan Sonka and J. Michael Fitzpatrick, editors. Handbook of Medical Imaging, volume 2. SPIE Press, 2000.
8. C. Studholme, D. L. G. Hill, and D. J. Hawkes. An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognition, 32(1):71–86, 1999.