Zoomable Cell
Matthias Reimann, Anne Tuukkanen, Michael Schroeder
Marcel Spehr, Dimitrij Schlesinger, Stefan Gumhold
Fakultät Informatik
Vision
[Figure: panels A to I zoom from 10,000 nm through 1,000 nm, 100 nm and 10 nm down to 1 nm, linking microscopy images, protein interactions and 3D structures, a natural coordinate system and an abstract cell into the zoomable cell]
Data: >200,000 images from the scientific literature; >48,000 3D protein structures from PDB
Achievements
Protein interactions and 3D structures
Algorithm for constraint-based construction of large complexes, with a use case in the histone methyltransferase complex SET1 (Proteomics 2010)
Novelty: the combinatorial explosion is constrained with protein interaction data
Network visualisation with power graphs, with disease applications (Experimental Cell Research 2010, Neurological Research 2010, 2 submitted)
Novelty: modules in graphs are exploited for visualisation
Natural Coordinate System
DOG4DAG: semi-automated generation of terms, definitions, and hierarchies (Bioinformatics 2010)
Novelty: first system to integrate all steps of ontology generation, plus evaluation
Microscopy images
Image search and classification with a natural coordinate system (manuscript in preparation)
Novelty: image library with 1.3 million images, implementation of image similarity measures and filters, GoImage system with 745,000 images
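In a power graph, a set of nodes that all connect to the same partners can be drawn as one power node. That power node is then joined to its partners by a single power edge. The following is a minimal sketch of one ingredient only, assuming a plain undirected edge list. It groups nodes with identical neighbour sets, which is a simple kind of module. The function name `twin_modules` is hypothetical. This is not the power graph algorithm from the cited papers, which also handles cliques, overlapping modules and nesting.

```python
from collections import defaultdict


def twin_modules(edges):
    """Group nodes that have exactly the same neighbours (candidate power nodes)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    groups = defaultdict(list)
    for node, nbrs in neighbours.items():
        groups[frozenset(nbrs)].append(node)
    # Only groups of two or more nodes save edges: all their links to the
    # shared neighbours can be drawn as one power edge.
    return {tuple(sorted(members)): sorted(nbrs)
            for nbrs, members in groups.items() if len(members) > 1}


edges = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"),
         ("C", "X"), ("C", "Y"), ("X", "Z")]
print(twin_modules(edges))  # {('A', 'B', 'C'): ['X', 'Y']}
```

In this toy graph, the six edges between {A, B, C} and {X, Y} collapse into a single power edge. X and Y are not grouped themselves because only X also links to Z.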
Contents

Constraint-based modelling of complexes

Image search and classification with natural coordinate system

Limits and Perspectives
Histones and histone methyltransferases
DNA is a long strand wound around histones (figure from Molecular Cell Biology, Lodish et al.)
Histones are modified by histone methyltransferases
[Figure: histone H3 and the Dim-5 protein, a Suv39-type histone K9 methyltransferase (label: Met)]
Interactions of SET1 subunits
Positive interactions:
Set1 – Bre2, Set1 – Shg1, Set1 – Spp1, Set1 – Swd1, Set1 – Swd2, Set1 – Swd3,
Bre2 – Bre2, Swd1 – Swd3, Swd2 – Swd2, Bre2 – Sdc1
Negative interactions:
Set1 – Sdc1
Bre2 – Sdc1 – Bre2
Data: AP/MS and Y2H (Roguev et al. 2004; Dehe et al., JBC 2006)
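The claim that interaction data constrains the combinatorial explosion can be made concrete with a toy count. The sketch below is not the docking algorithm of the Proteomics 2010 paper. It only counts the tree-shaped contact topologies that connect all eight subunits, using Kirchhoff's matrix-tree theorem, and does so twice:

- once with every pair allowed except the negative one;
- once with only the positive pairs.

The pair list is one assumed reading of the flattened interaction table. Spp1 is used as the Set1C subunit name. The helper names `pair` and `count_spanning_trees` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Assumed reading of the slide table (pairings reconstructed from a flattened layout)
SUBUNITS = ["Set1", "Bre2", "Swd1", "Swd2", "Swd3", "Spp1", "Sdc1", "Shg1"]
POSITIVE = [("Set1", "Bre2"), ("Set1", "Shg1"), ("Set1", "Spp1"), ("Set1", "Swd1"),
            ("Set1", "Swd2"), ("Set1", "Swd3"), ("Bre2", "Bre2"), ("Swd1", "Swd3"),
            ("Swd2", "Swd2"), ("Bre2", "Sdc1")]
NEGATIVE = [("Set1", "Sdc1")]


def pair(a, b):
    """Order-independent key for an undirected contact."""
    return tuple(sorted((a, b)))


def count_spanning_trees(nodes, edges):
    """Matrix-tree theorem: any cofactor of the Laplacian = number of spanning trees."""
    idx = {n: i for i, n in enumerate(nodes)}
    k = len(nodes)
    lap = [[Fraction(0)] * k for _ in range(k)]
    for a, b in {pair(a, b) for a, b in edges}:
        if a == b:
            continue  # homodimer contacts do not change which trees exist
        i, j = idx[a], idx[b]
        lap[i][i] += 1
        lap[j][j] += 1
        lap[i][j] -= 1
        lap[j][i] -= 1
    m = [row[:-1] for row in lap[:-1]]  # drop last row and column
    n = k - 1
    det = Fraction(1)
    for c in range(n):  # exact Gaussian elimination
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return 0  # disconnected graph: no spanning tree
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for cc in range(c, n):
                m[r][cc] -= factor * m[c][cc]
    return int(det)


negative = {pair(*p) for p in NEGATIVE}
allowed = [p for p in combinations(SUBUNITS, 2) if pair(*p) not in negative]
print(count_spanning_trees(SUBUNITS, allowed))   # 196608: every pair except Set1-Sdc1
print(count_spanning_trees(SUBUNITS, POSITIVE))  # 3: positive pairs only
```

The first number equals (n − 2)·n^(n − 3) for n = 8, the tree count of a complete graph with one edge removed. Under these assumptions, the positive interactions leave only the three ways of breaking the Set1–Swd1–Swd3 triangle.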
Workflow
Constraints
Model
Image search and classification
Current engines: Yale Image Finder, (FigSearch), BioText Search Engine
do not search the web
do not filter by image type (graphs, tables, formulas, photos, etc.)
Zoomable Cell image search
Yahoo BOSS API used to retrieve several million images, but 90% of the images are not suitable
Bottom-up approach:
manual selection of 2,000 out of 20,000 images as seeds
expansion of the seed images to 745,000 images by image similarity
Image similarity:
Gist scene descriptor (960 image features)
approximate nearest neighbour clustering
[Figure: similar images; 266 images for endoplasmic reticulum, 2 out of 266 for rodents]
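Seed expansion can be sketched as a similarity search in descriptor space. The sketch assumes each image is already a feature vector; for Gist that would be 960 values. It uses random-hyperplane locality-sensitive hashing as a stand-in for the approximate nearest neighbour method, which the slide does not name. The similarity threshold, the numbers of hash bits and tables, and the names `HyperplaneLSH` and `expand_seeds` are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class HyperplaneLSH:
    """Random-hyperplane LSH: vectors separated by a small angle usually fall on
    the same side of every random hyperplane and therefore share a bucket."""

    def __init__(self, dim, n_bits=12, n_tables=8, rng_seed=0):
        rng = random.Random(rng_seed)
        self.planes = [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
                       for _ in range(n_tables)]
        self.tables = [{} for _ in range(n_tables)]

    @staticmethod
    def _key(planes, v):
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0 for plane in planes)

    def add(self, image_id, v):
        for planes, table in zip(self.planes, self.tables):
            table.setdefault(self._key(planes, v), []).append(image_id)

    def candidates(self, v):
        found = set()
        for planes, table in zip(self.planes, self.tables):
            found.update(table.get(self._key(planes, v), []))
        return found


def expand_seeds(seeds, library, threshold=0.9):
    """Ids of library images with cosine similarity >= threshold to some seed;
    only LSH candidates are compared exactly, not the whole library."""
    index = HyperplaneLSH(dim=len(seeds[0]))
    for image_id, vec in library.items():
        index.add(image_id, vec)
    accepted = set()
    for seed_vec in seeds:
        for cand in index.candidates(seed_vec):
            if cosine(seed_vec, library[cand]) >= threshold:
                accepted.add(cand)
    return accepted


rng = random.Random(1)
seed_vec = [rng.random() for _ in range(960)]  # stand-in for one Gist vector
library = {
    "scaled_copy": [2.0 * x for x in seed_vec],  # same direction as the seed
    "inverted": [-x for x in seed_vec],          # opposite direction
    "unrelated": [rng.gauss(0.0, 1.0) for _ in range(960)],
}
print(sorted(expand_seeds([seed_vec], library)))  # ['scaled_copy']
```

Hashing trades recall for speed. A similar image that lands in no shared bucket is missed, which is why several hash tables are queried.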
Navigation in Large Information Spaces
A similarity-based vector representation of images turns navigation into a search problem in a high-dimensional space:
image features span the space
images are points in that space
Kernel PCA allows similarities from different feature domains to be handled consistently
[Figure: similarity measures d_i → Kernel PCA → combined feature space ϕ (dimensionality adjustable)]
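Kernel PCA turns a matrix of pairwise similarities into coordinates. This is what allows similarities from different feature domains to be merged into one space ϕ. The NumPy sketch below rests on three assumptions:

- each feature domain yields a positive semi-definite kernel matrix;
- domains are combined by a weighted sum;
- the exponential kernels get a bandwidth `scale`.

The weights, bandwidths, target dimensionality and function names are illustrative and are not taken from the slides.

```python
import numpy as np


def exp_kernel(X, p, scale=1.0):
    """k(x, y) = exp(-||x - y||_p / scale) with p = 2 or 1; `scale` is an
    assumed bandwidth."""
    diff = X[:, None, :] - X[None, :, :]
    return np.exp(-np.linalg.norm(diff, ord=p, axis=2) / scale)


def center_kernel(K):
    """Double-centre K, which corresponds to centring the implicit feature vectors."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kernel_pca(kernels, weights, dim):
    """Embed n images in `dim` coordinates from a weighted sum of kernel matrices."""
    K = sum(w * k for w, k in zip(weights, kernels))
    eigvals, eigvecs = np.linalg.eigh(center_kernel(K))  # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:dim]                 # largest first
    lam = np.clip(eigvals[top], 0.0, None)                # remove round-off negatives
    return eigvecs[:, top] * np.sqrt(lam)                 # row i = image i in phi


rng = np.random.default_rng(0)
gist = rng.normal(size=(6, 960))    # stand-in Gist descriptors for 6 images
low_res = rng.normal(size=(6, 64))  # stand-in low-resolution thumbnails
kernels = [exp_kernel(gist, 2, scale=40.0), exp_kernel(low_res, 1, scale=50.0)]
phi = kernel_pca(kernels, weights=[0.7, 0.3], dim=2)
print(phi.shape)  # (6, 2)
```

The dimensionality `dim` is the adjustable knob from the figure. With `dim` equal to the number of images, the embedding reproduces the centred combined kernel exactly.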
Image features
Name | Runtime/image | Quality | Reference | Limits | Similarity
Gist | 0.15 s | ++ | [Oliva and Torralba; 2006] | Image details | e^(-L2)
CEDD | 0.01 s | +++ | [Chatzichristofis and Boutalis; 2008] | Specific content | Tanimoto coefficient
FCTH | 0.01 s | ++ | [Chatzichristofis and Boutalis; 2008] | Specific content | Tanimoto coefficient
CLD; SCD; EHD | 0.01 s | + | [Manjunath et al.; 2002] | Specific content; intuitiveness | e^(-L2)
Segmentation | 1 s | ++ | Unpublished; in-house development | Model complexity; computation time | Dot product
SIFT Bag-of-Features | 0.3 s | ++ | [Csurka et al.; 2004] | Perspective transformations; image clutter | e^(-L2)
Annotation | 0.001 s | +++ | [Joachims; 1999] | Existence and quality of labels | String kernel (word frequencies)
Low Res | 0.004 s | + | | Affine image transformations; illumination changes | e^(-L1)
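The similarity column of the table can be written out directly. In the plain-Python sketch below:

- the Tanimoto coefficient takes the generalised form x·y / (x·x + y·y − x·y), which is common for real-valued descriptors such as CEDD and FCTH;
- the exponential forms are e^(−‖x − y‖) with the L2 or L1 norm;
- the annotation "string kernel" is simplified to a normalised dot product of word-frequency vectors, a stand-in for a full string kernel.

All function names are hypothetical.

```python
import math
from collections import Counter


def tanimoto(x, y):
    """Generalised Tanimoto coefficient for real-valued descriptors."""
    xy = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - xy
    return xy / denom if denom else 0.0  # two all-zero vectors: 0.0 by convention here


def exp_lp(x, y, p):
    """e^(-||x - y||_p) for p = 1 or p = 2."""
    dist = sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)
    return math.exp(-dist)


def dot_product(x, y):
    return sum(a * b for a, b in zip(x, y))


def word_frequency_kernel(text_a, text_b):
    """Normalised dot product of the word-frequency vectors of two annotations."""
    ca, cb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


print(tanimoto([1, 0, 2], [1, 0, 2]))      # 1.0
print(round(exp_lp([0, 0], [3, 4], 2), 6))  # 0.006738
print(word_frequency_kernel("endoplasmic reticulum", "rough endoplasmic reticulum"))
```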
Directed Search Strategies in ϕ
Idea from optimization: the taxi-cab method reduces the search to 1D minimizations along the ϕ axes
The user performs the search interactively
Problem: the space is only sparsely sampled
[Figure: taxi-cab search steps from the start point x0 along the axes ϕ1 and ϕ2]
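The taxi-cab method (cyclic coordinate descent) minimises a function one axis at a time, holding the other coordinates fixed. The sketch below shows the optimisation idea on a smooth test function, with golden-section search as the 1D minimiser. In the image browser, the user would take the place of the 1D minimiser by picking the best image along each ϕ axis. The test function, bounds, sweep count and function names are arbitrary assumptions.

```python
import math


def golden_section(f, lo, hi, tol=1e-8):
    """Minimise a unimodal 1D function on [lo, hi]."""
    r = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if f(c) < f(d):    # minimum lies in [a, d]
            b, d = d, c
            c = b - r * (b - a)
        else:              # minimum lies in [c, b]
            a, c = c, d
            d = a + r * (b - a)
    return (a + b) / 2


def taxi_cab(f, x0, bounds, sweeps=20):
    """Cyclic coordinate descent: one 1D minimisation per axis in each sweep."""
    x = list(x0)
    for _ in range(sweeps):
        for k, (lo, hi) in enumerate(bounds):
            def along_axis(t, k=k):
                y = x.copy()
                y[k] = t
                return f(y)
            x[k] = golden_section(along_axis, lo, hi)
    return x


def f(v):
    return (v[0] - 1) ** 2 + 2 * (v[1] + 2) ** 2 + 0.5 * v[0] * v[1]


# Analytic minimum: x = 48/31 ≈ 1.548, y = -68/31 ≈ -2.194
print(taxi_cab(f, [0.0, 0.0], [(-5, 5), (-5, 5)]))
```

Each step moves parallel to one axis only, like a taxi on a grid. This is why the method maps naturally onto browsing along one ϕ axis at a time.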
Directed Search Strategies in ϕ
Sample along each ϕ axis:
as close as possible to the axis
while keeping all images reachable
Solution:
use the Voronoi regions of the ϕ axes to assign all images
filter out all images that can be reached indirectly via the remaining images
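Assigning images to the Voronoi regions of the ϕ axes means giving each image to the axis line, through the current position x0, that it lies closest to. The distance from x to axis line k is sqrt(‖x − x0‖² − (x_k − x0_k)²). The nearest axis is therefore the one with the largest |x_k − x0_k|. The plain-Python sketch below builds, for each axis, the list of images ordered by their offset along that axis. The function name is hypothetical. The second step, filtering out images that are reachable indirectly, is omitted because the slide does not specify the reachability criterion.

```python
def axis_voronoi(images, x0):
    """Assign each image vector to the nearest phi axis through x0 and sort
    each axis's images by their signed offset along that axis."""
    regions = {k: [] for k in range(len(x0))}
    for name, x in images.items():
        offsets = [xi - ci for xi, ci in zip(x, x0)]
        # Nearest axis line = axis with the largest absolute offset
        k = max(range(len(x0)), key=lambda i: abs(offsets[i]))
        regions[k].append((offsets[k], name))
    return {k: [name for _, name in sorted(items)] for k, items in regions.items()}


images = {"a": (3.0, 0.5), "b": (0.2, -2.0), "c": (-1.0, 0.1), "d": (0.4, 1.5)}
print(axis_voronoi(images, x0=(0.0, 0.0)))
# {0: ['c', 'a'], 1: ['b', 'd']}
```

Every image ends up on exactly one axis list. This is what guarantees that all images stay reachable by stepping along the axes.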
Results
Star view and user feedback
Continuous zoom at different time steps and resolutions
Problem: transform one image into another by generating a sequence of intermediate images that gives a seamless transition
[Figure: transition between two images]
As automatic as possible and user-adjustable
Thin plate spline + optical flow
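A thin plate spline gives a smooth warp that maps control points in one image onto corresponding points in another. Intermediate frames can then be produced by moving the control points part of the way. The NumPy sketch below covers the spline fit and the interpolated control points only. It assumes the point correspondences are already given; in the described approach, optical flow would supply them. Optical flow, image resampling and blending are left out, and the function names are hypothetical.

```python
import numpy as np


def _tps_kernel(r):
    """Radial basis U(r) = r^2 log r, with U(0) = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        u = r ** 2 * np.log(r)
    return np.nan_to_num(u)


def fit_tps(src, dst):
    """Solve for a 2D thin plate spline f with f(src[i]) = dst[i]."""
    n = src.shape[0]
    K = _tps_kernel(np.linalg.norm(src[:, None] - src[None, :], axis=2))
    P = np.hstack([np.ones((n, 1)), src])
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    params = np.linalg.solve(A, b)
    return params[:n], params[n:]  # radial weights, affine part


def apply_tps(src, weights, affine, pts):
    """Evaluate the fitted spline at arbitrary points."""
    U = _tps_kernel(np.linalg.norm(pts[:, None] - src[None, :], axis=2))
    return U @ weights + np.hstack([np.ones((len(pts), 1)), pts]) @ affine


def intermediate_warp(src, dst, t):
    """Warp for frame t in [0, 1]: control points move linearly from src to dst."""
    return fit_tps(src, (1 - t) * src + t * dst)


src = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]], dtype=float)
dst = src + np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [0.2, 0.2]])
w, a = intermediate_warp(src, dst, t=0.5)
print(apply_tps(src, w, a, src))  # control points halfway between src and dst
```

Varying t from 0 to 1 yields the sequence of warps. A user-adjustable variant could let the user drag control points before fitting.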
Contributions, Limits, and Perspectives
Contributions:
Small-scale, semi-automated, constraint-based modelling of complexes is possible
Compact visualisation of protein interaction networks with power graphs
Image search (library, textual and image similarity, navigation, learning)
Zooming
Limits:
Large-scale, automated modelling of complexes is not possible
Additional data needed on structures, interactions, EM maps, localisation
Integration of coherent image data
Perspective:
Concrete application in fruit fly development
Integration of video microscopy data, manual annotation, protein and protein interaction data