How to deal with… non-submodular and higher-order energies (Part 1)
Carsten Rother
Machine Learning 2, 27/06/2014
Advertisement
Theoretical Side:
• Optimization and learning in discrete-domain models
  (CRFs, higher-order models, continuous label spaces, loss-based learning, etc.)
Application Side:
• Scene recovery from multiple images
• 3D Scene understanding
• Bio Imaging
Main Research Theme:
• Combining physics-based vision with machine learning:
Generative models meet discriminative models
State-of-the-art CRF models

Gibbs distribution: $p(\mathbf{y} \mid \mathbf{x}, \mathbf{w}) = \frac{1}{Z(\mathbf{x},\mathbf{w})}\, e^{-E(\mathbf{y},\mathbf{x},\mathbf{w})}$

Energy: $E(\mathbf{y},\mathbf{x},\mathbf{w}) = \sum_F E_F(y_F, \mathbf{x}, w_F)$

[Figure: factor graph and its compact representation]
Deconvolution
Combine physics and machine learning:
1) Use physics: add a Gaussian "likelihood" term $(x - K*y)^2$, where the input is the blurred image $x = K*y$
2) Put it into a deep-learning approach (stacked RTFs):
   input x → RTF1 → y1 → RTF2 → y2 → … → output y
[Schmidt, Rother, Nowozin, Jancsary, Roth, CVPR 2013. Best student paper award]
Scene recovery from multiple images
[Figure: 2 RGBD input images]
Scene recovery from single images
[NIPS 13, joint work with Oxford University]
[NIPS 13, joint work with Oxford University]
BioImaging
Joint work with Myers group (Dagmar, Florian, and others)
[Figure: atlas and instance]
3D Scene Understanding
• Training time: 3D objects
• Test time:
Advertisement
• If you are excited about any of these topics… come to us for a
  "Forschungspraktikum" (research internship), master thesis, diploma thesis, etc.
• If you want to collaborate with top industry labs or universities… come to us. Examples:
  • BMW, Adobe, Microsoft Research, Daimler, etc.
  • Top universities: in Israel, Oxford, Heidelberg, etc.
Advertisement
Joint project with the "Institut für Luftfahrt und Logistik" (Institute of Aviation and Logistics)
Lidar scanner
Smart 3D point-cloud processing:
- 3D fine-grained recognition: type of aircraft, vehicle, objects, …
- Tracking: 3D models with varying degrees of information
- Structured data: how to define a CRF/RTF?
- Combine physics-based vision (generative models) with machine learning
There is an opening for a master project / PhD student –
if you are interested, talk to me after the lecture!
Reminder: Pairwise energies
$E(x) = \sum_{i \in V} \theta_i(x_i) + \sum_{(i,j) \in E} \theta_{ij}(x_i, x_j) + \theta_{const}$

For now, $x_i \in \{0,1\}$; $G = (V, E)$ is an undirected graph.

Visualization of the full energy:

Unary terms: $\theta_i(0)$ for $x_i = 0$, $\theta_i(1)$ for $x_i = 1$.

Pairwise terms $\theta_{ij}(x_i, x_j)$:

              x_j = 0        x_j = 1
  x_i = 0     θ_ij(0,0)      θ_ij(0,1)
  x_i = 1     θ_ij(1,0)      θ_ij(1,1)

($\theta_{ij}(0,0)$ is also sometimes written as $\theta_{ij;00}$.)

Submodularity condition:
$\theta_{ij}(0,0) + \theta_{ij}(1,1) \le \theta_{ij}(1,0) + \theta_{ij}(0,1)$

• If all terms are submodular, the global optimum can be computed in polynomial time with graph cut
• If not… this lecture (a quick per-edge check is sketched below)
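
The condition is easy to test per edge. A minimal sketch, assuming each pairwise term is stored as a 2×2 nested list (a layout chosen here for illustration):

```python
def is_submodular(theta_ij):
    """Check the submodularity condition for one binary pairwise term.

    theta_ij[a][b] holds the cost of the assignment (x_i = a, x_j = b).
    """
    return (theta_ij[0][0] + theta_ij[1][1]
            <= theta_ij[0][1] + theta_ij[1][0])

# Ising-style smoothness term: equal labels are free, disagreement costs 1.
print(is_submodular([[0, 1], [1, 0]]))   # True  -> graph cut applies
# Repulsive term: equal labels are penalized.
print(is_submodular([[1, 0], [0, 1]]))   # False -> non-submodular
```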
How often do we have submodular terms?
$\theta_{ij}(0,0) + \theta_{ij}(1,1) \le \theta_{ij}(1,0) + \theta_{ij}(0,1)$

Label smoothness is often the natural condition: neighboring pixels have the same label more often than not. We may choose:
$\theta_{ij}(0,0) = \theta_{ij}(1,1) = 0; \quad \theta_{ij}(1,0) = \theta_{ij}(0,1) \ge 0$

In alpha expansion (reminder later) the energy is often "naturally" submodular:
[Figure: left image (a), right image (b), labelling; cost as a function of $|x_i - x_j|$]
Importance of good optimization
Input: image sequence [data courtesy of Oliver Woodford]
Output: new view
Problem: minimize a binary, 4-connected, non-submodular energy
(choose a colour mode at each pixel)
Importance of good optimization
[Figure: comparison on the new-view synthesis problem]
• Ground truth
• Graph cut with truncation [Rother et al. '05]
• Belief propagation
• ICM, simulated annealing
• QPBO [Hammer '84] (black = unknown)
• QPBOP [Boros '06; see Rother '07] – global minimum
Most simple idea to deal with non-submodular terms
• Truncate all non-submodular terms, i.e. those with
  $\theta_{ij}(0,0) + \theta_{ij}(1,1) > \theta_{ij}(1,0) + \theta_{ij}(0,1)$,
  so that the condition holds with equality:
  $(\theta_{ij}(0,0) - \delta) + (\theta_{ij}(1,1) - \delta) = (\theta_{ij}(1,0) + \delta) + (\theta_{ij}(0,1) + \delta)$
  with
  $\delta = \tfrac{1}{4}\left[\theta_{ij}(0,0) + \theta_{ij}(1,1) - \theta_{ij}(1,0) - \theta_{ij}(0,1)\right]$
Better techniques to come…
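
A minimal sketch of this truncation rule, using the same 2×2 list layout as the checker above:

```python
def truncate(theta_ij):
    """Make one binary pairwise term submodular by truncation.

    Subtracts delta from the diagonal and adds it to the off-diagonal,
    with delta = 1/4 * (theta(0,0) + theta(1,1) - theta(1,0) - theta(0,1)),
    so the submodularity condition holds with equality afterwards.
    """
    delta = (theta_ij[0][0] + theta_ij[1][1]
             - theta_ij[1][0] - theta_ij[0][1]) / 4.0
    if delta <= 0:          # already submodular: leave untouched
        return theta_ij
    return [[theta_ij[0][0] - delta, theta_ij[0][1] + delta],
            [theta_ij[1][0] + delta, theta_ij[1][1] - delta]]

print(truncate([[1.0, 0.0], [0.0, 1.0]]))  # [[0.5, 0.5], [0.5, 0.5]]
```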
How often do we have non-submodular terms?
• Learning (with unconstrained parameters) can produce non-submodular terms:
[Figure: learned MRF and DTF models on training and test data; red = non-submodular pairwise terms, blue = submodular; graph connectivity: 64]
Texture Denoising
[Figure: texture denoising]
• Training images; test image; test image (60% noise)
• Result MRF, 4-connected (neighbours)
• Result MRF, 4-connected
• Result MRF, 9-connected (7 attractive; 2 repulsive)
How often do we have non-submodular terms?
Deconvolution:
Hand-crafted scenarios:
[Figure: input image, user input, global optimum]
Many more examples later: Diagram recognition, fusion move, etc.
Reparametrization
Two reparametrizations we need (both leave the energy of every labeling unchanged):
• Unary transform: add $\delta$ to both unary entries $\theta_p(0)$ and $\theta_p(1)$, and subtract $\delta$ from $\theta_{const}$.
• Pairwise transform: subtract $\delta$ from one column of a pairwise table, i.e. from $\theta_{pq}(0,j)$ and $\theta_{pq}(1,j)$, and add $\delta$ to the unary entry $\theta_q(j)$.
[Minimizing non-submodular energies with graph cut, Kolmogorov, Rother, PAMI 2007]
Put energies into “normal form”
1) Apply pairwise transformations until, for every directed edge p→q and every label $j \in \{0,1\}$:
   $\min(\theta_{pq}(0,j),\; \theta_{pq}(1,j)) = 0$
2) Apply unary transformations until, for every node p:
   $\min(\theta_p(0),\; \theta_p(1)) = 0$
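
For 2×2 tables a single pass over columns and rows suffices. A sketch under the same list layout as above (the dictionary structure is my own choice):

```python
def to_normal_form(unary, pairwise, const):
    """Reparametrize a binary pairwise energy into normal form.

    unary:    dict p -> [theta_p(0), theta_p(1)]
    pairwise: dict (p, q) -> 2x2 nested list theta_pq[i][j], with i the
              label of p and j the label of q.
    Every operation below leaves the energy of every labeling unchanged.
    """
    for (p, q), t in pairwise.items():
        # Pairwise transform for edge p->q: push each column minimum into
        # the unary of q, so min(theta_pq(0,j), theta_pq(1,j)) = 0.
        for j in (0, 1):
            delta = min(t[0][j], t[1][j])
            t[0][j] -= delta
            t[1][j] -= delta
            unary[q][j] += delta
        # Same for the reverse edge q->p: push row minima into p's unary.
        for i in (0, 1):
            delta = min(t[i][0], t[i][1])
            t[i][0] -= delta
            t[i][1] -= delta
            unary[p][i] += delta
    # Unary transform: push each node's minimum into the constant term.
    for u in unary.values():
        delta = min(u)
        u[0] -= delta
        u[1] -= delta
        const += delta
    return const
```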
Construct the graph
The minimum cut through the constructed graph gives the
solution $x^* = \operatorname{argmin}_x E(x)$.
[Figure: graph construction]
QPBO method
Original energy:
$E(\{x_p\}) = \sum_p E_p(x_p) + \sum_{(p,q)\ \text{sub.}} E_{pq}(x_p, x_q) + \sum_{(p,q)\ \text{non-sub.}} E_{pq}(x_p, x_q)$

• Double the number of variables: $x_p \rightarrow x_p, \bar{x}_p$ (with the intention $\bar{x}_p = 1 - x_p$)

$E'(\{x_p\},\{\bar{x}_p\}) = \sum_p \frac{E_p(x_p) + E_p(1-\bar{x}_p)}{2}$  (unary)
$\quad + \sum_{\text{sub.}} \frac{E_{pq}(x_p, x_q) + E_{pq}(1-\bar{x}_p, 1-\bar{x}_q)}{2}$  (pairwise submodular)
$\quad + \sum_{\text{non-sub.}} \frac{E_{pq}(x_p, 1-\bar{x}_q) + E_{pq}(1-\bar{x}_p, x_q)}{2}$  (pairwise non-submodular)

• $E'$ is submodular!
• Construct the graph and solve with graph cut:
  less than double the runtime of a single graph cut
• The method is called QPBO: Quadratic Pseudo-Boolean Optimization
  (not a good name)
[Hammer et al. '84, Boros et al. '91; see Kolmogorov, Rother '07]
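
The splitting can be sketched directly from the formula. The snippet below builds the two half-weight pairwise terms of E′ for one edge and verifies that both are submodular; the layout and names are my own, not the paper's code:

```python
def qpbo_terms(E_pq):
    """Split one pairwise term into the two half-weight terms of E'.

    Variables are doubled: p gets a twin pbar meant to represent 1 - x_p.
    A submodular term connects (p, q) and (pbar, qbar); a non-submodular
    one connects (p, qbar) and (pbar, q). Each resulting 2x2 table is
    submodular by construction.
    """
    sub = E_pq[0][0] + E_pq[1][1] <= E_pq[0][1] + E_pq[1][0]
    if sub:
        t1 = [[E_pq[a][b] / 2 for b in (0, 1)] for a in (0, 1)]          # (p, q)
        t2 = [[E_pq[1 - a][1 - b] / 2 for b in (0, 1)] for a in (0, 1)]  # (pbar, qbar)
    else:
        t1 = [[E_pq[a][1 - b] / 2 for b in (0, 1)] for a in (0, 1)]      # (p, qbar)
        t2 = [[E_pq[1 - a][b] / 2 for b in (0, 1)] for a in (0, 1)]      # (pbar, q)
    for t in (t1, t2):
        assert t[0][0] + t[1][1] <= t[0][1] + t[1][0]  # submodular
    return t1, t2

qpbo_terms([[1, 0], [0, 1]])   # repulsive (non-submodular) term: passes
```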
Read out the solution
• Assign labels based on the minimum cut in the auxiliary graph:
  $x_p = 1,\ \bar{x}_p = 0 \;\Rightarrow\; x_p = 1$
  $x_p = 0,\ \bar{x}_p = 1 \;\Rightarrow\; x_p = 0$
  $x_p = 0,\ \bar{x}_p = 0 \;\Rightarrow\; x_p = ?$ (unlabeled)
  $x_p = 1,\ \bar{x}_p = 1 \;\Rightarrow\; x_p = ?$ (unlabeled)
Properties
[Figure: x (partial QPBO labeling with ? for unlabeled nodes), y (any complete labeling), z = FUSE(x, y), and the global optimum]
• Autarky (persistency) property: fusing any complete labeling y with the partial labeling x never increases the energy, $E(\mathrm{FUSE}(x,y)) \le E(y)$
• Partial optimality: labeled pixels in x belong to a global minimum
• Labeled nodes have the same result as the LP relaxation of the problem E
  (but QPBO is a very fast solver)
[Hammer et al ’84, Schlesinger ‘76, Werner ’07, Kolmogorov, Wainright ’05; Kolmogorov ’06]
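
The FUSE operation itself is trivial. A sketch, using None to mark an unlabeled node (a convention chosen here):

```python
def fuse(x_partial, y):
    """Autarky property in action: overwrite y with the labels that QPBO
    fixed. x_partial uses None for unlabeled nodes; the fused labeling is
    guaranteed to have energy no higher than y."""
    return [yi if xi is None else xi for xi, yi in zip(x_partial, y)]

# Partial QPBO output fused with a complete solution (e.g. from BP):
print(fuse([1, None, 0, None], [0, 0, 1, 1]))   # [1, 0, 0, 1]
```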
When do we get all nodes labeled?
• The function is submodular
• If there exists a flipping that makes the energy fully submodular,
  then QPBO will find it
• We can simply be "lucky"
• What to do with unlabelled nodes: run some other method (e.g. BP)
Extension: QPBOP ("P" stands for "Probing")

QPBO: [Figure: nodes p, q, r, s, t – all left unlabeled (?)]

Probe node p: fix $x_p = 0$ and run QPBO, then fix $x_p = 1$ and run QPBO.
[Figure: in both restricted runs q, r, s become labeled, t stays unlabeled]

From the two outcomes we can conclude, for example:
• $x_q = 0$ in a global minimum → remove node q from the energy
• $x_p = x_r$ → remove node r from the energy
• between $x_p$ and $x_s$: add a directed link
• Why did QPBO not find this solution?
  Probing enforces the integer constraint on p (a tighter relaxation).
Two extensions: QPBOP, QPBOI
1. Run QPBO – gives the set of unlabeled nodes U
2. Probe a node p ∈ U
3. Simplify the energy: remove nodes and add links
4. Run QPBO, update U
5. Stop if the energy stays unchanged for all p ∈ U;
   otherwise go to 2.
Properties: – the new energy preserves global optimality
              and (sometimes) gives the global minimum
            – the probing order may affect the result
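
The outer loop might look as follows; `qpbo` and `probe_and_simplify` are hypothetical stand-ins for a real QPBO implementation, not actual library calls:

```python
def qpbop(energy):
    """Sketch of the QPBOP loop. Assumptions: qpbo(energy) returns fixed
    labels plus the set U of unlabeled nodes; probe_and_simplify(energy, p)
    fixes p to 0 and to 1, runs QPBO on both restrictions, and returns a
    simplified energy (nodes removed, links added), or None if probing p
    changed nothing. Both helpers are hypothetical."""
    labels, U = qpbo(energy)
    changed = True
    while changed:
        changed = False
        for p in list(U):
            simplified = probe_and_simplify(energy, p)
            if simplified is not None:
                energy = simplified
                labels, U = qpbo(energy)   # update the unlabeled set
                changed = True
                break                      # restart: order may affect result
    return labels, U
```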
QPBO versus QPBOP
[Figure: QPBO leaves 73% of the nodes unlabeled (0.08 sec); QPBOP finds the global minimum (0.4 sec)]
Extension: QPBOI ("I" stands for "Improve")

[Figure: partial labeling x from QPBO, a complete labeling y (e.g. from BP), and the fused result y' = FUSE(x, y)]

• Property: $E(y') \le E(y)$  [persistency property]
Extension: QPBOI ("I" stands for "Improve")

[Figure: current labeling y', partial labeling x, and the fused result y'' = FUSE(x, y')]

• Property: $E(y'') \le E(y')$  [autarky property]
• QPBOI algorithm: choose a sequence of nested sets
• QPBO-stable: no set changes the labelling – sometimes this is a global minimum
Results
Three important factors:
• Degree of non-submodularity (NS)
• Unary strength
• Connectivity (average degree of a node)
Results – Diagram Recognition
• 2700 test cases: QPBOP solved all of them
[Figure: ground truth; QPBOP (0 sec) – global minimum; BP E=25 (0 sec); P+BP+I and BP+I E=0 (0 sec); Sim. Ann. E=0 (0.28 sec); QPBO 56.3% unlabeled (0 sec); Graph Cut E=119 (0 sec); ICM E=999 (0 sec)]
Results - Deconvolution
[Figure: ground truth; input; QPBO 45% unlabeled (red) (0.01 sec); QPBO-C 43% unlabeled (red) (0.4 sec); ICM E=14 (0 sec); GC E=999 (0 sec); BP E=5 (0.5 sec); BP+I E=3.6 (1 sec); C+BP+I and Sim. Ann. E=0 (0.4 sec)]
Move on to multi-label
• Let's apply the QPBO(P/I) methods to multi-label problems
• In particular, to alpha expansion
Reminder: Alpha expansion
• Variables either take the label α or retain their current label
[Figure: labels Tree, Ground, House, Sky; initialize with Sky, then expand with Tree, Ground, and House in turn]
[Boykov , Veksler and Zabih 2001]
Reminder: Alpha expansion
• Given the original energy $E(x)$
• At each step we have two solutions: $x^0, x^1$
• Define the variable-wise combination: $x_i^{01} = (1 - x_i')\, x_i^0 + x_i'\, x_i^1$
  (where $x' \in \{0,1\}^V$ is the selection variable)
• Construct a new energy $E'$ such that $E'(x') = E(x^{01})$
• The move energy $E'(x')$ is submodular if $\theta_{ij}$ is a metric:
  $\theta_{ij}(x_a, x_b) = 0 \iff x_a = x_b$
  $\theta_{ij}(x_a, x_b) = \theta_{ij}(x_b, x_a) \ge 0$
  $\theta_{ij}(x_a, x_b) + \theta_{ij}(x_b, x_c) \ge \theta_{ij}(x_a, x_c)$
  Examples: Potts model, truncated linear (but not truncated quadratic)
• Other move strategies: alpha-beta swap, range moves, etc.
[Boykov , Veksler and Zabih 2001]
Reminder: Alpha Expansion
• What to do if the move energy is non-submodular?
• Run QPBO
• For unlabeled pixels:
  • choose the solution ($x^0$ or $x^1$) that has the lower energy $E$
  • replace the unlabeled nodes with that solution
• This guarantees that the new solution has an energy no higher than
  both $E(x^0)$ and $E(x^1)$ (see the persistency property)
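
A sketch of this fallback rule; the function and argument names are my own, and `energy` is assumed to evaluate E on a complete labeling:

```python
def apply_move(selector, x0, x1, energy):
    """Combine two solutions using a possibly-partial QPBO selector.

    selector[i] is 0 (take x0[i]), 1 (take x1[i]) or None (QPBO left the
    node unlabeled). Unlabeled nodes fall back to whichever complete
    solution has the lower energy, so the result is never worse than
    both inputs (persistency).
    """
    fallback = 0 if energy(x0) <= energy(x1) else 1
    solutions = (x0, x1)
    return [solutions[fallback][i] if s is None else solutions[s][i]
            for i, s in enumerate(selector)]
```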
Fusion Move
• Given the original energy $E(x)$
• At each step we have two arbitrary solutions: $x^0, x^1$
• Define the variable-wise combination:
  $x_i^{01} = (1 - x_i')\, x_i^0 + x_i'\, x_i^1$
  (where $x' \in \{0,1\}^V$ is the selection variable)
• Construct a new energy $E'$ such that $E'(x') = E(x^{01})$
• Run QPBO and fix the unlabeled nodes as above
• Note: in practice the move energy is often submodular if both solutions are good
  (since the energy prefers neighboring nodes to be similar)
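
Building the binary move energy is a matter of evaluating the original terms on the two candidate labels per node. A sketch, with a data layout of my own choosing:

```python
def move_energy(unary, pairwise, x0, x1):
    """Build the binary move energy E'(x') with E'(x') = E(x01), where
    x01_i = x0_i if x'_i = 0 else x1_i.

    unary:    dict i -> callable(label) -> cost
    pairwise: dict (i, j) -> callable(label_i, label_j) -> cost
    Returns binary tables ready for a QPBO solver.
    """
    u = {}
    for i, f in unary.items():
        u[i] = [f(x0[i]), f(x1[i])]               # cost of picking x0 / x1 at i
    pw = {}
    for (i, j), g in pairwise.items():
        ci, cj = (x0[i], x1[i]), (x0[j], x1[j])   # the two candidates per node
        pw[(i, j)] = [[g(ci[a], cj[b]) for b in (0, 1)] for a in (0, 1)]
    return u, pw
```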
Fusion move to make alpha expansion parallel
• One processor needs 7 sequential alpha expansions for 8 labels:
  1, 2, 3, 4, 5, 6, 7, 8
• Four processors need only 3 sequential steps (still 7 fusions in total):
  step 1:  p1: ∎(1-2)   p2: ∎(3-4)   p3: ∎(5-6)   p4: ∎(7-8)
  step 2:  ∎(1-4)   ∎(5-8)
  step 3:  ∎(1-8)
  (∎ means fusion)
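
The schedule is a tournament reduction. A sketch, where `fuse2` is a hypothetical stand-in for one QPBO-based fusion move and a power-of-two number of proposals is assumed:

```python
def parallel_fusion(proposals, fuse2):
    """Fuse per-label proposals pairwise: 8 proposals need log2(8) = 3
    sequential rounds; the fusions inside one round are independent of
    each other and could run on separate processors."""
    while len(proposals) > 1:
        # each round halves the number of candidate solutions
        proposals = [fuse2(proposals[k], proposals[k + 1])
                     for k in range(0, len(proposals), 2)]
    return proposals[0]
```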
Fusion move for continuous label-spaces
Local gradient cost: $x_i - x_{i+1}$
[Victor Lempitsky, Stefan Roth, and Carsten Rother, FusionFlow: Discrete-Continuous Optimization for Optical Flow Estimation, CVPR 2008]
FusionFlow – comparisons
[Figure: comparison results]
LogCut – Dealing efficiently with large label spaces
Optical flow with 1024 discrete labels
[Figure: LogCut result vs. ground truth]
[Victor Lempitsky, Carsten Rother, and Andrew Blake, LogCut – Efficient Graph Cut Optimization for Markov Random Fields, ICCV 2007]
Log Cut – basic idea
$E(x) = \sum_p E_p(x_p) + \sum_{p,q} E_{pq}(x_p, x_q)$  with  $x_p \in [0, K]$

• Alpha expansion: we need K−1 binary decisions to get a labeling out
• Encode the label space of size K (e.g. K = 64) with log K bits (e.g. 6):
  Example: 44 = 101100 (binary)
  We only need log K (here 6) binary decisions to get a labeling out
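
The bit encoding from the example, spelled out (the variable names are my own):

```python
K = 64                       # label-space size
n_bits = K.bit_length() - 1  # log2(K) = 6 bits for K = 64

label = 44
bits = [(label >> (n_bits - 1 - b)) & 1 for b in range(n_bits)]
print(bits)                  # [1, 0, 1, 1, 0, 0] -> 44 = 101100 in binary

# LogCut decides one bit per sweep, most significant first; after
# n_bits binary optimizations every pixel has a complete label:
decoded = 0
for b in bits:
    decoded = (decoded << 1) | b
assert decoded == 44
```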
Example stereo matching
Stereo (Tsukuba) – 16 labels:
Bit 1: 0xxx – decides 0–7 versus 8–15
Bit 2: 00xx – decides 0–3 versus 4–7
Bit 3: 001x – decides 0–1 versus 2–3
Bit 4: 0010 – decides 2 versus 3
How to choose the energy?
e.g. for bit 3:
$E'(x') = \sum_p E'_p(x'_p) + \sum_{p,q} E'_{pq}(x'_p, x'_q)$  with  $x'_p \in \{0,1\}$

Unary: with $x'_p = 0$ meaning $x_p \in [0,3]$ and $x'_p = 1$ meaning $x_p \in [4,7]$:
$E'_p(0) = \min_{x_p \in [0,3]} E_p(x_p)$,  and analogously for $E'_p(1)$

E′ is a lower bound of E (tight if there are no pairwise terms)
How to choose the energy?
Pairwise:
$E_{pq}(x_p, x_q) = a \cdot \min\!\left[(x_p - x_q)^\rho,\; b\right]$  (truncated distance, with exponent $\rho$, e.g. $\rho = 1$ for truncated linear)

For 8 labels with truncation b = 3, the full table of $\min(|x_p - x_q|, 3)$ is:

  0 1 2 3 3 3 3 3
  1 0 1 2 3 3 3 3
  2 1 0 1 2 3 3 3
  3 2 1 0 1 2 3 3
  3 3 2 1 0 1 2 3
  3 3 3 2 1 0 1 2
  3 3 3 3 2 1 0 1
  3 3 3 3 3 2 1 0

$E'_{pq}(x'_p, x'_q)$ must summarize a whole block of this table, e.g. $E'_{pq}(0,0)$ summarizes the entries $E_{pq}(x_p, x_q)$ with both labels in the first half.

Approximations (how to pick one value per block; a numeric sketch follows below):
1. Choose one
2. Min
3. Mean
4. Weighted mean
5. Training
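
The Min and Mean rules, evaluated on the truncated-linear table above; the block layout assumes bit 1 of an 8-label space, so $x'_p = 0$ covers labels 0–3 and $x'_p = 1$ covers 4–7:

```python
import numpy as np

# Full pairwise table for 8 labels: truncated linear min(|xp - xq|, 3).
labels = np.arange(8)
E = np.minimum(np.abs(labels[:, None] - labels[None, :]), 3)

# For the bit-1 move, E'_pq(a, b) must summarize one 4x4 block of E:
blocks = {(a, b): E[4 * a:4 * a + 4, 4 * b:4 * b + 4]
          for a in (0, 1) for b in (0, 1)}
E_min = {ab: blk.min() for ab, blk in blocks.items()}    # "Min" rule
E_mean = {ab: blk.mean() for ab, blk in blocks.items()}  # "Mean" rule
print(E_min)    # {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
print(E_mean)
```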
Comparison
Image restoration (2 different models):
[Figure: results for the approximations One, Min, Mean, Weighted Mean, and Training, compared with alpha expansion (aExp)]
LogCut
Iterative LogCut:
1. One sweep – log K optimizations
2. Shift the labels
3. One sweep – log K optimizations
4. Fuse with the current solution
5. Go to 2.

Example shift: labels 1,2,3,4,5,6,7,8 shifted by 3 become 6,7,8,1,2,3,4,5.
[Figure: energy as a function of the shift – no shift, ½ shift, full shift]
Results
Training: LogCut (2 iter.) 8 sec, E = 8767; AExp (6 iter.) 390 sec, E = 8773
Test: LogCut (64 iter.) 150 sec, E = 8469
[Figure: results and ground truth]
Speed-up factor: 20.7
Results
[Figure: train (out of 10) and test (out of 10) results comparing LogCut (1.5 sec), Efficient BP (2.1 sec), AExp (4.7 sec), and TRW (90 sec)]