How to Implement (ORAM in) MPC Marcel Keller Peter Scholl Nigel Smart

How to Implement (ORAM in) MPC
Marcel Keller
Peter Scholl
University of Bristol
21 February 2014
Nigel Smart
Overview
1. How to Implement MPC
2. ORAM in MPC
Part I
How to Implement MPC
SPDZ: MPC with Preprocessing
Offline Phase
I
Offline Phase
I
Correlated Randomness
I
I
Secret Inputs
I
Online Phase
Online Phase
I
I
Outputs
Independent of secret inputs
Homomorphic encryption
with distributed decryption
Highly parallelizable
No encryption
Information-theoretic security
in random oracle model
How to Share a Secret with Authentication
Shares
MAC shares
MAC key
a1
γ(a)1
α1
a2
γ(a)2
α2
a3
γ(a)3
α3
=
α·a
P
= i γ(a)i
i ai
= a
a
P
=
α
P
i
αi
How to Share a Secret with Authentication
Shares
MAC shares
MAC key
a1 + b1
γ(a)1 + γ(b)1
α1
a2 + b2
γ(a)2 + γ(b)2
α2
a3 + b3
γ(a)3 + γ(b)3
α3
a+b
P
= i ai + bi
Pα · (a + b)
= i γ(a)i + γ(b)i
= a+b
=
α
P
i
αi
Multiplication with Random Triple
(Beaver Randomization)
x · y = (x + a − a) · (y + b − b)
= (x + a) · (y + b) − (y + b) · a − (x + a) · b + a · b
Multiplication with Random Triple
(Beaver Randomization)
x · y = (x + a − a) · (y + b − b)
= (x + a) · (y + b) − (y + b) · a − (x + a) · b + a · b
Masked and opened
Random secret triple
Toolchain Overview
Python-like high-level code
I
Compiler
I
I
Compiler
I
I
I
Bytecode
I
Virtual machine
I
I
Virtual machine
Python
Easier development
Circuit optimization
Speed not an issue
Memory overhead
I
I
Online phase
C++
Fast
∼ 150 instructions
Core Technique: I/O Parallelization
z =x ·y
z =x ·y
u =z ·w
u =v ·w
Core Technique: I/O Parallelization
z =x ·y
z =x ·y
u =z ·w
u =v ·w
1. Mask and open x and y
1. Mask and open x, y , v , and w
2. Compute z
2. Compute z and u
3. Mask and open z and w
4. Compute u
Goal: Automatize I/O Parallelization
I
Manual parallelization is tedious:
x10 = x2 · x3
x20 = x4 + x6
x30 = x19 · x1
x40 = x12 · x39
x50 = x28 + x16
x11 = x8 + x4
x12 = x10 · x1
x21 = x16 + x2
x22 = x0 + x12
x31 = x16 + x26
x32 = x0 · x10
x41 = x34 + x7
x42 = x32 + x5
x51 = x15 + x38
x52 = x50 · x46
x13 = x7 + x9
x23 = x22 + x14
x24 = x11 + x19
x33 = x26 + x32
x34 = x7 + x3
x43 = x12 + x26
x44 = x43 · x38
x53 = x19 + x2
x54 = x20 · x13
x25 = x4 · x19
x26 = x23 · x9
x35 = x9 · x29
x36 = x33 + x22
x45 = x38 + x14
x46 = x44 · x27
x55 = x21 + x22
x56 = x19 · x6
x18 = x11 · x15
x27 = x7 · x5
x28 = x13 + x21
x37 = x29 · x24
x38 = x16 + x23
x47 = x22 + x24
x48 = x39 · x38
x57 = x46 + x1
x58 = x38 · x55
x19 = x13 · x7
x29 = x14 + x27
x39 = x15 + x37
x49 = x21 · x3
x59 = x47 + x29
x14 = x7 · x1
x15 = x9 + x12
x16 = x13 · x14
x17 = x0 + x11
I
SIMD not suitable for every application
Circuit as Directed Acyclic Graph
y
x
+
+
send
send
1
1
recv
recv
triple
I
Nodes: Instructions
I
Edges: Output of instruction is input to another
I
Two instructions for open: sending and receiving
Edge weight:
I
I
I
∗
∗
∗
−
−
+
One between sending and receiving
Zero otherwise
I
Longest path with respect to weights
determines communication round
I
Merge all communication per round
⇒ Optimal number of rounds
Merge by Communication Round / Longest Path
AD
I
Need to re-compute topological order
(no edges pointing backwards)
⇒ Standard algorithm linear in number of edges
I
Heuristic to shorten lifetime of variables
⇒ Reduced memory usage
BH
CEF
GI
Part II
Oblivious RAM in MPC
Goal: Oblivious Data Structures
I
Generally
I
I
I
Oblivious array / dictionary
I
I
I
Secret pointers
Secret type of access if needed
Secret index / key
Secret whether reading or writing
Oblivious priority queue
I
I
Secret priority and value
Secret whether decreasing priority or inserting
Oblivious RAM
x0
x1
Client
(CPU)
x0
x2
x0
Server
(Encrypted RAM)
Oblivious RAM
x$
x$
Client
(CPU)
x$
x$
x$
Server
(Encrypted RAM)
Oblivious RAM in MPC
x$
x$
Client
MPC circuit
x$
Reveal
x$
x$
Server
Secret Sharing
Simple Oblivious Array (Trivial ORAM)
Inner product with index vector
  

[0]
[a0 ]
 ..   .. 
.  . 
  

[0]  [ai−1 ] 
  

[1] ·  [ai ]  = [ai ]
  

[0]  [ai+1 ] 
  

 ..   .. 
.  . 
[0]
[an−1 ]
Index vector computation


?
[i] = 0


..


.




?
[i] 7→  [i] = i 


..


.


?
[i] = n − 1
Simple Oblivious Array
Index vector computation without equality test
P
i
I Bit decomposition: [x] 7→ ([x0 ], . . . , [xn−1 ]) such that x =
i xi 2
?
Demux: ([x0 ], . . . , [xn−1 ]) 7→ ([δ0 ], . . . , [δ2n −1 ]) such that δi = (i = x)
Example: n = 4, i = 2
I
[1]
[0]
[0]
[0]
[0]
[1]
[1]
[0]
[0]
[2]
[1]
Tree-based ORAM
I
Entries stored in tree of trivial ORAMs of fixed size (buckets)
I
Access only buckets on path to specific leaf
per ORAM access
I
Store path in smaller ORAM ⇒ recursion
I
Data-independent eviction to distribute entries over tree
I
Original idea by Shi et al. (Asiacrypt 2011)
I
Path ORAM: improved eviction by Stefanov et al. (CCS 2013)
Oblivious Array Access Timings
Access time (ms)
Two Parties, Online Phase
Original tree-based ORAM
Path ORAM
Simple oblivious array
102
101
100
100
101
102
103 104
Size
105
106
Oblivious Priority Queue
1 : 10
2 : 18
3 : 25
I
Store value-priority pairs
I
Values unique
Operations
I
I
I
I
0 : 20
4 : 23
I
Two oblivious arrays
I
I
[00, λ, 0, 1, 01]
Remove value with minimal priority
Insert new pair
Update value to lower priority
Binary heap by priority
Index to find entries in heap by value
Dijkstra’s Shortest Path Algorithm
a
3
1
s
b
2
1
c
Dijkstra’s Shortest Path Algorithm
a: 1
3
1
s
b
2
1
c: 1
c: 1
a: 1
Dijkstra’s Shortest Path Algorithm
a: 1
3
1
s
b: 3
2
1
c: 1
b: 3
c: 1
Dijkstra’s Shortest Path Algorithm
a: 1
3
1
s
b: 2
2
1
c: 1
b: 2
Dijkstra’s Shortest Path Algorithm
a: 1
3
1
s
b: 2
2
1
c: 1
Dijkstra’s Algorithm in MPC
for each vertex do
outer loop body
for each neighbor do
inner loop body
I
Number of vertices and edges public
I
Graph structure in two oblivious arrays
(vertices and edges)
I
Use oblivious priority queue
Dijkstra’s algorithms uses two nested loops
I
I
I
I
I
One vertices, one of neighbors thereof
MPC would reveal the number of neighbors
for every vertex
Replace by loop over all edges in same order
Flag set when starting with a new vertex
I
Polylog overhead over classical algorithm
I
Previous work: polynomial overhead
Dijkstra’s Algorithm in MPC
for each edge do
outer loop body
(dummy if same vertex)
inner loop body
I
Number of vertices and edges public
I
Graph structure in two oblivious arrays
(vertices and edges)
I
Use oblivious priority queue
Dijkstra’s algorithms uses two nested loops
I
I
I
I
I
One vertices, one of neighbors thereof
MPC would reveal the number of neighbors
for every vertex
Replace by loop over all edges in same order
Flag set when starting with a new vertex
I
Polylog overhead over classical algorithm
I
Previous work: polynomial overhead
Dijkstra’s Algorithm in MPC
Timings for Cycle Graphs
No ORAM
No ORAM (estimated)
Simple array
Simple array (estimated)
Tree-based array
Tree-based array (estimated)
108
Total time (s)
106
104
102
100
10−2
100
101
102
103
Size
104
105
Conclusion
I
Practical MPC requires a dedicated compiler.
(ACM CCS 2013 / ePrint 2013:143)
I
Oblivious data structures in MPC are feasible and useful.
(ePrint 2014:137)