How to Implement (ORAM in) MPC Marcel Keller Peter Scholl University of Bristol 21 February 2014 Nigel Smart Overview 1. How to Implement MPC 2. ORAM in MPC Part I How to Implement MPC SPDZ: MPC with Preprocessing Offline Phase I Offline Phase I Correlated Randomness I I Secret Inputs I Online Phase Online Phase I I Outputs Independent of secret inputs Homomorphic encryption with distributed decryption Highly parallelizable No encryption Information-theoretic security in random oracle model How to Share a Secret with Authentication Shares MAC shares MAC key a1 γ(a)1 α1 a2 γ(a)2 α2 a3 γ(a)3 α3 = α·a P = i γ(a)i i ai = a a P = α P i αi How to Share a Secret with Authentication Shares MAC shares MAC key a1 + b1 γ(a)1 + γ(b)1 α1 a2 + b2 γ(a)2 + γ(b)2 α2 a3 + b3 γ(a)3 + γ(b)3 α3 a+b P = i ai + bi Pα · (a + b) = i γ(a)i + γ(b)i = a+b = α P i αi Multiplication with Random Triple (Beaver Randomization) x · y = (x + a − a) · (y + b − b) = (x + a) · (y + b) − (y + b) · a − (x + a) · b + a · b Multiplication with Random Triple (Beaver Randomization) x · y = (x + a − a) · (y + b − b) = (x + a) · (y + b) − (y + b) · a − (x + a) · b + a · b Masked and opened Random secret triple Toolchain Overview Python-like high-level code I Compiler I I Compiler I I I Bytecode I Virtual machine I I Virtual machine Python Easier development Circuit optimization Speed not an issue Memory overhead I I Online phase C++ Fast ∼ 150 instructions Core Technique: I/O Parallelization z =x ·y z =x ·y u =z ·w u =v ·w Core Technique: I/O Parallelization z =x ·y z =x ·y u =z ·w u =v ·w 1. Mask and open x and y 1. Mask and open x, y , v , and w 2. Compute z 2. Compute z and u 3. Mask and open z and w 4. Compute u Goal: Automatize I/O Parallelization I Manual parallelization is tedious: x10 = x2 · x3 x20 = x4 + x6 x30 = x19 · x1 x40 = x12 · x39 x50 = x28 + x16 x11 = x8 + x4 x12 = x10 · x1 x21 = x16 + x2 x22 = x0 + x12 x31 = x16 + x26 x32 = x0 · x10 x41 = x34 + x7 x42 = x32 + x5 x51 = x15 + x38 x52 = x50 · x46 x13 = x7 + x9 x23 = x22 + x14 x24 = x11 + x19 x33 = x26 + x32 x34 = x7 + x3 x43 = x12 + x26 x44 = x43 · x38 x53 = x19 + x2 x54 = x20 · x13 x25 = x4 · x19 x26 = x23 · x9 x35 = x9 · x29 x36 = x33 + x22 x45 = x38 + x14 x46 = x44 · x27 x55 = x21 + x22 x56 = x19 · x6 x18 = x11 · x15 x27 = x7 · x5 x28 = x13 + x21 x37 = x29 · x24 x38 = x16 + x23 x47 = x22 + x24 x48 = x39 · x38 x57 = x46 + x1 x58 = x38 · x55 x19 = x13 · x7 x29 = x14 + x27 x39 = x15 + x37 x49 = x21 · x3 x59 = x47 + x29 x14 = x7 · x1 x15 = x9 + x12 x16 = x13 · x14 x17 = x0 + x11 I SIMD not suitable for every application Circuit as Directed Acyclic Graph y x + + send send 1 1 recv recv triple I Nodes: Instructions I Edges: Output of instruction is input to another I Two instructions for open: sending and receiving Edge weight: I I I ∗ ∗ ∗ − − + One between sending and receiving Zero otherwise I Longest path with respect to weights determines communication round I Merge all communication per round ⇒ Optimal number of rounds Merge by Communication Round / Longest Path AD I Need to re-compute topological order (no edges pointing backwards) ⇒ Standard algorithm linear in number of edges I Heuristic to shorten lifetime of variables ⇒ Reduced memory usage BH CEF GI Part II Oblivious RAM in MPC Goal: Oblivious Data Structures I Generally I I I Oblivious array / dictionary I I I Secret pointers Secret type of access if needed Secret index / key Secret whether reading or writing Oblivious priority queue I I Secret priority and value Secret whether decreasing priority or inserting Oblivious RAM x0 x1 Client (CPU) x0 x2 x0 Server (Encrypted RAM) Oblivious RAM x$ x$ Client (CPU) x$ x$ x$ Server (Encrypted RAM) Oblivious RAM in MPC x$ x$ Client MPC circuit x$ Reveal x$ x$ Server Secret Sharing Simple Oblivious Array (Trivial ORAM) Inner product with index vector     [0] [a0 ]  ..   ..  .  .      [0]  [ai−1 ]      [1] ·  [ai ]  = [ai ]     [0]  [ai+1 ]       ..   ..  .  .  [0] [an−1 ] Index vector computation   ? [i] = 0   ..   .     ? [i] 7→  [i] = i    ..   .   ? [i] = n − 1 Simple Oblivious Array Index vector computation without equality test P i I Bit decomposition: [x] 7→ ([x0 ], . . . , [xn−1 ]) such that x = i xi 2 ? Demux: ([x0 ], . . . , [xn−1 ]) 7→ ([δ0 ], . . . , [δ2n −1 ]) such that δi = (i = x) Example: n = 4, i = 2 I [1] [0] [0] [0] [0] [1] [1] [0] [0] [2] [1] Tree-based ORAM I Entries stored in tree of trivial ORAMs of fixed size (buckets) I Access only buckets on path to specific leaf per ORAM access I Store path in smaller ORAM ⇒ recursion I Data-independent eviction to distribute entries over tree I Original idea by Shi et al. (Asiacrypt 2011) I Path ORAM: improved eviction by Stefanov et al. (CCS 2013) Oblivious Array Access Timings Access time (ms) Two Parties, Online Phase Original tree-based ORAM Path ORAM Simple oblivious array 102 101 100 100 101 102 103 104 Size 105 106 Oblivious Priority Queue 1 : 10 2 : 18 3 : 25 I Store value-priority pairs I Values unique Operations I I I I 0 : 20 4 : 23 I Two oblivious arrays I I [00, λ, 0, 1, 01] Remove value with minimal priority Insert new pair Update value to lower priority Binary heap by priority Index to find entries in heap by value Dijkstra’s Shortest Path Algorithm a 3 1 s b 2 1 c Dijkstra’s Shortest Path Algorithm a: 1 3 1 s b 2 1 c: 1 c: 1 a: 1 Dijkstra’s Shortest Path Algorithm a: 1 3 1 s b: 3 2 1 c: 1 b: 3 c: 1 Dijkstra’s Shortest Path Algorithm a: 1 3 1 s b: 2 2 1 c: 1 b: 2 Dijkstra’s Shortest Path Algorithm a: 1 3 1 s b: 2 2 1 c: 1 Dijkstra’s Algorithm in MPC for each vertex do outer loop body for each neighbor do inner loop body I Number of vertices and edges public I Graph structure in two oblivious arrays (vertices and edges) I Use oblivious priority queue Dijkstra’s algorithms uses two nested loops I I I I I One vertices, one of neighbors thereof MPC would reveal the number of neighbors for every vertex Replace by loop over all edges in same order Flag set when starting with a new vertex I Polylog overhead over classical algorithm I Previous work: polynomial overhead Dijkstra’s Algorithm in MPC for each edge do outer loop body (dummy if same vertex) inner loop body I Number of vertices and edges public I Graph structure in two oblivious arrays (vertices and edges) I Use oblivious priority queue Dijkstra’s algorithms uses two nested loops I I I I I One vertices, one of neighbors thereof MPC would reveal the number of neighbors for every vertex Replace by loop over all edges in same order Flag set when starting with a new vertex I Polylog overhead over classical algorithm I Previous work: polynomial overhead Dijkstra’s Algorithm in MPC Timings for Cycle Graphs No ORAM No ORAM (estimated) Simple array Simple array (estimated) Tree-based array Tree-based array (estimated) 108 Total time (s) 106 104 102 100 10−2 100 101 102 103 Size 104 105 Conclusion I Practical MPC requires a dedicated compiler. (ACM CCS 2013 / ePrint 2013:143) I Oblivious data structures in MPC are feasible and useful. (ePrint 2014:137)