Programming by Sketching

Armando Solar-Lezama, Liviu Tancau, Gilad Arnold, Rastislav Bodik, Sanjit Seshia, UC Berkeley
Rodric Rabbah, MIT
Kemal Ebcioglu, Vijay Saraswat, Vivek Sarkar, IBM


Merge sort

    int[] mergeSort (int[] input, int n) {
      return merge( mergeSort (input[0::n/2]),
                    mergeSort (input[n/2+1::n]), n);
    }

    int[] merge (int[] a, int b[], int n) {
      int j=0, k=0;
      for (int i = 0; i < n; i++) {
        if ( a[j] < b[k] ) {
          result[i] = a[j++];
        } else {
          result[i] = b[k++];
        }
      }
      return result;
    }

looks simple to code, but there is a bug


Merge sort

    int[] mergeSort (int[] input, int n) {
      return merge( mergeSort (input[0::n/2]),
                    mergeSort (input[n/2+1::n]), n);
    }

    int[] merge (int[] a, int b[], int n) {
      int j=0, k=0;
      for (int i = 0; i < n; i++) {
        if ( j<n && ( !(k<n) || a[j] < b[k]) ) {
          result[i] = a[j++];
        } else {
          result[i] = b[k++];
        }
      }
      return result;
    }


The sketching experience

spec (specification) + sketch → implementation (completed sketch)


The spec: bubble sort

    int[] sort (int[] input, int n) {
      for (int i=0; i<n; ++i)
        for (int j=i+1; j<n; ++j)
          if (input[j] < input[i])
            swap(input, j, i);
      return input;
    }


Merge sort: sketched

    int[] mergeSort (int[] input, int n) {
      return merge( mergeSort (input[0::n/2]),
                    mergeSort (input[n/2+1::n]), n);
    }

    int[] merge (int[] a, int b[], int n) {
      int j=0, k=0;
      for (int i = 0; i < n; i++) {
        if ( expression( ||, &&, <, !, [] ) ) {   // <- hole
          result[i] = a[j++];
        } else {
          result[i] = b[k++];
        }
      }
      return result;
    }


Merge sort: synthesized

    int[] mergeSort (int[] input, int n) {
      return merge( mergeSort (input[0::n/2]),
                    mergeSort (input[n/2::n]) );
    }

    int[] merge (int[] a, int b[], int n) {
      int j=0, k=0;
      for (int i = 0; i < n; i++) {
        if ( j<n && ( !(k<n) || a[j] < b[k]) ) {
          result[i] = a[j++];
        } else {
          result[i] = b[k++];
        }
      }
      return result;
    }


Sketching: spec vs. sketch

• Specification
  – executable: easy to debug, serves as a prototype
  – a reference implementation: simple and sequential
  – written by domain experts: crypto, bio, MPEG committee
• Sketched implementation
  – program with holes: filled in by synthesizer
  – programmer sketches strategy: machine provides details
  – written by performance experts: vector wizard; cache guru


How sketching fits into autotuning

• Autotuning: two methods for obtaining code variants
  1. optimizing compiler: transform a “spec” in various ways
  2. custom generator: for a specific algorithm
• We seek to simplify the second approach
• Scenario 1: library of variants stores resolved sketches
  – as if written by hand
• Scenario 2: library has unresolved, flexible sketches


SKETCH

• A language with support for sketching-based synthesis
  – like C without pointers
  – two simple synthesis constructs
• restricted to finite programs:
  – input size known at compile time, terminates on all inputs
• most high-performance kernels are finite:
  – matrix multiply: yes
  – binary search tree: no
• we’re already working on relaxing the finiteness restriction
  – later in this talk


Ex1: Isolate rightmost 0-bit.

Example: 1010 0111 → 0000 1000

    bit[W] isolate0 (bit[W] x) {      // W: word size
      bit[W] ret = 0;
      for (int i = 0; i < W; i++)
        if (!x[i]) { ret[i] = 1; break; }
      return ret;
    }

    bit[W] isolate0Fast (bit[W] x) implements isolate0 {
      return ~x & (x+1);
    }

    bit[W] isolate0Sketched (bit[W] x) implements isolate0 {
      return ~(x + ??) & (x + ??);
    }


Programmer’s view of sketches

• the ?? operator is replaced with a suitable constant
  – as directed by the implements clause
• the ?? operator introduces non-determinism
  – the implements clause constrains it


Beyond synthesis of literals

• Synthesizing values of ?? is already very useful
  – parallelization machinery: bitmasks, tables in crypto codes
  – array indices: A[i+??,j+??]
• We can synthesize more than constants
  – semi-permutations: functions that select and shuffle bits
  – polynomials: over one or more variables
  – actually, arbitrary expressions, programs


Synthesizing polynomials

    int spec (int x) {
      return 2*x*x*x*x + 3*x*x*x + 7*x*x + 10;
    }

    int p (int x) implements spec {
      return (x+1)*(x+2)*poly(3,x);
    }

    int poly(int n, int x) {
      if (n==0) return ??;
      else return x * poly(n-1, x) + ??;
    }


Karatsuba’s multiplication

    x = x1*b + x0
    y = y1*b + y0
    b = 2^k

    x*y = b^2*x1*y1 + b*(x1*y0 + x0*y1) + x0*y0

    x*y = poly(??,b) * x1*y1
        + poly(??,b) * poly(1,x1,x0,y1,y0)*poly(1,x1,x0,y1,y0)
        + poly(??,b) * x0*y0

    x*y = (b^2 + b) * x1*y1 - b * (x1 - x0)*(y1 - y0) + (b+1) * x0*y0


Sketch of Karatsuba

    bit[N*2] k<int N>(bit[N] x, bit[N] y) implements mult {
      if (N<=1) return x*y;
      bit[N/2] x1 = x[0:N/2-1];
      bit[N/2] y1 = y[0:N/2-1];
      bit[N/2+1] x2 = x[N/2:N-1];
      bit[N/2+1] y2 = y[N/2:N-1];
      bit[2*N] t11 = x1 * y1;
      bit[2*N] t12 = poly(1, x1, x2, y1, y2) * poly(1, x1, x2, y1, y2);
      bit[2*N] t22 = x2 * y2;
      return multPolySparse<2*N>(2, N/2, t11)   // log b = N/2
           + multPolySparse<2*N>(2, N/2, t12)
           + multPolySparse<2*N>(2, N/2, t22);
    }

    bit[2*N] poly<int N>(int n, bit[N] x0, x1, x2, x3) {
      if (n<=0) return ??;
      else return (??*x0 + ??*x1 + ??*x2 + ??*x3) * poly<N>(n-1, x0, x1, x2, x3);
    }

    bit[2*N] multPolySparse<int N>(int n, int x, bit[N] y) {
      if (n<=0) return 0;
      else return y << x*?? + multPolySparse<N>(n-1, x, y);
    }


Semantic view of sketches

• a sketch represents a set of functions:
  – the ?? operator is modeled as reading from an oracle

    int f (int y) {
      x = ??;
      loop (x) { y = y + ??; }
      return y;
    }

    int f (int y, bit[][K] oracle) {
      x = oracle[0];
      loop (x) { y = y + oracle[1]; }
      return y;
    }

Synthesizer must find an oracle satisfying f implements g


Synthesis algorithm: overview

1. translation: represent spec and sketch as circuits
2. synthesis: find suitable oracle
3. code generation: specialize sketch wrt oracle


Ex: Population count.

    int pop (bit[W] x) {
      int count = 0;
      for (int i = 0; i < W; i++) {
        if (x[i]) count++;
      }
      return count;
    }

[Figure: the loop translated into a circuit for the input x = 0010 0110. Starting from count = 0000 and the constant one = 0001, a chain of adder (+) and mux stages, one per bit of x, produces F(x) = count.]


Synthesis as generalized SAT

• The sketch synthesis problem is an instance of 2QBF:
    ∃o . ∀x . P(x) = S(x,o)
• Counter-example driven solver:

    I = {}
    x = random()
    do
      I = I U {x}
      c = synthesizeForSomeInputs(I)
      if c = nil then exit("buggy sketch")
      x = verifyForAllInputs(c)      // x: counter-example
    while x != nil
    return c

  – synthesizeForSomeInputs(I): with I = { x1, x2, …, xk }, find c such that S(x1, c)=P(x1), …, S(xk, c)=P(xk)
  – verifyForAllInputs(c): compare S(x, c) with P(x) over all inputs x


Case study

• Implemented AES
  – the modern block-cipher standard
  – 14 rounds: each has table lookup, permutation, GF multiply
  – a good implementation collapses each round into table lookups
• Our results
  – we synthesized a 32Kbit oracle!
  – synthesis time: about 1 hour
  – counterexample-driven synthesizer iterated 655 times
  – performance of synthesized code within 10% of hand-tuned


Finite programs

• In theory, SKETCH is complete for all finite programs:
  – specification can specify any finite program
  – sketch can describe any implementation over given instructions
  – synthesizer can resolve any sketch
• In practice, SKETCH scales for small finite programs
  – small finite programs: block ciphers, small kernels
  – large finite: big-integer multiplication, matrix multiplication
• Solution: lossless abstraction


Lossless abstraction

• Problem
  – does the result of synthesis for a small matrix work for all matrices?
• Approach
  – spec, sketch have unbounded input/output
  – abstract them into finite functions, with the same abstraction
  – synthesize
  – obtained oracle works for original sketch
• Stencil kernels
  – concrete: matrix A[N]; matrix B[N]
  – abstract: A[e(i)], i, N; B[i]


Example: divide and conquer parallelization

• Parallel algorithm:
  – data rearrangement + parallel computation
• spec:
  – sequential version of the program
• sketch:
  – parallel computation
• automatically synthesized:
  – rearranging the data (dividing the data structure)
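
The counter-example driven solver from the "Synthesis as generalized SAT" slide can be tried out on the isolate0Sketched example from Ex1. What follows is a minimal Python sketch under stated assumptions, not the SKETCH implementation: the word size W = 8 is chosen for illustration, the two ?? holes become the parameters c1 and c2, the function names only mirror the slide's pseudocode, and plain exhaustive enumeration stands in for the SAT-based synthesis and verification steps. The first input is passed in explicitly instead of being drawn with random(), so that runs are reproducible.

```python
W = 8                    # word size (assumption: small enough to enumerate)
MASK = (1 << W) - 1      # keeps arithmetic within W bits


def isolate0(x):
    """Spec: return a word with only the rightmost 0-bit of x set."""
    for i in range(W):
        if not (x >> i) & 1:
            return 1 << i
    return 0             # x has no 0-bit


def sketch(x, c1, c2):
    """Sketch ~(x + ??) & (x + ??), with the two holes as c1 and c2."""
    return ~((x + c1) & MASK) & ((x + c2) & MASK) & MASK


def synthesize_for_some_inputs(inputs):
    """Find hole values that agree with the spec on the given inputs only."""
    for c1 in range(MASK + 1):
        for c2 in range(MASK + 1):
            if all(sketch(x, c1, c2) == isolate0(x) for x in inputs):
                return (c1, c2)
    return None          # no candidate fits: the sketch is buggy


def verify_for_all_inputs(c):
    """Return a counter-example input, or None if c works for every input."""
    for x in range(MASK + 1):
        if sketch(x, *c) != isolate0(x):
            return x
    return None


def cegis(first_input=0):
    """Counter-example driven loop: returns (hole values, iteration count)."""
    inputs = set()
    x = first_input
    iterations = 0
    while x is not None:
        inputs.add(x)
        c = synthesize_for_some_inputs(inputs)
        if c is None:
            raise ValueError("buggy sketch")
        x = verify_for_all_inputs(c)   # x: counter-example, or None
        iterations += 1
    return c, iterations


if __name__ == "__main__":
    holes, iterations = cegis(first_input=0)
    print("holes:", holes, "iterations:", iterations)
```

Each counter-example is an input on which the current candidate is wrong, so it is always new to the input set, and the loop must stop after at most 2^W rounds. In this toy setting the loop resolves the holes to 0 and 1, which is the isolate0Fast expression ~x & (x+1). A real synthesizer replaces both enumeration loops with solver queries.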