Programming by Sketching

Armando Solar-Lezama, Liviu Tancau, Gilad Arnold, Rastislav Bodik, Sanjit Seshia (UC Berkeley)
Rodric Rabbah (MIT)
Kemal Ebcioglu, Vijay Saraswat, Vivek Sarkar (IBM)
Merge sort
int[] mergeSort (int[] input, int n) {
    return merge(
        mergeSort (input[0::n/2]),
        mergeSort (input[n/2+1::n]) , n);
}
int[] merge (int[] a, int b[], int n) {
    int j=0, k=0;
    for (int i = 0; i < n; i++) {
        if ( a[j] < b[k] ) {
            result[i] = a[j++];
        } else {
            result[i] = b[k++];
        }
    }
    return result;
}
Looks simple to code, but there is a bug.
Merge sort
int[] mergeSort (int[] input, int n) {
    return merge(
        mergeSort (input[0::n/2]),
        mergeSort (input[n/2+1::n]) , n);
}
int[] merge (int[] a, int b[], int n) {
    int j = 0, k = 0;
    for (int i = 0; i < n; i++) {
        if ( j<n && ( !(k<n) || a[j] < b[k]) ) {
            result[i] = a[j++];
        } else {
            result[i] = b[k++];
        }
    }
    return result;
}
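The corrected guard can be exercised in a runnable form. The Python version below is an illustration only, not the authors' code: it assumes the two halves may differ in length (so it tests j and k against len(a) and len(b) instead of a shared n) and it adds the base case that the slide leaves out.

```python
def merge(a, b):
    """Merge two sorted lists, guarding both indices as in the corrected slide."""
    result = []
    j = k = 0
    for _ in range(len(a) + len(b)):
        # Take from a when a still has elements and either b is exhausted
        # or a's head is smaller:  j<n && ( !(k<n) || a[j] < b[k] )
        if j < len(a) and (not (k < len(b)) or a[j] < b[k]):
            result.append(a[j])
            j += 1
        else:
            result.append(b[k])
            k += 1
    return result


def merge_sort(xs):
    """Top-down merge sort; the base case is an addition for runnability."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    return merge(merge_sort(xs[:mid]), merge_sort(xs[mid:]))
```

Without the two bounds checks, the comparison a[j] < b[k] reads past the end of whichever half is exhausted first.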
The sketching experience
[Figure: specification (spec) + sketch → implementation (completed sketch)]
The spec: bubble sort
int[] sort (int[] input, int n) {
    for (int i=0; i<n; ++i)
        for (int j=i+1; j<n; ++j)
            if (input[j] < input[i])
                swap(input, j, i);
    return input;
}
Merge sort: sketched
int[] mergeSort (int[] input, int n) {
    return merge(
        mergeSort (input[0::n/2]),
        mergeSort (input[n/2+1::n]) , n);
}
int[] merge (int[] a, int b[], int n) {
    int j, k;
    for (int i = 0; i < n; i++) {
        if ( expression( ||, &&, <, !, [] ) ) {   // hole
            result[i] = a[j++];
        } else {
            result[i] = b[k++];
        }
    }
    return result;
}
Merge sort: synthesized
int[] mergeSort (int[] input, int n) {
    return merge(
        mergeSort (input[0::n/2]),
        mergeSort (input[n/2::n]) , n);
}
int[] merge (int[] a, int b[], int n) {
    int j, k;
    for (int i = 0; i < n; i++) {
        if ( j<n && ( !(k<n) || a[j] < b[k]) ) {
            result[i] = a[j++];
        } else {
            result[i] = b[k++];
        }
    }
    return result;
}
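One way to picture how the hole gets resolved is to enumerate candidate guard expressions and keep the first one whose merge agrees with the spec on a set of test inputs. The Python below is a toy illustration under stated assumptions: the three candidates in CANDIDATES, the helper names, and the choice of small test inputs are all hypothetical, and the real synthesizer searches the expression grammar with a solver instead of a fixed list.

```python
from itertools import product


def spec_sort(xs):
    """The spec from the slides: a simple exchange sort."""
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[j] < xs[i]:
                xs[i], xs[j] = xs[j], xs[i]
    return xs


# Hypothetical candidate fillings for the hole, built from ||, &&, <, !, [].
CANDIDATES = {
    "a[j] < b[k]":
        lambda a, b, j, k: a[j] < b[k],
    "j<n && a[j] < b[k]":
        lambda a, b, j, k: j < len(a) and a[j] < b[k],
    "j<n && (!(k<n) || a[j] < b[k])":
        lambda a, b, j, k: j < len(a) and (not (k < len(b)) or a[j] < b[k]),
}


def merge_with(guard, a, b):
    """Run the sketched merge with one candidate guard; None on a bad access."""
    result, j, k = [], 0, 0
    for _ in range(len(a) + len(b)):
        try:
            if guard(a, b, j, k):
                result.append(a[j])
                j += 1
            else:
                result.append(b[k])
                k += 1
        except IndexError:
            return None
    return result


def sorted_lists(max_len, values):
    """All sorted lists over `values` with at most `max_len` elements."""
    for n in range(max_len + 1):
        for t in product(values, repeat=n):
            if list(t) == sorted(t):
                yield list(t)


def resolve_hole():
    """Return the text of the first candidate that matches the spec."""
    halves = list(sorted_lists(2, range(3)))
    inputs = [(a, b) for a in halves for b in halves]
    for text, guard in CANDIDATES.items():
        if all(merge_with(guard, a, b) == spec_sort(a + b) for a, b in inputs):
            return text
    return None
```

The first two candidates are rejected because they index past the end of an exhausted half; only the fully guarded expression survives.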
Sketching: spec vs. sketch
• Specification
– executable: easy to debug, serves as a prototype
– a reference implementation: simple and sequential
– written by domain experts: crypto, bio, MPEG committee
• Sketched implementation
– program with holes: filled in by synthesizer
– programmer sketches strategy: machine provides details
– written by performance experts: vector wizard; cache guru
How sketching fits into autotuning
• Autotuning: two methods for obtaining code variants
1. optimizing compiler: transform a “spec” in various ways
2. custom generator: for a specific algorithm
• We seek to simplify the second approach
• Scenario 1: library of variants stores resolved sketches
– as if written by hand
• Scenario 2: library has unresolved, flexible sketches
SKETCH
• A language with support for sketching-based synthesis
– like C without pointers
– two simple synthesis constructs
• restricted to finite programs:
– input size known at compile time, terminates on all inputs
• most high-performance kernels are finite:
– matrix multiply: yes
– binary search tree: no
• we’re already working on relaxing the finiteness restriction
– later in this talk
Ex1: Isolate rightmost 0-bit. 1010 0111 → 0000 1000
bit[W] isolate0 (bit[W] x) {
// W: word size
bit[W] ret = 0;
for (int i = 0; i < W; i++)
if (!x[i]) { ret[i] = 1; break; }
return ret;
}
bit[W] isolate0Fast (bit[W] x) implements isolate0 {
return ~x & (x+1);
}
bit[W] isolate0Sketched (bit[W] x) implements isolate0 {
return ~(x + ??) & (x + ??);
}
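To make the role of the two ?? holes concrete, the sketch can be resolved by exhaustive search over a small word size. The Python below is a brute-force illustration, not how the SKETCH synthesizer works internally: W = 4, the MASK helper, and the function names are assumptions made for the example.

```python
W = 4                      # assumed word size, kept small so search is instant
MASK = (1 << W) - 1        # keeps results within W bits


def isolate0(x):
    """Spec: return a word with only the rightmost 0-bit of x set."""
    for i in range(W):
        if not (x >> i) & 1:
            return 1 << i
    return 0


def isolate0_sketched(x, c1, c2):
    """The sketch ~(x + ??) & (x + ??) with the two holes passed in."""
    return ~(x + c1) & (x + c2) & MASK


def resolve():
    """Find the first pair of constants that matches the spec on every input."""
    for c1 in range(1 << W):
        for c2 in range(1 << W):
            if all(isolate0_sketched(x, c1, c2) == isolate0(x)
                   for x in range(1 << W)):
                return c1, c2
    return None
```

With this enumeration order the search returns (0, 1), which is exactly the hand-written isolate0Fast: ~x & (x+1).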
Programmer’s view of sketches
• the ?? operator is replaced with a suitable constant
– as directed by the implements clause
• the ?? operator introduces non-determinism
– the implements clause constrains it
Beyond synthesis of literals
• Synthesizing values of ?? is already very useful
– parallelization machinery: bitmasks, tables in crypto codes
– array indices: A[i+??,j+??]
• We can synthesize more than constants
– semi-permutations: functions that select and shuffle bits
– polynomials: over one or more variables
– actually, arbitrary expressions, programs
Synthesizing polynomials
int spec (int x) {
return 2*x*x*x*x + 3*x*x*x + 7*x*x + 10;
}
int p (int x) implements spec {
return (x+1)*(x+2)*poly(3,x);
}
int poly(int n, int x) {
if (n==0) return ??;
else return x * poly(n-1, x) + ??;
}
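The recursive poly generator can be read as a Horner-form polynomial whose coefficients are all holes. The Python below illustrates that reading under two assumptions that are not in the slides: each unrolled call gets its own hole (modeled as one entry of the holes tuple), and the (x+1)*(x+2) factor is dropped so that poly(4, x) is matched directly against the spec. Coefficients are searched by brute force over 0..10.

```python
from itertools import product


def spec(x):
    return 2*x*x*x*x + 3*x*x*x + 7*x*x + 10


def poly(n, x, holes):
    """Mirror of the sketched poly: holes[0] fills the base-case ??,
    holes[n] fills the ?? added at recursion depth n."""
    if n == 0:
        return holes[0]
    return x * poly(n - 1, x, holes) + holes[n]


def resolve(degree=4, max_coeff=10):
    """Find hole values for which poly(degree, x) equals spec(x).
    degree + 1 sample points determine a polynomial of this degree."""
    xs = range(degree + 1)
    for holes in product(range(max_coeff + 1), repeat=degree + 1):
        if all(poly(degree, x, holes) == spec(x) for x in xs):
            return holes
    return None
```

Here resolve() returns (2, 3, 7, 0, 10), the coefficients of the spec from the highest power down.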
Karatsuba’s multiplication
x = x1*b + x0
y = y1*b + y0
b = 2^k
x*y = b^2*x1*y1 + b*(x1*y0 + x0*y1) + x0*y0
x*y = poly(??,b) * x1*y1
    + poly(??,b) * poly(1,x1,x0,y1,y0) * poly(1,x1,x0,y1,y0)
    + poly(??,b) * x0*y0
x*y = (b^2 + b) * x1*y1
    - b * (x1 - x0)*(y1 - y0)
    + (b + 1) * x0*y0
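The three-multiplication form can be checked numerically against ordinary multiplication. The Python below is a small sanity check, with the function name and the choice of k as assumptions; note that the middle term must be subtracted for the identity to hold, since expanding b*(x1 - x0)*(y1 - y0) yields the cross terms with a negative sign.

```python
def karatsuba_step(x, y, k):
    """One level of Karatsuba with b = 2**k, using three multiplications:
    x1*y1, (x1 - x0)*(y1 - y0), and x0*y0."""
    b = 1 << k
    x1, x0 = divmod(x, b)   # x = x1*b + x0
    y1, y0 = divmod(y, b)   # y = y1*b + y0
    return ((b*b + b) * x1*y1
            - b * (x1 - x0) * (y1 - y0)
            + (b + 1) * x0*y0)


def check(limit=64, k=3):
    """Compare against built-in multiplication on all small inputs."""
    return all(karatsuba_step(x, y, k) == x * y
               for x in range(limit) for y in range(limit))
```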
Sketch of Karatsuba
bit[N*2] k<int N>(bit[N] x, bit[N] y) implements mult {
if (N<=1) return x*y;
bit[N/2] x1 = x[0:N/2-1];
bit[N/2] y1 = y[0:N/2-1];
bit[N/2+1] x2 = x[N/2:N-1];
bit[N/2+1] y2 = y[N/2:N-1];
bit[2*N] t11 = x1 * y1;
bit[2*N] t12 = poly(1, x1, x2, y1, y2) * poly(1, x1, x2, y1, y2);
bit[2*N] t22 = x2 * y2;
return multPolySparse<2*N>(2, N/2, t11)   // log b = N/2
  + multPolySparse<2*N>(2, N/2, t12)
  + multPolySparse<2*N>(2, N/2, t22);
}
bit[2*N] poly<int N>(int n, bit[N] x0, x1, x2, x3) {
if (n<=0) return ??;
else return (??*x0 + ??*x1 + ??*x2 + ??*x3) * poly<N>(n-1, x0, x1, x2, x3);
}
bit[2*N] multPolySparse<int N>(int n, int x, bit[N] y) {
if (n<=0) return 0;
else return (y << x*??) + multPolySparse<N>(n-1, x, y);
}
Semantic view of sketches
• a sketch represents a set of functions:
– the ?? operator modeled as reading from an oracle
int f (int y) {
    int x = ??;
    loop (x) {
        y = y + ??;
    }
    return y;
}
int f (int y, bit[][K] oracle) {
    int x = oracle[0];
    loop (x) {
        y = y + oracle[1];
    }
    return y;
}
The synthesizer must find an oracle such that f implements the spec g.
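The oracle view can be mimicked directly: the sketch becomes an ordinary function with an extra oracle argument, and synthesis becomes a search for an oracle that makes it agree with the spec. In the Python below, the spec g (adding 6), the bound on oracle values, and the sample inputs are all hypothetical choices made for illustration.

```python
from itertools import product


def f(y, oracle):
    """The sketch, with each ?? replaced by a read from the oracle."""
    x = oracle[0]
    for _ in range(x):          # loop (x) { ... }
        y = y + oracle[1]
    return y


def g(y):
    """Hypothetical spec: add 6."""
    return y + 6


def find_oracle(bound=4, inputs=range(8)):
    """Search oracle values in [0, bound) for one where f matches g."""
    for oracle in product(range(bound), repeat=2):
        if all(f(y, oracle) == g(y) for y in inputs):
            return oracle
    return None
```

The sketch stands for the whole set of functions y + x*c; the search picks one member of that set, here (2, 3), which equals the spec.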
Synthesis algorithm: overview
1. translation: represent spec and sketch as circuits
2. synthesis: find suitable oracle
3. code generation: specialize sketch with respect to oracle
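Ex: Population count. x = 0010 0110 → 3
int pop (bit[W] x)
{
    int count = 0;
    for (int i = 0; i < W; i++) {
        if (x[i]) count++;
    }
    return count;
}
[Figure: circuit for pop. Starting from count = 0000, each input bit of x drives a mux that selects between count and count + one (one = 0001); the chain of adder and mux stages outputs F(x) = count.]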
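The translation step can be pictured as unrolling the loop into a fixed chain of adder and mux stages, one per input bit. The Python below is only a software model of that circuit shape, with W = 8 and the mux helper as assumptions; the actual translation produces a Boolean circuit, not Python.

```python
W = 8  # assumed word size


def mux(sel, if_one, if_zero):
    """Two-input multiplexer: pick if_one when sel is 1, else if_zero."""
    return if_one if sel else if_zero


def pop_circuit(x):
    """Population count as a chain of W (adder, mux) stages.
    Because W is a compile-time constant, the loop fully unrolls and the
    branch 'if (x[i]) count++' becomes a mux selecting count+1 or count."""
    count = 0
    for i in range(W):
        bit = (x >> i) & 1
        count = mux(bit, count + 1, count)
    return count
```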
Synthesis as generalized SAT
• The sketch synthesis problem is an instance of 2QBF:
∃ o . ∀ x . P(x) = S(x,o)
• Counter-example driven solver:
I = {}
x = random()
do
    I = I U {x}
    c = synthesizeForSomeInputs(I)
    if c = nil then exit("buggy sketch")
    x = verifyForAllInputs(c)   // x: counter-example
while x != nil
return c
– synthesizeForSomeInputs: for I = { x1, x2, …, xk }, find c such that S(x1, c)=P(x1) ∧ … ∧ S(xk, c)=P(xk)
– verifyForAllInputs: find x such that S(x, c) ≠ P(x)
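The counterexample-driven loop can be run end to end on the isolate0 sketch from Ex1. The Python below is a minimal model under stated assumptions: W = 4, the candidate space is every pair of W-bit constants, and both synthesizeForSomeInputs and verifyForAllInputs are replaced by plain enumeration, whereas the real system poses each as a SAT query.

```python
import random

W = 4                                   # assumed word size
MASK = (1 << W) - 1
ALL_INPUTS = range(1 << W)
CANDIDATES = [(c1, c2) for c1 in range(1 << W) for c2 in range(1 << W)]


def P(x):
    """Spec: isolate the rightmost 0-bit."""
    for i in range(W):
        if not (x >> i) & 1:
            return 1 << i
    return 0


def S(x, c):
    """Sketch ~(x + ??) & (x + ??) with the holes filled by c."""
    c1, c2 = c
    return ~(x + c1) & (x + c2) & MASK


def synthesize_for_some_inputs(inputs):
    """Find c with S(x, c) == P(x) for every x seen so far, or None."""
    for c in CANDIDATES:
        if all(S(x, c) == P(x) for x in inputs):
            return c
    return None


def verify_for_all_inputs(c):
    """Return a counterexample x with S(x, c) != P(x), or None."""
    for x in ALL_INPUTS:
        if S(x, c) != P(x):
            return x
    return None


def cegis(seed=0):
    rng = random.Random(seed)
    inputs = set()
    x = rng.randrange(1 << W)
    while x is not None:                # do ... while x != nil
        inputs.add(x)
        c = synthesize_for_some_inputs(inputs)
        if c is None:
            raise ValueError("buggy sketch")
        x = verify_for_all_inputs(c)    # x: counter-example
    return c
```

Each counterexample rules out the candidate that produced it, so the loop ends after at most as many rounds as there are candidates, and usually far fewer.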
Case study
• Implemented AES
– the modern block-cipher standard
– 14 rounds: each has table lookup, permutation, GF multiply
– a good implementation collapses each round into table lookups
• Our results
– we synthesized a 32 Kbit oracle!
– synthesis time: about 1 hour
– counterexample-driven synthesizer iterated 655 times
– performance of synthesized code within 10% of hand-tuned
Finite programs
• In theory, SKETCH is complete for all finite programs:
– specification can specify any finite program
– sketch can describe any implementation over given instructions
– synthesizer can resolve any sketch
• In practice, SKETCH scales for small finite programs
– small finite programs: block ciphers, small kernels
– large finite: big-integer multiplication, matrix multiplication
• Solution:
Lossless abstraction
• Problem
– does the result of synthesis for a small matrix work for all matrices?
• Approach
– spec, sketch have unbounded input/output
– abstract them into finite functions, with the same abstraction
– synthesize
– obtained oracle works for original sketch
• Stencil kernels
– concrete: matrix A[N] → matrix B[N]
– abstract: A[e(i)], i, N → B[i]
Example: divide and conquer parallelization
• Parallel algorithm:
– Data rearrangement + parallel computation
• spec:
– sequential version of the program
• sketch:
– parallel computation
• automatically synthesized:
– Rearranging the data (dividing the data structure)