ppt slides - Microsoft Research

A Machine Learning Framework
for Programming by Example
by
Aditya Menon, UCSD/NICTA
Santosh Vempala, Georgia Tech
Omer Tamuz, Weizmann
Sumit Gulwani, MSR
Butler Lampson, MSR
Adam Tauman Kalai, MSR
The computer learns 𝑓
from a few examples!
Lawrence Carin (5)
John D. Lafferty (4)
Michael I. Jordan (4)
Zoubin Ghahramani (4)
Huan Xu (3)
Ivor W. Tsang (3)
Ambuj Tewari (3)
Csaba Szepesvári (3)
Masashi Sugiyama (3)
Nathan Srebro (3)
Bernhard Schölkopf (3)
Mark D. Reid (3)
Shie Mannor (3)
Rong Jin (3)
Ali Jalali (3)
Hal Daumé III (3)
Steven C. H. Hoi (3)
Geoffrey E. Hinton (3)
Arthur Gretton (3)
David B. Dunson (3)
David M. Blei (3)
Yoshua Bengio (3)
Peilin Zhao (2)
Yaoliang Yu (2)
Tianbao Yang (2)
Zhixiang Eddie Xu (2)
Min Xu (2)
Eric P. Xing (2)
Jialei Wang (2)
Pascal Vincent (2)
Prior work
EBE [Nix85]
Tourmaline [Mye93]
TELS [WM93]
Eager [Cyp93]
Cima [Mau94]
DEED [Fuj98]
SmartEDIT [LWDW01]
LAPIS [Miller02]
FlashFill [Gulwani2011]
[Liang-Jordan-Klein10]
Sidestep the NP-hard search problem
Sequential Transformations by Example Programming System
STEPS: Each step defined by example input→output
(Step 1)
Input:
Dong Yu, Frank Seide, Gang Li: Conversationa
Nathan Parrish, Maya R. Gupta: Dimensionalit
Output:
Dong Yu, Frank Seide, Gang Li
Nathan Parrish, Maya R. Gupta
x.Replace(/:.*$/gm,"")
(Step 2)
Input:
Dong Yu, Frank Seide, Gang Li
Nathan Parrish, Maya R. Gupta
Output:
Dong Yu
Frank Seide
Gang Li
Nathan Parrish
Maya R. Gupta
x.Replace(/, /gm,"\n")
(Step 3)
Input:
Dong Yu
Frank Seide
Gang Li
Nathan Parrish
Maya R. Gupta
Output:
Dong Yu (1)
Frank Seide (1)
Gang Li (1)
Nathan Parrish (1)
Maya R. Gupta (1)
Count or append “ (1)”?
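The two regex rewrites shown for Steps 1 and 2 can be run directly in JavaScript. The sketch below is only an illustration: the function names `stripTitles` and `splitAuthors` are hypothetical, and the slide's `x.Replace` is written with JavaScript's lowercase `replace`.

```javascript
// Step 1: delete everything from the first ':' to the end of each line.
// The 'm' flag makes '$' match at every line end; 'g' applies it to all lines.
function stripTitles(x) {
  return x.replace(/:.*$/gm, "");
}

// Step 2: put each comma-separated author on its own line.
function splitAuthors(x) {
  return x.replace(/, /gm, "\n");
}

const input =
  "Dong Yu, Frank Seide, Gang Li: Conversationa\n" +
  "Nathan Parrish, Maya R. Gupta: Dimensionalit";

const step1 = stripTitles(input);
const step2 = splitAuthors(step1);
console.log(step2);
```

Chaining the steps this way mirrors the STEPS idea that each transformation is small enough to be pinned down by one input→output example.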
Mock example
(Step 3)
Input:
adam
adam
john
nina
nina
adam
Output:
adam (3)
john (1)
nina (2)
Join("\n",
ListCat(Dedup(Split(𝑥, "\n")), " (",
Dedup(Count(Split(𝑥, "\n"),
Split(𝑥, "\n"))), ")"))
(Step 4)
Output:
adam (3)
nina (2)
john (1)
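The mock example can be reproduced with a short JavaScript sketch. It does not use the slide's `Split`/`Dedup`/`Count`/`ListCat`/`Join` primitives; `countStep` and `sortStep` are hypothetical names, and treating Step 4 as "sort by descending count" is an assumption read off the example output.

```javascript
// Step 3: count occurrences of each line, keeping first-seen order.
// A Map iterates in insertion order, which plays the role of Dedup.
function countStep(x) {
  const counts = new Map();
  for (const line of x.split("\n")) {
    counts.set(line, (counts.get(line) || 0) + 1);
  }
  return [...counts].map(([name, n]) => `${name} (${n})`).join("\n");
}

// Step 4 (assumed): order lines by the trailing "(n)" count, largest first.
// Array.prototype.sort is stable, so ties keep their original order.
function sortStep(x) {
  const countOf = (line) => Number(line.match(/\((\d+)\)$/)[1]);
  return x
    .split("\n")
    .slice()
    .sort((a, b) => countOf(b) - countOf(a))
    .join("\n");
}

const mock = "adam\nadam\njohn\nnina\nnina\nadam";
const counted = countStep(mock);
const ranked = sortStep(counted);
console.log(ranked);
```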
Learning to Search for Programming by example
Given strings 𝑥, 𝑦 ∈ 𝑆, find “good” 𝑓: 𝑆 → 𝑆 such that 𝑓(𝑥) = 𝑦
(Dynamic programming & genetic algorithms won't work)
𝑥:
Peaches
Bananas
Pears
Apples
𝑦:
Apples
Pears
Bananas
Peaches
PCFG
.12  𝑆 → 𝑥
.06  𝑆 → Join(𝐷𝑒𝑙𝑖𝑚, 𝑆𝐿𝑖𝑠𝑡)
.01  𝑆 → “Peaches”
.01  𝑆 → “Bananas”
...
.20  𝑆𝐿𝑖𝑠𝑡 → Sort(𝑆𝐿𝑖𝑠𝑡, 𝐶𝑜𝑚𝑝)
.10  𝑆𝐿𝑖𝑠𝑡 → Reverse(𝑆𝐿𝑖𝑠𝑡)
.22  𝑆𝐿𝑖𝑠𝑡 → Split(𝑆, 𝐷𝑒𝑙𝑖𝑚)
...
.12  𝐷𝑒𝑙𝑖𝑚 → “\n”
.08  𝐷𝑒𝑙𝑖𝑚 → “ ”
.04  𝐷𝑒𝑙𝑖𝑚 → 𝑆
...
Parse tree: Join(“\n”, Reverse(Split(𝑥, “\n”)))
Enumerate PCFG programs in order of likelihood.
PCFG: Trained on corpus of tasks from help forums
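Enumerating PCFG programs in order of likelihood can be sketched as a best-first search over partial derivations. The code below is a toy under stated assumptions, not the authors' system: it uses a reduced grammar (no string constants, no 𝐷𝑒𝑙𝑖𝑚 → 𝑆 rule, `Sort` without a comparator), it assumes the slide's weights pair with the rules in listed order, and the names `RULES`, `enumerate`, `evaluate`, and `search` are hypothetical.

```javascript
// Toy PCFG: nonterminal -> [probability, right-hand side in prefix notation].
// Literal delimiters are wrapped as { lit: ... } so they are never mistaken
// for nonterminals. The weight-to-rule pairing is an assumption.
const RULES = new Map([
  ["S", [[0.12, ["x"]], [0.06, ["Join", "Delim", "SList"]]]],
  ["SList", [
    [0.20, ["Sort", "SList"]], // comparator argument omitted for brevity
    [0.10, ["Reverse", "SList"]],
    [0.22, ["Split", "S", "Delim"]],
  ]],
  ["Delim", [[0.12, [{ lit: "\n" }]], [0.08, [{ lit: " " }]]]],
]);

// Best-first search: always expand the most probable partial derivation.
// Rule probabilities are at most 1, so complete programs come out in
// non-increasing order of likelihood.
function* enumerate(maxPops) {
  const frontier = [{ logp: 0, form: ["S"] }];
  while (frontier.length > 0 && maxPops-- > 0) {
    let best = 0;
    for (let i = 1; i < frontier.length; i++) {
      if (frontier[i].logp > frontier[best].logp) best = i;
    }
    const [item] = frontier.splice(best, 1);
    const idx = item.form.findIndex((t) => RULES.has(t)); // leftmost nonterminal
    if (idx === -1) {
      yield item; // no nonterminals left: a complete program
      continue;
    }
    for (const [p, rhs] of RULES.get(item.form[idx])) {
      frontier.push({
        logp: item.logp + Math.log(p),
        form: [...item.form.slice(0, idx), ...rhs, ...item.form.slice(idx + 1)],
      });
    }
  }
}

// Evaluate a complete prefix-notation program on input string x.
function evaluate(form, x) {
  let pos = 0;
  function next() {
    const t = form[pos++];
    if (typeof t === "object") return t.lit;
    switch (t) {
      case "x": return x;
      case "Join": { const d = next(); const list = next(); return list.join(d); }
      case "Sort": return next().slice().sort();
      case "Reverse": return next().slice().reverse();
      case "Split": { const s = next(); const d = next(); return s.split(d); }
      default: throw new Error("unknown token " + t);
    }
  }
  return next();
}

// Return the most likely program f with f(x) = y, or null if none is found.
function search(x, y, maxPops = 10000) {
  for (const item of enumerate(maxPops)) {
    if (evaluate(item.form, x) === y) return item;
  }
  return null;
}

const found = search(
  "Peaches\nBananas\nPears\nApples",
  "Apples\nPears\nBananas\nPeaches"
);
console.log(found && found.form);
```

A linear scan stands in for a real priority queue here, which keeps the sketch short at the cost of speed.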
The abstract MLE problem:
Given dist. $\mu$ over $(x, y, \text{data}, f)$, find $\arg\max_{f} \Pr_{\mu}[f \mid x, y, \text{data}]$
The wrong MLE problem:
Given $x, y \in S$ and dist. $\mu$ over $f: S \to S$, find $\arg\max_{f:\, f(x)=y} \Pr_{\mu}[f]$?
Which program is more likely under $\mu$?
√ Remove from : to end of line
Truncate each line to 29 characters
Dong Yu, Frank Seide, Gang Li: Conversationa
Nathan Parrish, Maya R. Gupta: Dimensionalit
Dong Yu, Frank Seide, Gang Li
Nathan Parrish, Maya R. Gupta
The wrong MLE problem:
Given $x, y \in S$ and dist. $\mu$ over $f: S \to S$, find $\arg\max_{f:\, f(x)=y} \Pr_{\mu}[f]$?
Which program is more likely under $\mu$?
Remove from : to end of line
√ Truncate each line to 29 characters
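The ambiguity behind the two candidate programs is easy to check: both map the example input to the same output, so the single example cannot separate them. The sketch below assumes JavaScript string semantics; `removeAfterColon` and `truncateTo29` are hypothetical names, and the extra line `"A. Author: Some title"` is an invented test input, not from the slides.

```javascript
// Candidate 1: remove from ':' to the end of each line.
const removeAfterColon = (x) => x.replace(/:.*$/gm, "");

// Candidate 2: truncate each line to 29 characters.
const truncateTo29 = (x) =>
  x.split("\n").map((line) => line.slice(0, 29)).join("\n");

const example =
  "Dong Yu, Frank Seide, Gang Li: Conversationa\n" +
  "Nathan Parrish, Maya R. Gupta: Dimensionalit";

// Both author lists happen to be exactly 29 characters long,
// so the two programs agree on this example.
const agree = removeAfterColon(example) === truncateTo29(example);

// On a hypothetical shorter line the programs diverge.
const probe = "A. Author: Some title";
const byColon = removeAfterColon(probe);
const byTruncation = truncateTo29(probe);
console.log(agree, byColon, byTruncation);
```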
[Slide graphic, labels only: /a-z/g, /^$/, 24.2, 18.5, Tr8, SP, :-), :(, 100%, 0%]
The abstract MLE problem:
Given dist. $\mu_\theta$ over $(x, y, \text{data}, f)$, find $\arg\max_{f} \Pr_{\mu_\theta}[f \mid x, y, \text{data}]$
Estimating system parameters $\theta$:
Given training corpus $\{(x^{(i)}, y^{(i)}, \text{data}^{(i)}, f^{(i)})\}_{i=1}^{n}$
Choose $\theta$ to minimize:
$-\sum_{i} \log \Pr_{\mu_\theta}[f^{(i)} \mid x^{(i)}, y^{(i)}, \text{data}^{(i)}] + \lambda \|\theta\|^{2}$
using convex optimization [Vempala].
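To make the objective concrete, here is a minimal sketch of a regularized negative log-likelihood in JavaScript. It assumes a log-linear model over a finite set of candidate programs per training task, which is an illustrative stand-in and not necessarily how the system parameterizes 𝜇𝜃; the names `objective`, `candidates`, and `correct` are hypothetical.

```javascript
const dot = (a, b) => a.reduce((sum, ai, k) => sum + ai * b[k], 0);

// Numerically stable log(sum(exp(v))).
function logSumExp(v) {
  const m = Math.max(...v);
  return m + Math.log(v.reduce((s, vi) => s + Math.exp(vi - m), 0));
}

// corpus: array of { candidates, correct }, where candidates[j] is the
// feature vector of the j-th candidate program for that task and
// correct is the index of the annotated program f^(i).
// Assumed model: Pr[f_j] is proportional to exp(theta . candidates[j]).
function objective(theta, corpus, lambda) {
  let nll = 0;
  for (const { candidates, correct } of corpus) {
    const scores = candidates.map((phi) => dot(theta, phi));
    nll -= scores[correct] - logSumExp(scores); // -log Pr[f^(i)]
  }
  return nll + lambda * dot(theta, theta); // L2 penalty lambda * ||theta||^2
}

const corpus = [{ candidates: [[1, 0], [0, 1]], correct: 0 }];
console.log(objective([0, 0], corpus, 0.1)); // uniform model: log 2
console.log(objective([1, 0], corpus, 0.1));
```

Under this assumed model the objective is convex in 𝜃, so any standard convex optimizer could minimize it.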
Experimental results
Baseline = equal weights (MDL)
*Everything is in JavaScript
Conclusions
• Programming by Example involves a hard search problem
• Search space generated by clues (features → CFG rules)
• Learn weights on heuristic clues
Future work
• Learned shared structure (like [Liang-Jordan-Klein10])
• Generate more clues on-the-fly
• F