Predicting a Correct Program in PBE
Rishabh Singh, Microsoft Research; Sumit Gulwani, Microsoft Research
Programming By Examples: Intuitive, Natural, Accessible. Ambiguity!
Excel Forums: 300_w1_aniSh_c1_b → w1. =MID("300_w1_aniSh_c1_b", 5, 2)
Excel Forums: 300_w30_aniSh_c1_b → w30. =MID($B:$B, FIND("_", $B:$B)+1, FIND("_", REPLACE($B:$B, 1, FIND("_", $B:$B), ""))-1)
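The two forum formulas can be compared with a small Python sketch. The helpers `mid`, `find`, and `replace` are hypothetical stand-ins written here to mimic Excel's 1-based MID, FIND, and REPLACE; they are not part of any library. The sketch suggests why the fixed-position formula fits the first row but breaks on the second, while the underscore-delimited one handles both.

```python
def mid(text: str, start: int, length: int) -> str:
    """Mimic Excel MID: `start` is 1-based."""
    return text[start - 1:start - 1 + length]


def find(needle: str, text: str) -> int:
    """Mimic Excel FIND: 1-based position of the first match."""
    return text.index(needle) + 1


def replace(text: str, start: int, length: int, new: str) -> str:
    """Mimic Excel REPLACE: overwrite `length` characters from 1-based `start`."""
    return text[:start - 1] + new + text[start - 1 + length:]


def constant_position(cell: str) -> str:
    # =MID(cell, 5, 2)
    return mid(cell, 5, 2)


def between_underscores(cell: str) -> str:
    # =MID(cell, FIND("_", cell)+1,
    #      FIND("_", REPLACE(cell, 1, FIND("_", cell), ""))-1)
    first = find("_", cell)
    rest = replace(cell, 1, first, "")  # drop everything up to the first "_"
    return mid(cell, first + 1, find("_", rest) - 1)


for cell in ("300_w1_aniSh_c1_b", "300_w30_aniSh_c1_b"):
    print(cell, constant_position(cell), between_underscores(cell))
```

Both programs agree on the first row, so a single example cannot tell them apart.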
FlashFill [Gulwani POPL 2011] [Gulwani, Harris, Singh CACM 2012]: Benchmarks, Heuristics, VSA, DSL, Program
Benchmarks, Ranking, VSA, DSL, Program
Handling Ambiguity. Input: Rick Rashid, Satya Nadella. Output: Mr. Rick
Prefer non-constants. Input: Rick Rashid, Satya Nadella. Output: Mr. Rick, Ms. Satya. Prefer smaller substrings as constants
Prefer smaller constants. Input: Satya Nadella, Bill Gates. Output: S. Nadella. 2nd word, last word, 2nd capital followed by 2nd lowercase string….
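The ambiguity in the Rick Rashid → Mr. Rick example can be made concrete with a small sketch. The four candidate programs below are hand-written, hypothetical stand-ins for programs a synthesizer might keep in its version space; they are not FlashFill's actual DSL. All four reproduce the single example but disagree on the next input.

```python
from typing import Callable, Dict, List, Tuple

Program = Callable[[str], str]


def second_capital_lower(s: str) -> str:
    """Lowercase of the 2nd uppercase letter in the input."""
    return [c for c in s if c.isupper()][1].lower()


# Hypothetical candidates, chosen only for illustration.
CANDIDATES: Dict[str, Program] = {
    'constant "Mr. Rick"': lambda s: "Mr. Rick",
    '"Mr. " + first word': lambda s: "Mr. " + s.split()[0],
    '"Mr. " + first 4 characters': lambda s: "Mr. " + s[:4],
    '"M" + lower(2nd capital) + ". " + first word':
        lambda s: "M" + second_capital_lower(s) + ". " + s.split()[0],
}


def consistent(examples: List[Tuple[str, str]]) -> List[str]:
    """Names of the candidates that reproduce every given example."""
    return [name for name, prog in CANDIDATES.items()
            if all(prog(inp) == out for inp, out in examples)]


one_example = [("Rick Rashid", "Mr. Rick")]
survivors = consistent(one_example)
predictions = {name: CANDIDATES[name]("Satya Nadella") for name in survivors}
for name, out in predictions.items():
    print(f"{name}: {out}")
```

Without a ranking, nothing in the single example says which of these outputs the user intended.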
“With great power comes great responsibility.” Machine Learning for Ranking
Three Challenges: Labelled Training Data, Machine Learning Algorithm, Efficient Ranking Algorithm
Training Data Generation. Input: Rick Rashid, Satya Nadella, Peter Lee. Output: Mr. Rashid, Mr. Nadella, Mr. Lee
Structuring Hypothesis Space with Sharing in Version-space. Associative Expressions f(e_1, f(e_2, f(e_3, e_4))): DAG-based sharing. Fixed-arity Expressions f(e_1, e_2, e_3, e_4): Set-based sharing
Ranking Function f(p). Assume Linear Function: f(p) = w_1*f_1 + w_2*f_2 + … + w_k*f_k
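A minimal sketch of the linear ranking function, assuming each candidate program has already been reduced to a feature vector. The two features used here (number of constant characters, number of substring expressions) and the weight values are hypothetical, picked only to make the arithmetic visible.

```python
from typing import Dict, Sequence


def rank_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """f(p) = w_1*f_1 + ... + w_k*f_k for one program's feature vector."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * f for w, f in zip(weights, features))


def best_program(weights: Sequence[float],
                 candidates: Dict[str, Sequence[float]]) -> str:
    """Name of the candidate with the highest score."""
    return max(candidates, key=lambda name: rank_score(weights, candidates[name]))


# Hypothetical features: [constant characters, substring expressions].
weights = [-1.0, 0.5]
candidates = {
    'constant "Mr. Rick"': [8.0, 0.0],
    '"Mr. " + first word': [4.0, 1.0],
}
print(best_program(weights, candidates))
```

With these assumed weights, constants are penalized, so the program that copies the first word outranks the all-constant one.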
Learning To Rank: Listwise Approach; Logistic Regression; All relevant pages over irrelevant; Didn’t work well; Too strong a constraint
Training Phase. Input: Rick Rashid, Satya Nadella, Peter Lee. Output: Mr. Rick, Mr. Satya, Mr. Lee. Goal: Find ranking function f(p) over program features that ranks positive programs over negative programs. Lower 1st uppercase letter, Constant “r”, Lower 2nd uppercase letter, ….
Learn DAGs. [Figure: one DAG per example, nodes numbered from 0, for Rick Rashid → Mr. Rashid and Satya Nadella → Mr. Satya]
Intersect DAGs. Rick Rashid → Mr. Rick; Satya Nadella → Mr. Satya
Assign Positive Labels. Rick Rashid → Mr. Rick; Satya Nadella → Mr. Satya
Assign Negative Labels. Rick Rashid → Mr. Rick; Satya Nadella → Mr. Satya
Rick Rashid → Mr. Rick; Satya Nadella → Mr. Satya. Learn ranking function f(p) that ranks positive programs higher than negative programs.
Training Phase: Rank any positive program over all negative programs. [Figure labels: Negative Programs, Positive Programs]
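The objective "rank any positive program over all negative programs" can be sketched as follows. The perceptron-style update is a swapped-in technique chosen for brevity and is not claimed to be the learning algorithm used in the talk. The feature vectors are hypothetical, and `violations` is an illustrative measure that counts negatives scoring at least as high as the best positive.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Task = Tuple[List[Vector], List[Vector]]  # (positive vectors, negative vectors)


def score(w: Vector, f: Vector) -> float:
    return sum(wi * fi for wi, fi in zip(w, f))


def violations(w: Vector, task: Task) -> int:
    """Negatives scoring at least as high as the best positive program."""
    positives, negatives = task
    best_pos = max(score(w, f) for f in positives)
    return sum(1 for f in negatives if score(w, f) >= best_pos)


def train(tasks: List[Task], dim: int, epochs: int = 100, lr: float = 0.1) -> List[float]:
    """Perceptron-style sketch: push the best positive above each violating negative."""
    w = [0.0] * dim
    for _ in range(epochs):
        updated = False
        for positives, negatives in tasks:
            best_pos = max(positives, key=lambda f: score(w, f))
            for neg in negatives:
                if score(w, neg) >= score(w, best_pos):
                    w = [wi + lr * (p - n) for wi, p, n in zip(w, best_pos, neg)]
                    updated = True
        if not updated:  # every task already satisfies the constraint
            break
    return w


# Hypothetical features: [constant characters, substring expressions].
task: Task = ([(4.0, 1.0), (3.0, 2.0)], [(8.0, 0.0), (5.0, 1.0)])
w = train([task], dim=2)
print(w, violations(w, task))
```

Only one positive program per task has to win, which is a weaker requirement than ranking every positive above every negative.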
Hierarchical Ranking. Substring Expression: Frequency of tokens, context, neighborhood, …; Atomic Expression: Length of substring, input, output, constant, …; Concat Expression: Number of Arguments, sum, max, min, prod
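The top level of the hierarchy can be sketched as an aggregation over the scores of a Concat expression's arguments. This assumes each atomic argument has already been given a numeric score by a lower-level ranker; the function name and the dictionary keys are hypothetical.

```python
import math
from typing import Dict, Sequence


def concat_features(arg_scores: Sequence[float]) -> Dict[str, float]:
    """Aggregate argument scores into Concat-level features."""
    if not arg_scores:
        raise ValueError("a Concat expression needs at least one argument")
    return {
        "num_args": float(len(arg_scores)),
        "sum": sum(arg_scores),
        "max": max(arg_scores),
        "min": min(arg_scores),
        "prod": math.prod(arg_scores),
    }


# Hypothetical scores for three atomic arguments of one Concat expression.
print(concat_features([0.5, 2.0, 1.0]))
```

Fixed-size aggregates like these give a Concat expression with any number of arguments a feature vector of the same length.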
Evaluation: 175 benchmarks; 30-70 train-test partition; Baseline (Occam’s razor): Smallest & Simplest programs
Ranking Evaluation: Number of Examples for Learning the Test Task. [Plot: Number of Examples per benchmark, Baseline vs. LearnRank] LearnRank learns from 1 example for 79% of benchmarks
Efficiency of Ranking
Ranking for PBE: Machine Learning + Synthesis; General Loss Function for PBE; VSA Sharing Formalization; Efficient Features & Algorithms. Thanks! risin@microsoft.com