Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle. Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 2
Constrained Inference for Neural Networks
Jay-Yoon Lee, Michael Wick, Jean-Baptiste Tristan
Michael Wick, Senior Member of Technical Staff, Information Retrieval and Machine Learning Group, Oracle Labs
April 27, 2017
Consider the following problem
Perform decoding in a sequence-to-sequence network, subject to a set of hard constraints on the output sequence.
Sequence-to-sequence networks
• Input: a sequence $x \in X$ with tokens $x_1, x_2, x_3, \ldots, x_n$
• Output: a sequence $y \in Y$ with tokens $y_1, y_2, y_3, \ldots, y_m$
• Encoder $f_e : X \to \mathbb{R}^k$
• Decoder $f_d : \mathbb{R}^k \to Y$
• Network $f(x; W) = f_d(f_e(x))$, where the encoder output $h$ is passed to the decoder
• Typically two multilayer LSTMs with “attention.”
Sequence-to-sequence networks
• Machine translation
• Question answering
• Image captioning
• Parsing
Motivation: Parsing with Neural Networks
• Sequence to tree: “Bond prices rallied.” maps to a parse tree with nodes S, NP, VP
• Sequence to sequence: “Bond prices rallied.” maps to ssrr!sr!srrr! [Vinyals 15]
Sequence-to-sequence networks for parsing
• Output sequence must obey hard constraints
  – Not all sequences of {s, r, !}* encode a valid parse tree
    • Number of shifts must equal number of input tokens
    • Can’t reduce an empty stack
    • Number of shifts must ensure stack is empty
  – Decoding is myopic: outputs a single symbol at a time
• Network learns constraints almost perfectly from data* [Vinyals 15]
  – *On test set of PTB
  – *Even state-of-the-art ML systems perform worse in the wild
Raises questions: are constraints problematic in the wild? How to satisfy the constraints?
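The first two constraints on this slide can be checked by simulating the parser stack. The sketch below is an illustration, not code from the talk. It assumes, based on the example strings in these slides, that `s` shifts one token and that a run of k `r` symbols closed by `!` replaces the top k stack items with one phrase. The function name `check_actions` is hypothetical.

```python
def check_actions(actions: str, n_tokens: int) -> list[str]:
    """List constraint violations in a shift-reduce string; empty means none found.

    Assumed semantics (inferred from the slide examples, not confirmed):
    's' pushes one token, and k 'r' symbols followed by '!' merge the top
    k stack items into a single phrase.
    """
    problems = []
    stack_size = 0  # items currently on the parser stack
    run = 0         # length of the current run of 'r' symbols
    for symbol in actions:
        if symbol == "s":
            stack_size += 1
        elif symbol == "r":
            run += 1
        elif symbol == "!":
            if run > stack_size:
                problems.append("reduce on an empty stack")
                run = stack_size  # clamp so the simulation can continue
            if run > 0:
                stack_size -= run - 1  # k items are replaced by one phrase
            run = 0
        else:
            problems.append(f"unknown symbol {symbol!r}")
    n_shifts = actions.count("s")
    if n_shifts != n_tokens:
        problems.append(f"{n_shifts} shifts for {n_tokens} tokens")
    return problems


# "Bond prices rallied ." has four tokens.
print(check_actions("ssrr!sr!srrr!", 4))
```

The third constraint (the final stack state) is left out because the slide wording does not pin down the exact condition.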
How do we enforce hard constraints?
Sometimes the net satisfies constraints: “Bond prices rallied.” → ssrr!sr!srrr!
Sometimes it doesn’t: “So it is a very mixed bag.” → ssr!sr!ssssrrr!rr!ssrrrrrr! (ten tokens, nine shifts)
Early errors are amplified through constraints.
We could enforce constraints while decoding, but the decoder is local and constraints are global.
We could post-process, but there are $\binom{29}{2} = 406$ possible ways to insert {s, r}.
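The 406 figure can be reproduced with a short calculation. This is one reading of the slide, stated as an assumption: the invalid output has 27 symbols, a repair inserts two symbols (one `s` and one `r`), and the count is the number of ways to choose which two positions of the resulting 29-symbol string hold the inserted symbols.

```python
from math import comb

invalid = "ssr!sr!ssssrrr!rr!ssrrrrrr!"
n_inserted = 2  # assumption: one missing 's' plus one 'r'
final_length = len(invalid) + n_inserted
n_repairs = comb(final_length, n_inserted)
print(len(invalid), final_length, n_repairs)  # 27 29 406
```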
Instead: what if we had a function g(y, L)?
Here g(y, L) measures how far an output y is from the feasible set L: g(y, L) = 0 if y is in L, and g(y, L) > 0 otherwise.
Then we could adjust weights on the fly to discourage invalid outputs, and we could employ the net’s unconstrained decoder as a black box.
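A minimal sketch of what such a function could look like for the parsing constraints. The slides do not give the actual definition, so this particular `g` is hypothetical: it adds the gap between shifts and input tokens to the number of reduce symbols that would hit an empty stack, assuming a run of k `r` symbols closed by `!` merges the top k stack items.

```python
def g(actions: str, n_tokens: int) -> int:
    """Hypothetical distance from the feasible set: 0 if no checked constraint is violated."""
    stack_size = 0
    run = 0
    underflow = 0  # reduce symbols that had nothing left to reduce
    for symbol in actions:
        if symbol == "s":
            stack_size += 1
        elif symbol == "r":
            run += 1
        elif symbol == "!":
            underflow += max(0, run - stack_size)
            run = min(run, stack_size)
            if run > 0:
                stack_size -= run - 1  # k items collapse into one phrase
            run = 0
    shift_gap = abs(actions.count("s") - n_tokens)
    return shift_gap + underflow


print(g("ssr!sr!ssssrrr!rr!ssrrrrrr!", 10))   # 1: nine shifts for ten tokens
print(g("sssr!ssssrr!srrr!rr!ssrrrrrr!", 10))  # 0: feasible
```

Any function that is zero exactly on the feasible set and positive elsewhere would fit the description on the slide; this one was chosen only because it is easy to compute.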
Instead: nudge weights to discourage invalid outputs
Alternate two steps:
• Maximize over y with fixed Wλ (run the network’s decoder)
• Minimize over Wλ with fixed y (one step of SGD/backprop)
Iter 0: “So it is a very mixed bag.” → ssr!sr!ssssrrr!rr!ssrrrrrr!
Iter 12: “So it is a very mixed bag.” → sssr!ssssrr!srrr!rr!ssrrrrrr!
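The alternation on this slide can be illustrated with a toy loop. This is a deliberately simplified stand-in, not the method from the talk: a table of candidate scores plays the role of the network weights, taking the best-scoring candidate plays the role of the decoder, and subtracting a penalty plays the role of the gradient step. The names `shift_gap` and `constrained_decode` and the scores are invented for illustration.

```python
def shift_gap(actions: str, n_tokens: int) -> int:
    """Stand-in for g(y, L): checks only the shift-count constraint."""
    return abs(actions.count("s") - n_tokens)


def constrained_decode(scores, n_tokens, lr=1.0, max_iters=100):
    """Alternate decoding and weight nudging until the output is feasible."""
    weights = dict(scores)  # copy so the caller's scores stay untouched
    y = None
    for iteration in range(max_iters):
        y = max(weights, key=weights.get)  # "run the decoder"
        penalty = shift_gap(y, n_tokens)
        if penalty == 0:
            return y, iteration
        weights[y] -= lr * penalty         # discourage this invalid output
    return y, max_iters


# Invented scores: the invalid string starts out preferred.
candidates = {
    "ssr!sr!ssssrrr!rr!ssrrrrrr!": 2.0,
    "sssr!ssssrr!srrr!rr!ssrrrrrr!": 1.5,
}
best, iterations = constrained_decode(candidates, n_tokens=10)
print(best, iterations)
```

In the actual setting the update would change shared network weights by backpropagation, so one nudge can shift many output symbols at once instead of rescoring a single candidate.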
Global constraints rectify local errors
Our method: sssr!ssssrr!srrr!rr!ssrrrrrr!
Constraints in decoder: ssr!sr!ssssrrr!rr!ssrrrrrr! sr!
Satisfy in decoder: “So it is a very mixed bag.” → ssr!sr!ssssrrr!rr!ssrrrrrr!, giving the bracketing (“ (so) (it) (is (a very mixed)) bag. )
Our method: “So it is a very mixed bag.” → sssr!ssssrr!srrr!rr!ssrrrrrr!, giving the bracketing (“ so (it) (is (a (very mixed) bag)). ”)
More formally
[Slide equations not preserved. Surviving labels: Modified Neural; Dual; Back-propagation; Prevent drift; Efficient inference possible; Constrained Inference]
Experiments
• Are constraint violations a problem in practice?
• If so, can this algorithm satisfy constraints?
• If so, how does it affect accuracy?
Experiments
• Data: Penn Tree Bank (PTB), WSJ portion
  – 40k train
  – 2.4k test
• Networks
  – Train five networks with different levels of output quality
    • Varied hyper-parameters: number of layers, number of hidden units, dropout, attention, etc.
    • Quality range: 71.5-81.3 F1
  – Employ both greedy and beam-search decoding
Experiments (protocol)
• Run each network on the test set
  – Measure EVALB F1*
  – Compute the failure set (set of output sequences that violate a constraint)
  – Measure failure rate (size of failure set over size of test set)
  – Measure EVALB F1* on the failure set
• Run each net with our constrained inference procedure
  – Measure conversion rate on the failure set (fraction of violating outputs we rectify)
  – Measure EVALB F1* on the failure set
*Use post-processing to correct any remaining invalid sequences
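The two rates in this protocol can be written down directly. The sketch below assumes each test example carries a validity flag before and after constrained inference. The function names and the example flags are made up for illustration and are not results from the talk.

```python
def failure_set(valid_before):
    """Indices of test outputs that violate a constraint."""
    return {i for i, ok in enumerate(valid_before) if not ok}


def failure_rate(valid_before):
    """Size of the failure set over the size of the test set."""
    return len(failure_set(valid_before)) / len(valid_before)


def conversion_rate(valid_before, valid_after):
    """Fraction of the failure set that becomes valid after constrained inference."""
    failures = failure_set(valid_before)
    if not failures:
        return 0.0
    rectified = sum(1 for i in failures if valid_after[i])
    return rectified / len(failures)


# Made-up flags for five test sentences.
before = [True, False, False, True, False]
after = [True, True, False, True, True]
print(failure_rate(before), conversion_rate(before, after))
```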
Conclusions
• Are constraint violations a problem in practice? Yes (6-40% constraint failure rate).
• If so, can this algorithm satisfy constraints? Yes (converts up to 94% of violated outputs).
• If so, how does it affect accuracy? Consistently improves accuracy.