Bucketing Using MXNet Cyrus M Vahid Principal Solutions

Remember Encoder-Decoder (seq2seq)?

NMT using encoder-decoder architecture https://sites.google.com/site/acl16nmt/

Word Embedding
English:
  As we were told the story , we were Wexed .
  This is a cat . - - -
  Lucy used to be the oldest known Human-like creature . -
  My uncle bought me a watch . - -
German:
  Als uns die Geschichte erzählt wurde , waren wie Grau.   11 10
  Das ist eine Katze . - - -   5 5
  Lucy war früher die älteste bekannte menschliche Kreatur .   10 10
  Mein Onkel hat mir eine Uhr gekauft . - -   6 7

Bucketing and Padding
• To avoid building a separate model for every combination of source length n and target length m, use bucketing and padding.
• A naive approach is to build one model sized for the longest sequence and pad every sentence to that length. This results in sparse vectors with large memory requirements.
• Instead, bucketing can be applied to mini-batches of varying-length sequences: sequences are grouped into a few length ranges (buckets) and padded only up to their bucket size. The network can then be unrolled once per bucket size.
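The difference between padding everything to the longest sentence and padding per bucket can be sketched in plain Python. This is only an illustration under assumptions: the bucket sizes `[5, 8, 12]` and the helper names `assign_bucket`, `bucket_and_pad`, and `naive_pad_count` are hypothetical and are not taken from the slides or from MXNet.

```python
from collections import defaultdict

PAD = "-"  # padding token, matching the "-" shown in the example sentences


def assign_bucket(length, buckets):
    """Return the smallest bucket size that fits `length`, or None if none fits."""
    for size in sorted(buckets):
        if length <= size:
            return size
    return None


def bucket_and_pad(sentences, buckets, pad=PAD):
    """Group tokenized sentences by bucket and pad each one to its bucket size."""
    grouped = defaultdict(list)
    for tokens in sentences:
        size = assign_bucket(len(tokens), buckets)
        if size is None:
            continue  # longer than the largest bucket: dropped in this sketch
        grouped[size].append(tokens + [pad] * (size - len(tokens)))
    return dict(grouped)


def naive_pad_count(sentences):
    """Number of pad tokens needed if every sentence is padded to the longest one."""
    longest = max(len(tokens) for tokens in sentences)
    return sum(longest - len(tokens) for tokens in sentences)


sentences = [
    "This is a cat .".split(),
    "My uncle bought me a watch .".split(),
    "Lucy used to be the oldest known Human-like creature .".split(),
]
buckets = [5, 8, 12]  # hypothetical bucket sizes, chosen for illustration only

batches = bucket_and_pad(sentences, buckets)
bucketed_pad_count = sum(row.count(PAD) for rows in batches.values() for row in rows)

for size, rows in sorted(batches.items()):
    print(size, rows)
print("pad tokens, naive:", naive_pad_count(sentences))
print("pad tokens, bucketed:", bucketed_pad_count)
```

With these three sentences the bucketed layout needs fewer pad tokens than padding everything to the longest sentence. How large the saving is in practice depends on how well the bucket sizes match the length distribution of the corpus.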

Bucketing in MXNet … … … https://github.com/dmlc/mxnet/blob/master/example/rnn/lstm_bucketing.py
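The idea of unrolling one network instance per bucket while sharing a single set of parameters can be sketched without any framework. The class below is a toy stand-in written in plain Python, not the MXNet API and not the code from the linked example: `BucketedRNN`, its scalar `weight`, and the recurrence `h = weight * h + x` are all assumptions made for illustration.

```python
class BucketedRNN:
    """Toy recurrent model that keeps one unrolled function per bucket key."""

    def __init__(self, weight=0.5):
        self.weight = weight   # single parameter shared by every unrolled instance
        self._unrolled = {}    # bucket key (sequence length) -> unrolled function

    def _unroll(self, seq_len):
        """Build a function that runs exactly `seq_len` recurrent steps."""
        def forward(inputs):
            assert len(inputs) == seq_len
            h = 0.0
            for t in range(seq_len):
                h = self.weight * h + inputs[t]
            return h
        return forward

    def forward(self, inputs):
        """Pick (or lazily create) the unrolled instance for this bucket key."""
        key = len(inputs)
        if key not in self._unrolled:
            self._unrolled[key] = self._unroll(key)
        return self._unrolled[key](inputs)


model = BucketedRNN(weight=0.5)
short_out = model.forward([1.0, 1.0])        # uses the instance unrolled for length 2
long_out = model.forward([1.0, 1.0, 1.0])    # uses the instance unrolled for length 3
again = model.forward([2.0, 0.0])            # reuses the cached length-2 instance
print(short_out, long_out, again, sorted(model._unrolled))
```

Each bucket key gets its own unrolled instance the first time it is seen, and later batches with the same key reuse it. Because every instance reads the same `weight`, an update to that parameter would be visible to all bucket sizes at once.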

Thank you! Cyrus M. Vahid cyrusmv@amazon.com