- Slides: 18
Transfer Engine Algorithm

Initialization
- Load transfer rules
  - Each rule has:
    - Source & target production rules, e.g.
    - Source & target non-terminal alignments
    - Feature unification equation blocks for parsing, transfer, generation, and constraint checking
- Build First sets to speed up parsing
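
As a rough illustration of the initialization step, the sketch below stores a rule's parts in a small record and computes First sets by fixed-point iteration. The class name `TransferRule`, its field names, and the toy grammar are assumptions made for this example, not the engine's actual data structures; the sketch also assumes no rule has an empty right-hand side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferRule:
    """Hypothetical container for one transfer rule (field names are assumptions)."""
    lhs: str                # source-side left-hand side, e.g. "NP"
    source_rhs: tuple       # source production, e.g. ("DET", "N")
    target_rhs: tuple       # target production, e.g. ("N", "DET")
    alignments: tuple = ()  # (source_index, target_index) pairs, 1-based
    equations: tuple = ()   # unification equations, kept as opaque strings here


def build_first_sets(rules):
    """Map each LHS to every symbol that can be its leftmost descendant."""
    first = {}
    for rule in rules:
        first.setdefault(rule.lhs, set()).add(rule.source_rhs[0])
    changed = True
    while changed:  # repeat until no set grows any further
        changed = False
        for symbols in first.values():
            for sym in list(symbols):
                extra = first.get(sym, set()) - symbols
                if extra:
                    symbols |= extra
                    changed = True
    return first


rules = [
    TransferRule("S", ("NP", "VP"), ("NP", "VP"), ((1, 1), (2, 2))),
    TransferRule("NP", ("DET", "N"), ("N", "DET"), ((1, 2), (2, 1))),
    TransferRule("VP", ("V", "NP"), ("NP", "V"), ((1, 2), (2, 1))),
]
first_sets = build_first_sets(rules)
```

With this toy grammar, `first_sets["S"]` contains both `NP` and `DET`, so a parser could tell that a `DET` at the start of the input may eventually lead to an `S`.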

Morphology
- If morphology init files were included in the parameters, they are loaded and used in parsing (and soon in generation)

Parsing - Preprocessing
- Source text is case-normalized only if the normalizecase variable is set
  - Allows Chinese, etc. to go through OK
  - Working on more general normalizations
- Source sentence is then tokenized into words by space and punctuation (very basic)
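
The preprocessing described above could look roughly like the following. The function name `preprocess`, the `normalize_case` parameter (standing in for the normalizecase variable), and the exact token pattern are assumptions for illustration; the engine's real tokenizer may split differently.

```python
import re

# A run of word characters, or a single non-space punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def preprocess(sentence, normalize_case=False):
    """Optionally lowercase the sentence, then split on spaces and punctuation."""
    if normalize_case:
        sentence = sentence.lower()
    return TOKEN_PATTERN.findall(sentence)


tokens = preprocess("The cat sat, quietly.", normalize_case=True)
untouched = preprocess("Hello World")
```

Leaving case normalization off by default mirrors the slide's point that text in scripts without case distinctions should pass through unchanged.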

Parsing - Overview
- Uses a chart parsing algorithm
- Maintains a vector of constituents, each marked as agenda, key, or chart
- Runs until all possible keys are exhausted and all words visited

Parsing – Lexical
- When the agenda is empty, looks at the next word
- If morphology is turned on, runs the word through the analyzer and gets possible stem(s), POS, and f-structure(s) for the word
  - (Current implementation only does the first stem/POS)
- Otherwise, just uses the word as the stem
- Searches through the lexical rules for stem & POS matches where the FS unifies, and adds them as constituents
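
A minimal sketch of the lexical step, assuming flat feature structures represented as dictionaries (real f-structures are nested, which this sketch does not handle). The `LEXICON` table, the `unify` helper, and the constituent fields are all hypothetical.

```python
def unify(fs_a, fs_b):
    """Unify two flat feature structures; return the merge, or None on a clash."""
    merged = dict(fs_a)
    for feature, value in fs_b.items():
        if feature in merged and merged[feature] != value:
            return None
        merged[feature] = value
    return merged


# Hypothetical lexical rules: (stem, POS) -> list of (target word, FS)
LEXICON = {
    ("cat", "N"): [("gato", {"num": "sg"}), ("gatos", {"num": "pl"})],
    ("run", "V"): [("correr", {})],
}


def lexical_constituents(word, position, analyses=None):
    """Build constituents for one word from matching lexical rules."""
    if analyses is None:
        # No morphology: use the surface word as the stem, any POS, empty FS.
        analyses = [(word, None, {})]
    found = []
    for stem, pos, word_fs in analyses:
        for (entry_stem, entry_pos), entries in LEXICON.items():
            if entry_stem != stem or (pos is not None and entry_pos != pos):
                continue
            for target, entry_fs in entries:
                fs = unify(word_fs, entry_fs)
                if fs is not None:  # keep only entries whose FS unifies
                    found.append({"type": entry_pos, "start": position,
                                  "end": position + 1, "target": target, "fs": fs})
    return found


with_morphology = lexical_constituents("cats", 1, [("cat", "N", {"num": "pl"})])
without_morphology = lexical_constituents("cat", 0)
```

With an analysis that marks the word as plural, only the plural entry survives unification; without morphology, both entries for the stem are added.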

Parsing – Selecting Key
- A constituent from the agenda is selected and marked as key
- Add rules starting with the key to the list of active arcs
  - But only if the LHS is predicted (in the First set)
  - And only if the same arc is not already in the list
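
The two filters on new arcs can be sketched as follows. The arc and key representations (plain dictionaries) and the `predicted` set passed in by the caller are assumptions for this example; how the engine actually derives the predicted categories from its First sets is not shown here.

```python
def propose_arcs(key, rules, predicted, active_arcs):
    """Start an arc for each rule whose first RHS symbol is the key's type."""
    added = []
    for rule_id, (lhs, rhs) in enumerate(rules):
        if rhs[0] != key["type"]:
            continue  # rule does not start with the key
        if lhs not in predicted:
            continue  # LHS is not predicted, so the arc could never be used
        arc = {"rule": rule_id, "lhs": lhs, "rhs": rhs,
               "start": key["start"], "end": key["start"], "dot": 0}
        if arc not in active_arcs:  # skip arcs that are already in the list
            active_arcs.append(arc)
            added.append(arc)
    return added


rules = [("NP", ("DET", "N")), ("NP", ("DET", "ADJ", "N")), ("VP", ("V", "NP"))]
key = {"type": "DET", "start": 0, "end": 1}
arcs = []
first_pass = propose_arcs(key, rules, {"S", "NP"}, arcs)
second_pass = propose_arcs(key, rules, {"S", "NP"}, arcs)
```

The second call adds nothing, because both `NP` arcs starting at position 0 are already in the list.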

Parsing - Adding Arcs
- Look at the active arcs to see if the key helps complete their associated rules
- If the key completes an arc and the constituent FSs unify under the rule's parsing equation block:
  - Add the rule's LHS as a new constituent if its type and range don't match any other constituent
  - Else add a link to the completed arc from the existing constituent (ambiguity packing)
  - Transfer rules for a given constituent are packed
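
Ambiguity packing can be illustrated with the sketch below, which indexes constituents by type and range so that a second completed arc over the same span is linked to the existing constituent. The dictionary layout is an assumption, and the FS unification check from the parsing equation block is deliberately left out to keep the example short.

```python
def advance_arcs(key, active_arcs, constituents):
    """Extend arcs that expect the key next; pack completed arcs by (type, start, end)."""
    for arc in list(active_arcs):  # iterate over a copy, since new arcs are appended
        if arc["dot"] >= len(arc["rhs"]):
            continue
        if arc["rhs"][arc["dot"]] != key["type"] or arc["end"] != key["start"]:
            continue
        extended = dict(arc, dot=arc["dot"] + 1, end=key["end"],
                        children=arc.get("children", ()) + (key["id"],))
        if extended["dot"] < len(extended["rhs"]):
            active_arcs.append(extended)  # still incomplete, keep it active
            continue
        signature = (extended["lhs"], extended["start"], extended["end"])
        if signature in constituents:
            # Same type and range already exists: link the arc (ambiguity packing).
            constituents[signature]["arcs"].append(extended)
        else:
            constituents[signature] = {"arcs": [extended], "status": "agenda"}
    return constituents


key = {"id": 0, "type": "N", "start": 0, "end": 1}
arcs = [
    {"rule": 0, "lhs": "NP", "rhs": ("N",), "start": 0, "end": 0, "dot": 0},
    {"rule": 1, "lhs": "NP", "rhs": ("N",), "start": 0, "end": 0, "dot": 0},
    {"rule": 2, "lhs": "NP", "rhs": ("N", "PP"), "start": 0, "end": 0, "dot": 0},
]
packed = advance_arcs(key, arcs, {})
```

Rules 0 and 1 share a source production, so they end up as two arcs under one `NP` constituent, while rule 2 stays active waiting for a `PP`.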

Parsing – Final note
- Add the key to the chart, and check whether the key has the right type and range to be a full parse
- Rather than storing an FS with each constituent, associate the FS with a parse path (the rules & constituents that built up that FS)

Transfer – Overview
- Parsing has created:
  - A vector of constituents pointing to arcs
    - A constituent can have the same subcomponents but transfer differently
  - A vector of arcs (which in turn point to constituents)
- A set of feature structures associated with certain parse trees of constituents and arcs

Transfer - Start
- A parse tree is selected and its FS identified or created
- Top-level transfer rule is run
  - Transfer equation block is run
    - From Xn to Yn
    - Rules used to create the top node in the target constituent vector
  - Branch nodes are then added to a traversal stack based on X-Y alignments, in reverse order
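
One plausible reading of the reverse-order push is that it makes the stack pop branch nodes in target (Y) order. The sketch below follows that reading; the node layout, the 1-based `(x_index, y_index)` alignment pairs, and the function name are assumptions, and the transfer equations themselves are not run.

```python
def transfer_order(root):
    """Return node labels in the order a stack-based walk would visit them."""
    visited = []
    stack = [root]
    while stack:
        node = stack.pop()
        visited.append(node["label"])
        children = node.get("children", [])
        # Sort alignments by target position, then push in reverse so the
        # child aligned to the first target slot is popped first.
        aligned = sorted(node.get("alignments", []), key=lambda pair: pair[1])
        for x_index, _y_index in reversed(aligned):
            stack.append(children[x_index - 1])
    return visited


verb_phrase = {
    "label": "VP",
    "children": [{"label": "V"}, {"label": "NP-obj"}],
    "alignments": [(1, 2), (2, 1)],  # source V NP becomes target NP V
}
sentence = {
    "label": "S",
    "children": [{"label": "NP-subj"}, verb_phrase],
    "alignments": [(1, 1), (2, 2)],
}
order = transfer_order(sentence)
```

Here the object noun phrase is visited before the verb, because the alignments place it first on the target side.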

Transfer
- Walk the parse tree
  - Guided by transfer rules & alignments
  - Building the target generation tree (constituents and arcs) as it goes
- After transfer, run generation
  - Moves FS from Y0 to Yn

Transfer – Leaf (Lexical) Node
- If a word is inserted by the rule, add the new word here
  - Currently only a text string; a more general mechanism is needed to include constituent type and FS
- Add the target word from the lexical transfer rule, if the FSs unify
- The actual linear sentence is not generated until the final walk through the generated tree

Transfer – Non-terminal
- Run the transfer FS equation block for the rule (Xn -> Yn)
- Run the generation FS equation blocks (Y0 -> Yn)
- Create the new target arc and constituents, and distribute the Yn FSs to them
- Continue to walk the parse tree
  - Based on X-Y alignment
  - Or create a new constituent for an inserted word

Transfer – Traversal & Constraint Checking
- If the parse tree was walked successfully and all rules succeeded:
  - Walk the generated tree and check the Yn-Yn constraints (waiting until now gives each Yn a chance to have an FS)
  - Generate the actual translation sentence
- Continue or stop, depending on whether all translations are wanted
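
The final walk might be sketched as below: constraints between sibling target nodes are checked first, and the sentence is produced only if they all hold. The tree layout, the `(i, j, feature)` constraint triples, and the choice to let a constraint pass when either side lacks a value are assumptions for this example, not the engine's documented behavior.

```python
def constraints_hold(node):
    """Check each (i, j, feature) constraint between children, recursively."""
    children = node.get("children", [])
    for i, j, feature in node.get("constraints", []):
        left = children[i - 1].get("fs", {}).get(feature)
        right = children[j - 1].get("fs", {}).get(feature)
        # Assumption: a missing value on either side does not fail the check.
        if left is not None and right is not None and left != right:
            return False
    return all(constraints_hold(child) for child in children)


def linearize(node):
    """Collect leaf words from left to right."""
    if "word" in node:
        return [node["word"]]
    words = []
    for child in node["children"]:
        words.extend(linearize(child))
    return words


def generate(tree):
    """Return the translation string, or None if a constraint fails."""
    return " ".join(linearize(tree)) if constraints_hold(tree) else None


good_tree = {
    "constraints": [(1, 2, "gender")],
    "children": [
        {"word": "gato", "fs": {"gender": "m"}},
        {"word": "negro", "fs": {"gender": "m"}},
    ],
}
bad_tree = {
    "constraints": [(1, 2, "gender")],
    "children": [
        {"word": "gato", "fs": {"gender": "m"}},
        {"word": "negra", "fs": {"gender": "f"}},
    ],
}
translation = generate(good_tree)
rejected = generate(bad_tree)
```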

Issues
- Top-down vs. bottom-up
- Where to get values to check constraints against?

Symbolic Decoder
- Transfer rules rarely find a full parse
- Even a full parse can transfer in many ways
- Need a decoder to select the best combined translation
- Can use transfer rule scores in selection
  - Difficult to derive
  - Currently scoring by reference translations
- Used in combination with a language model
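
To make the idea of selecting a best combined translation concrete, here is a small dynamic-programming sketch that stitches partial translations together left to right. It is not the decoder described on the slide: the fragment tuples, the additive combination of rule score and language-model score, the monotone (no reordering) assumption, and the toy `lm_score` function are all made up for illustration.

```python
def decode(n_words, fragments, lm_score):
    """Pick the best-scoring sequence of fragments covering words 0..n_words.

    fragments: iterable of (start, end, target_text, rule_score), start < end
    lm_score:  function from target text to a score (higher is better)
    """
    best = {0: (0.0, [])}  # source position -> (score, chosen target fragments)
    for end in range(1, n_words + 1):
        for start, frag_end, text, rule_score in fragments:
            if frag_end != end or start not in best:
                continue
            score = best[start][0] + rule_score + lm_score(text)
            if end not in best or score > best[end][0]:
                best[end] = (score, best[start][1] + [text])
    return best.get(n_words)  # None if no combination covers the sentence


fragments = [
    (0, 1, "the", -1.0),
    (1, 3, "black cat", -1.5),
    (1, 2, "cat", -1.0),
    (2, 3, "black", -1.0),
    (0, 3, "the cat black", -4.0),
]


def toy_lm(text):
    """Stand-in language model: a small penalty per word."""
    return -0.1 * len(text.split())


score, path = decode(3, fragments, toy_lm)
```

In this toy setup the two-fragment path "the" + "black cat" wins over both the word-by-word path and the single low-scoring full-span fragment.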
Discussion