Example-Based Machine Translation Based on the Synchronous SSTC Annotation Schema
The Construction of a Bilingual Knowledge Bank Based on a Bitext Synchronous Parsing Technique
Computer Aided Translation Unit, School of Computer Sciences, University Science Malaysia
Presentation Outline
- Introduction
  - Structured String-Tree Correspondence (SSTC)
  - Synchronous Structured String-Tree Correspondence (synchronous SSTC)
  - EBMT based on the synchronous SSTC
- The Construction of a BKB Based on the Synchronous SSTC
  - Bitext Word-Level Mapping (Word Alignment)
  - Bitext Synchronous Parsing Technique
The Structured String-Tree Correspondence (SSTC)
SSTC = string + arbitrary tree structure + correspondence
Correspondence: each node is annotated as node(X/Y), where
X: SNODE = the interval of the substring that corresponds to the node itself;
Y: STREE = the interval of the substring that corresponds to the subtree having the node as its root.
Example, string: 0 all 1 cats 2 eat 3 mice 4
  eat (2-3/0-4)
    cats (1-2/0-2)
      all (0-1/0-1)
    mice (3-4/3-4)
SNODE vs. STREE for the node cats (1-2/0-2): X (SNODE) = 1-2 covers only "cats", while Y (STREE) = 0-2 covers "all cats", the substring of the whole subtree rooted at cats.
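A minimal Python sketch of an SSTC node, assuming half-open token intervals as on the slides: each node stores its SNODE interval, and STREE is computed as the span of the node together with all of its descendants. The names `SSTCNode` and `substring` are illustrative, not from the original work, and the sketch only handles subtrees that cover one contiguous substring.

```python
from dataclasses import dataclass, field


@dataclass
class SSTCNode:
    """One SSTC node: a word plus its SNODE interval (half-open token offsets)."""
    word: str
    snode: tuple[int, int]
    children: list["SSTCNode"] = field(default_factory=list)

    def stree(self) -> tuple[int, int]:
        """STREE: interval covered by this node and all of its descendants.

        Assumes the subtree covers one contiguous substring (true here, but
        not for discontinuous nodes such as "pick ... up").
        """
        spans = [self.snode] + [child.stree() for child in self.children]
        return (min(s for s, _ in spans), max(e for _, e in spans))


def substring(tokens: list[str], interval: tuple[int, int]) -> str:
    start, end = interval
    return " ".join(tokens[start:end])


tokens = "all cats eat mice".split()
all_ = SSTCNode("all", (0, 1))
cats = SSTCNode("cats", (1, 2), [all_])
mice = SSTCNode("mice", (3, 4))
eat = SSTCNode("eat", (2, 3), [cats, mice])

for node in (eat, cats, all_, mice):
    s, e = node.stree()
    print(f"{node.word} ({node.snode[0]}-{node.snode[1]}/{s}-{e}) -> "
          f"{substring(tokens, (s, e))!r}")
```

The first line printed is `eat (2-3/0-4) -> 'all cats eat mice'`, which matches the annotation on the slide.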
Synchronous SSTC: English source sentence "he picks the ball up", Malay target sentence "dia kutip bola itu"
English (E) SSTC, string: 0 he 1 pick 2 the 3 ball 4 up 5
  pick[v] up[p] (1-2+4-5/0-5)
    he[n] (0-1/0-1)
    ball[n] (3-4/2-4)
      the[det] (2-3/2-3)
Malay (M) SSTC, string: 0 dia 1 kutip 2 bola 3 itu 4
  kutip[v] (1-2/0-4)
    dia[n] (0-1/0-1)
    bola[n] (2-3/2-4)
      itu[det] (3-4/3-4)
Translation units (E, M):
  Index Stree: (0-5, 0-4), (0-1, 0-1), (2-4, 2-4), (2-3, 3-4)
  Index Snode: (1-2+4-5, 1-2), (0-1, 0-1), (3-4, 2-3), (2-3, 3-4)
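The translation units above are pairs of (possibly discontinuous) intervals. The sketch below, with hypothetical helper names `parse_interval` and `substring`, parses the slides' `1-2+4-5` notation (and `-` for "no counterpart") and recovers the substrings each unit links, e.g. the discontinuous English "pick ... up" and the single Malay word "kutip".

```python
def parse_interval(spec: str) -> list[tuple[int, int]]:
    """Parse '1-2+4-5' into [(1, 2), (4, 5)]; '-' (no counterpart) gives []."""
    spec = spec.strip()
    if spec == "-":
        return []
    return [tuple(int(x) for x in part.split("-")) for part in spec.split("+")]


def substring(tokens: list[str], spec: str) -> str:
    """Join the tokens of every piece of a (possibly discontinuous) interval."""
    return " ".join(" ".join(tokens[s:e]) for s, e in parse_interval(spec))


EN = "he pick the ball up".split()
MS = "dia kutip bola itu".split()

# Translation units of the synchronous SSTC, as listed on the slide.
SNODE_UNITS = [("1-2+4-5", "1-2"), ("0-1", "0-1"), ("3-4", "2-3"), ("2-3", "3-4")]
STREE_UNITS = [("0-5", "0-4"), ("0-1", "0-1"), ("2-4", "2-4"), ("2-3", "3-4")]

for name, units in (("SNODE", SNODE_UNITS), ("STREE", STREE_UNITS)):
    for src, tgt in units:
        print(f"{name} ({src}, {tgt}): {substring(EN, src)!r} <-> {substring(MS, tgt)!r}")
```

The first SNODE unit prints `'pick up' <-> 'kutip'`, and the STREE unit `(2-4, 2-4)` links "the ball" with "bola itu".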
English source sentence "I did not give it to him", French target sentence "Je ne le lui ai pas donné"
English (E) SSTC, string: 0 I 1 did 2 not 3 give 4 it 5 to 6 him 7
  Nodes: not[neg] (2-3/0-7) (root); did[v] give[v] (1-2+3-4/3-7); I[n] (0-1/0-1); it[n] (4-5/4-5); to[p] (5-6/5-7); him[n] (6-7/6-7)
French (F) SSTC, string: 0 Je 1 ne 2 le 3 lui 4 ai 5 pas 6 donné 7
  Nodes: ne[neg] pas[neg] (1-2+5-6/0-7) (root); ai[v] donné[v] (4-5+6-7/0-1+2-5+6-7); Je[n] (0-1/0-1); le[n] (2-3/2-3); lui[n] (3-4/3-4)
Translation units (E, F):
  Index Stree: (0-7, 0-7), (0-2+3-7, 0-1+2-5+6-7), (0-1, 0-1), …
  Index Snode: (2-3, 1-2+5-6), (1-2+3-4, 4-5+6-7), (0-1, 0-1), (4-5, 2-3), (5-6, -), (6-7, 3-4)
The unit (5-6, -) records that "to" has no French counterpart.
English source sentence "hopefully Kim miss Dale", French target sentence "on espère que Dale manque à Kim"
English (E) SSTC, string: 0 hopefully 1 Kim 2 miss 3 Dale 4
  miss[v] (2-3/0-4)
    hopefully[adv] (0-1/0-1)
    Kim[n] (1-2/1-2)
    Dale[n] (3-4/3-4)
French (F) SSTC, string: 0 on 1 espère 2 que 3 Dale 4 manque 5 à 6 Kim 7
  manque[v] à[p] (4-5+5-6/0-7)
    on[n] espère[v] que[c] (0-1+1-2+2-3/0-3)
    Dale[n] (3-4/3-4)
    Kim[n] (6-7/6-7)
Translation units (E, F):
  Index Stree: (0-1, 0-3), (1-2, 6-7), (0-4, 0-7), (3-4, 3-4)
  Index Snode: (0-1, 0-1+1-2+2-3), (1-2, 6-7), (2-3, 4-5+5-6), (3-4, 3-4)
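One way to make the crossed dependencies in these examples visible is to compare the order of the translation units on the two sides. The sketch below is an illustration under stated assumptions, not the authors' procedure: each unit is ordered by the start of its first interval piece, units with no counterpart are skipped, and two units "cross" when their source order and target order disagree. `parse_interval` and `crossed_pairs` are hypothetical names.

```python
from itertools import combinations


def parse_interval(spec: str) -> list[tuple[int, int]]:
    """Parse '0-1+1-2+2-3' into [(0, 1), (1, 2), (2, 3)]; '-' gives []."""
    spec = spec.strip()
    if spec == "-":
        return []
    return [tuple(int(x) for x in part.split("-")) for part in spec.split("+")]


def crossed_pairs(units: list[tuple[str, str]]) -> list[tuple[int, int]]:
    """Indices (i, j) of translation units whose source and target orders disagree.

    Each unit is positioned by the start of its first interval piece; units
    without a counterpart on either side are ignored.
    """
    starts = []
    for src, tgt in units:
        s, t = parse_interval(src), parse_interval(tgt)
        starts.append((s[0][0], t[0][0]) if s and t else None)
    crossed = []
    for i, j in combinations(range(len(units)), 2):
        if starts[i] is None or starts[j] is None:
            continue
        (si, ti), (sj, tj) = starts[i], starts[j]
        if (si - sj) * (ti - tj) < 0:  # opposite order on the two sides
            crossed.append((i, j))
    return crossed


# SNODE units of "hopefully Kim miss Dale" / "on espère que Dale manque à Kim"
UNITS = [("0-1", "0-1+1-2+2-3"), ("1-2", "6-7"), ("2-3", "4-5+5-6"), ("3-4", "3-4")]
WORDS = ["hopefully", "Kim", "miss", "Dale"]
for i, j in crossed_pairs(UNITS):
    print(f"{WORDS[i]} / {WORDS[j]} cross")
```

Here Kim, miss and Dale all cross one another. Applied to the English-Malay units of "he picks the ball up", the same function reports only the pair ball/bola and the/itu, since the Malay determiner follows the noun.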
Example-Based Machine Translation (EBMT)
EBMT is a case-based reasoning approach to MT: it translates a given source sentence into the target language by using translated examples of similar sentences.
The General Architecture for EBMT
Source sentence → find the closest related SL examples in the BKB → retrieve the corresponding TL examples → combination → target sentence
(The BKB stores the source language / target language correspondences.)
EBMT Based on the Synchronous SSTC
- The source sentence is sense-tagged. For example, the word "bank" has different senses: bank(1), a land beside the river; bank(2), a place to keep money, a source of money. Tagged sentence: The(1) man(2) keep(1) his(1) money(1) in(1) the(1) bank(2).
- A list of sub-synchronous SSTCs is generated from the source sentence.
- The closest synchronous SSTC example is chosen from the BKB, and a list of sub-synchronous SSTCs is constructed from the chosen example.
- Replacement & combination produce the resultant synchronous SSTC, from which the target sentence is read off.
Source sentence: The old man picks the green lamp up
Example base:
1 English sentence: He pick the ball up. Malay translation: Dia kutip bola itu.
2 English sentence: The lamp is off. Malay translation: Lampu itu padam.
3 English sentence: The green signal turn on. Malay translation: Isyarat hijau itu bertukar.
4 English sentence: The old man drink tea. Malay translation: Lelaki tua itu minum teh.
The example base is represented as a set of synchronous SSTCs (sense indices in parentheses).
Example 1: English sentence: He pick the ball up. Malay translation: Dia kutip bola itu.
  1E, string: 0 he 1 pick 2 the 3 ball 4 up 5
    pick(1)[v] up(1)[p] (1-2+4-5/0-5)
      he(1)[n] (0-1/0-1)
      ball(1)[n] (3-4/2-4)
        the(1)[det] (2-3/2-3)
  1M, string: 0 dia 1 kutip 2 bola 3 itu 4
    kutip(1)[v] (1-2/0-4)
      dia(1)[n] (0-1/0-1)
      bola(1)[n] (2-3/2-4)
        itu(1)[det] (3-4/3-4)
  Index Stree: (0-5, 0-4), (0-1, 0-1), (2-4, 2-4), (2-3, 3-4)
  Index Snode: (1-2+4-5, 1-2), (0-1, 0-1), (3-4, 2-3), (2-3, 3-4)
Example 2: English sentence: The lamp is off. Malay translation: Lampu itu padam.
  2E, string: 0 the 1 lamp 2 is 3 off 4
    is(2)[v] off(1)[adv] (2-3+3-4/0-4)
      lamp(1)[n] (1-2/0-2)
        the(1)[det] (0-1/0-1)
  2M, string: 0 lampu 1 itu 2 padam 3
    padam(1)[v] (2-3/0-3)
      lampu(1)[n] (0-1/0-2)
        itu(1)[det] (1-2/1-2)
  Index Stree: (0-4, 0-3), (0-2, 0-2), (0-1, 1-2)
  Index Snode: (2-3+3-4, 2-3), (1-2, 0-1), (0-1, 1-2)
Example 3: English sentence: The green signal turn on. Malay translation: Isyarat hijau itu bertukar.
  3E, string: 0 the 1 green 2 signal 3 turn 4 on 5
    turn(1)[v] on(1)[adv] (3-4+4-5/0-5)
      signal(2)[n] (2-3/0-3)
        the(1)[det] (0-1/0-1)
        green(1)[adj] (1-2/1-2)
  3M, string: 0 isyarat 1 hijau 2 itu 3 bertukar 4
    bertukar(2)[v] (3-4/0-4)
      isyarat(1)[n] (0-1/0-3)
        hijau(1)[adj] (1-2/1-2)
        itu(1)[det] (2-3/2-3)
  Index Stree: (0-5, 0-4), (0-3, 0-3), (0-1, 2-3), (1-2, 1-2)
  Index Snode: (3-4+4-5, 3-4), (2-3, 0-1), (0-1, 2-3), (1-2, 1-2)
Example 4: English sentence: The old man drinks tea. Malay translation: Lelaki tua itu minum teh.
  4E, string: 0 the 1 old 2 man 3 drink 4 tea 5
    drink(1)[v] (3-4/0-5)
      man(1)[n] (2-3/0-3)
        the(1)[det] (0-1/0-1)
        old(1)[adj] (1-2/1-2)
      tea(1)[n] (4-5/4-5)
  4M, string: 0 lelaki 1 tua 2 itu 3 minum 4 teh 5
    minum(1)[v] (3-4/0-5)
      lelaki(1)[n] (0-1/0-3)
        tua(1)[adj] (1-2/1-2)
        itu(1)[det] (2-3/2-3)
      teh(1)[n] (4-5/4-5)
  Index Stree: (0-5, 0-5), (0-3, 0-3), (0-1, 2-3), (1-2, 1-2), (4-5, 4-5)
  Index Snode: (3-4, 3-4), (2-3, 0-1), (0-1, 2-3), (1-2, 1-2), (4-5, 4-5)
Source: the old man picks the green lamp up
Source SSTC, string: 0 the 1 old 2 man 3 pick 4 the 5 green 6 lamp 7 up 8
  pick[v] (3-4/0-8)
    man[n] (2-3/0-3)
      the[det] (0-1/0-1)
      old[adj] (1-2/1-2)
    lamp[n] (6-7/4-7)
      the[det] (4-5/4-5)
      green[adj] (5-6/5-6)
    up[p] (7-8/-)
English SSTCs of the candidate examples:
(1) string: 0 the 1 boy 2 pick 3 the 4 ball 5 up 6
  pick[v] up[p] (2-3+5-6/0-6)
    boy[n] (1-2/0-2)
      the[det] (0-1/0-1)
    ball[n] (4-5/3-5)
      the[det] (3-4/3-4)
(2) string: 0 the 1 green 2 signal 3 turn 4 on 5
  turn[v] on[adv] (3-4+4-5/0-5)
    signal[n] (2-3/0-3)
      the[det] (0-1/0-1)
      green[adj] (1-2/1-2)
(3) string: 0 the 1 lamp 2 is 3 off 4
  is[v] off[adv] (2-3+3-4/0-4)
    lamp[n] (1-2/0-2)
      the[det] (0-1/0-1)
(4) string: 0 the 1 old 2 man 3 drink 4 tea 5
  drink[v] (3-4/0-5)
    man[n] (2-3/0-3)
      the[det] (0-1/0-1)
      old[adj] (1-2/1-2)
    tea[n] (4-5/4-5)
Sub-synchronous SSTCs for the source sentence
(1) man(1)[n] (2-3/0-3) [the(1)[det] (0-1/0-1), old(1)[adj] (1-2/1-2)] ↔ lelaki(1)[n] (0-1/0-3) [tua(1)[adj] (1-2/1-2), itu(1)[det] (2-3/2-3)]
    "the old man" ↔ "lelaki tua itu" (offsets 0-3 on both sides)
    Index Stree: (0-3, 0-3), (0-1, 2-3), (1-2, 1-2); Index Snode: (2-3, 0-1), (0-1, 2-3), (1-2, 1-2)
(2) pick(1)[v] (3-4/3-4) ↔ kutip(1)[v] (3-4/3-4)
    Index Stree: (3-4, 3-4); Index Snode: (3-4, 3-4)
(3) lamp(1)[n] (6-7/4-7) [the(1)[det] (4-5/4-5), green(1)[adj] (5-6/5-6)] ↔ lampu(1)[n] (4-5/4-7) [hijau(1)[adj] (5-6/5-6), itu(1)[det] (6-7/6-7)]
    "the green lamp" ↔ "lampu hijau itu" (offsets 4-7 on both sides)
    Index Stree: (4-7, 4-7), (4-5, 6-7), (5-6, 5-6); Index Snode: (6-7, 4-5), (4-5, 6-7), (5-6, 5-6)
(4) up(1)[p] (7-8/7-8), with no Malay counterpart
    Index Stree: (7-8, -); Index Snode: (7-8, -)
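The decomposition used here (the root alone, plus one fragment per child subtree, read left to right) can be sketched as below. This is an assumption that matches this example only. `Node` and `sub_sstcs` are hypothetical names, the STREE computation assumes contiguous subtrees, and "up" is modeled as (7-8/7-8) as in fragment (4).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    snode: tuple[int, int]                    # the node's own interval
    children: list["Node"] = field(default_factory=list)

    def stree(self) -> tuple[int, int]:
        """Span of the node and its descendants (assumed contiguous)."""
        spans = [self.snode] + [child.stree() for child in self.children]
        return (min(s for s, _ in spans), max(e for _, e in spans))


def sub_sstcs(root: Node, tokens: list[str]) -> list[tuple[str, str]]:
    """Split a tree into the root alone plus one fragment per child subtree.

    Returns (head word, covered substring) pairs, left to right.
    """
    pieces = [(root.snode, root.word)]
    pieces += [(child.stree(), child.word) for child in root.children]
    pieces.sort()
    return [(word, " ".join(tokens[s:e])) for (s, e), word in pieces]


tokens = "the old man pick the green lamp up".split()
man = Node("man", (2, 3), [Node("the", (0, 1)), Node("old", (1, 2))])
lamp = Node("lamp", (6, 7), [Node("the", (4, 5)), Node("green", (5, 6))])
up = Node("up", (7, 8))
pick = Node("pick", (3, 4), [man, lamp, up])

for head, text in sub_sstcs(pick, tokens):
    print(f"{head}: {text}")
```

This prints the four fragments in the order listed above: "the old man", "pick", "the green lamp", "up".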
Selected closest example: example 1
English sentence: He pick the ball up. Malay translation: Dia kutip bola itu.
  1E: pick(1)[v] up(1)[p] (1-2+4-5/0-5) with children he(1)[n] (0-1/0-1) and ball(1)[n] (3-4/2-4), which has the child the(1)[det] (2-3/2-3); string: 0 he 1 pick 2 the 3 ball 4 up 5
  1M: kutip(1)[v] (1-2/0-4) with children dia(1)[n] (0-1/0-1) and bola(1)[n] (2-3/2-4), which has the child itu(1)[det] (3-4/3-4); string: 0 dia 1 kutip 2 bola 3 itu 4
  Index Stree: (0-5, 0-4), (0-1, 0-1), (2-4, 2-4), (2-3, 3-4); Index Snode: (1-2+4-5, 1-2), (0-1, 0-1), (3-4, 2-3), (2-3, 3-4)
Sub-synchronous SSTCs derived from the example:
(1) he(1)[n] (0-1/0-1) ↔ dia(1)[n] (0-1/0-1); Index Stree: (0-1, 0-1); Index Snode: (0-1, 0-1)
(2) pick(1)[v] (1-2/0-5) ↔ kutip(1)[v] (1-2/0-4); Index Stree: (0-5, 0-4); Index Snode: (1-2, 1-2)
(3) ball(1)[n] (3-4/2-4) [the(1)[det] (2-3/2-3)] ↔ bola(1)[n] (2-3/2-4) [itu(1)[det] (3-4/3-4)]; "the ball" ↔ "bola itu"; Index Stree: (2-4, 2-4), (2-3, 3-4); Index Snode: (3-4, 2-3), (2-3, 3-4)
(4) up(1)[p] (4-5/-), with no Malay counterpart; Index Stree: (-, -); Index Snode: (4-5, -)
Sub-synchronous SSTCs: source sentence vs. example sentence
Each fragment of the source sentence is matched with a fragment of the example sentence:
- "the old man" ↔ "lelaki tua itu" is matched with "he" ↔ "dia"
- "pick" ↔ "kutip" is matched with "pick" ↔ "kutip"
- "the green lamp" ↔ "lampu hijau itu" is matched with "the ball" ↔ "bola itu"
- "up" (no Malay counterpart) is matched with "up" (no Malay counterpart)
Replacement: pick
Source part (2): pick(1)[v] (3-4/3-4) ↔ kutip(1)[v] (3-4/3-4); Index Stree: (3-4, 3-4); Index Snode: (3-4, 3-4)
Example part (2): pick(1)[v] (1-2/0-5) ↔ kutip(1)[v] (1-2/0-4); Index Stree: (0-5, 0-4); Index Snode: (1-2, 1-2)
Inside example 1 the example part is replaced by the source part: the English node pick(1)[v] up(1)[p] (1-2+4-5/0-5) becomes pick(1)[v] up(1)[p] (3-4+4-5/3-4), and the Malay node kutip(1)[v] (1-2/0-4) becomes kutip(1)[v] (3-4/3-4). The other nodes of example 1 (he ↔ dia, the ball ↔ bola itu) are not changed by this step.
Replacement: he → the old man
Source part (1): man(1)[n] (2-3/0-3) [the(1)[det] (0-1/0-1), old(1)[adj] (1-2/1-2)] ↔ lelaki(1)[n] (0-1/0-3) [tua(1)[adj] (1-2/1-2), itu(1)[det] (2-3/2-3)]; Index Stree: (0-3, 0-3), (0-1, 2-3), (1-2, 1-2); Index Snode: (2-3, 0-1), (0-1, 2-3), (1-2, 1-2)
Example part (1): he(1)[n] (0-1/0-1) ↔ dia(1)[n] (0-1/0-1); Index Stree: (0-1, 0-1); Index Snode: (0-1, 0-1)
After this replacement, example 1 reads "the old man pick the ball up" ↔ "lelaki tua itu kutip bola itu", and the English root becomes pick(1)[v] up(1)[p] (3-4+7-8/3-4), taking the position 7-8 of "up" in the source sentence.
Generation
After the remaining replacement ("the ball" ↔ "bola itu" replaced by "the green lamp" ↔ "lampu hijau itu"), the resultant synchronous SSTC is:
English: pick(1)[v] up(1)[p] (3-4+7-8/0-8) with children man(1)[n] (2-3/0-3) [the(1)[det], old(1)[adj]] and lamp(1)[n] [the(1)[det], green(1)[adj]]
  String: 0 the 1 old 2 man 3 pick 4 the 5 green 6 lamp 7 up 8
Malay: kutip(1)[v] with children lelaki(1)[n] [tua(1)[adj], itu(1)[det]] and lampu(1)[n] [hijau(1)[adj], itu(1)[det]]
  String: 0 lelaki 1 tua 2 itu 3 kutip 4 lampu 5 hijau 6 itu 7
The translation: lelaki tua itu kutip lampu hijau itu
The translation of the source sentence is generated from the Malay part of the synchronous SSTC, i.e. it is the string of the Malay SSTC.
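The replacement-and-combination step can be sketched with a deliberately simplified encoding, assumed here for illustration only: each sub-synchronous SSTC of example 1 is a "slot" holding its English and Malay text and their start offsets, and the slot mapping (man → he, pick → pick, lamp → ball, up → up) is given by hand rather than computed. A replaced fragment inherits the offsets of the slot it fills, so the Malay word order is taken from the example translation. All names are hypothetical.

```python
# Example 1 split into slots: slot -> (English text, English start, Malay text, Malay start).
# A Malay start of None means the fragment has no Malay counterpart.
EXAMPLE_1 = {
    "he":   ("he", 0, "dia", 0),
    "pick": ("pick", 1, "kutip", 1),
    "ball": ("the ball", 2, "bola itu", 2),
    "up":   ("up", 4, "", None),
}

# Sub-synchronous SSTCs of the source sentence: head -> (English, Malay).
SOURCE_SUBS = {
    "man":  ("the old man", "lelaki tua itu"),
    "pick": ("pick", "kutip"),
    "lamp": ("the green lamp", "lampu hijau itu"),
    "up":   ("up", ""),
}

# Which example slot each source fragment replaces (hand-given here).
SLOT_MAP = {"man": "he", "pick": "pick", "lamp": "ball", "up": "up"}


def replace_and_combine(example, source_subs, slot_map):
    """Fill the example's slots with source fragments, then read off both strings."""
    filled = dict(example)
    for head, slot in slot_map.items():
        english, malay = source_subs[head]
        _, en_pos, _, ms_pos = example[slot]
        filled[slot] = (english, en_pos, malay, ms_pos)
    en_order = sorted(filled.values(), key=lambda v: v[1])
    ms_order = sorted((v for v in filled.values() if v[3] is not None),
                      key=lambda v: v[3])
    return " ".join(v[0] for v in en_order), " ".join(v[2] for v in ms_order)


english, malay = replace_and_combine(EXAMPLE_1, SOURCE_SUBS, SLOT_MAP)
print(english)  # the old man pick the green lamp up
print(malay)    # lelaki tua itu kutip lampu hijau itu
```

The real procedure works on trees and recomputes every interval; this flat version only shows why the example supplies the target word order.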
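EBMT General Problems
- How to use more than one example to translate one source sentence, i.e. how to construct well-formed target-language sentences from fragments extracted from a BKB.
- Lack of flexibility in representing translation relations between source and target substrings, i.e. the treatment of "wild" linguistic phenomena that are non-standard, e.g. crossed dependencies.
The synchronous SSTC approach is intended to overcome both problems: fragments of several examples are combined by replacement, and the interval-based correspondence records non-standard relations such as discontinuous units and crossed dependencies.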
Transfer Approach to MT
Source → Analysis → Transfer → Synthesis → Target
How to construct the Bilingual Knowledge Bank (BKB), or Example-Base? A substantial reservation!
The Construction of a BKB Based on the Synchronous SSTC, Using the Bitext Synchronous Parsing Technique
- Bitext: text that is available in two languages, e.g.:
S (English): The basic idea of example-based parsing is very simple: it is to find the corresponding representation for an input sentence based on the representations of similar sentences in the example-base.
T (Malay): Idea asas bagi penghuraian berasaskan-contoh adalah mudah: iaitu untuk mencari perwakilan yang sepadan bagi suatu ayat input berdasarkan perwakilan ayat yang serupa dalam pengkalan-contoh.
Schema
Inputs: the English source text, the Malay target text and a bilingual dictionary.
- Bi-text alignment process at sentence level, phrase level and word level.
- Parsing & POS tagging of the English source text with the Apple Pie Parser, giving bracketed output such as (S (NP …) (VP …)).
- Compile the APP output into an SSTC for the English source text.
- Build the SSTC for the Malay target text from the SSTC for the English source text, using the word alignment.
- The resulting synchronous SSTC (English source, Malay target) is checked in the SSTC editor and stored in the BKB.
Bitext Word-Level Mapping (Word Alignment)
Real texts are noisy:
- Fertility: a single word in the source sentence may correspond to zero, one, two or more words in the target sentence, and vice versa.
- Crossed dependencies (distortion): human translators change and rearrange material, so the target text does not follow the word order of the source text.
±n Context Window Word Alignment
The correspondence between the source and the target is denoted by an interval attached to each subtext according to its offset in the text:
S (English): 0 The 1 basic 2 idea 3 of 4 example 5 - 6 based 7 parsing 8 is 9 very 10 simple 11 : 12 It 13 is 14 to 15 find 16 the 17 corresponding 18 representation 19 for 20 an 21 input 22 sentence 23 based 24 on 25 the 26 representations 27 of 28 similar 29 sentences 30 in 31 the 32 example 33 - 34 base 35 . 36
T (Malay): 0 Idea 1 asas 2 bagi 3 penghuraian 4 berasaskan 5 - 6 contoh 7 adalah 8 mudah 9 : 10 Iaitu 11 untuk 12 mencari 13 perwakilan 14 yang 15 sepadan 16 bagi 17 suatu 18 ayat 19 input 20 berdasarkan 21 perwakilan 22 ayat 23 yang 24 serupa 25 dalam 26 pengkalan 27 - 28 contoh 29 . 30
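The offset scheme can be reproduced with a simple tokenizer. The sketch assumes that words and punctuation are split apart, so "example-based" becomes example, -, based as in the listing. Offsets count tokens, not characters, and `index_tokens` is a hypothetical helper name.

```python
import re


def index_tokens(text: str) -> list[tuple[str, int, int]]:
    """Split text into words and punctuation and give each its (start, end) offset.

    Offsets count tokens, not characters: token k occupies the interval k-(k+1).
    """
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return [(tok, k, k + 1) for k, tok in enumerate(tokens)]


S = ("The basic idea of example-based parsing is very simple: it is to find the "
     "corresponding representation for an input sentence based on the "
     "representations of similar sentences in the example-base.")
T = ("Idea asas bagi penghuraian berasaskan-contoh adalah mudah: iaitu untuk "
     "mencari perwakilan yang sepadan bagi suatu ayat input berdasarkan "
     "perwakilan ayat yang serupa dalam pengkalan-contoh.")

src, tgt = index_tokens(S), index_tokens(T)
print(len(src), len(tgt))                            # 36 30
print([(s, e) for w, s, e in tgt if w == "contoh"])  # [(6, 7), (28, 29)]
```

The two occurrences of "contoh" at 6-7 and 28-29 are exactly the competing candidates for "example" in the chain step.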
±n Context Window Word Alignment
Find the TPCs between the source and the target using:
- a bilingual dictionary;
- cognate words, e.g. computer / komputer;
- the Dice coefficient: Dice = 2 prob(S, T) / [prob(S) + prob(T)], where prob(S) and prob(T) are the probabilities of S and T occurring in the text, and prob(S, T) is the probability of both co-occurring in the same bitext segment.
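A minimal sketch of the Dice score follows. It assumes prob(·) is estimated as the fraction of aligned segments that contain a word, so the number of segments cancels and counts can be used directly. The three-segment `BITEXT` is made-up toy data, not taken from the original corpus.

```python
def dice(s_word: str, t_word: str, bitext) -> float:
    """Dice = 2 prob(S, T) / (prob(S) + prob(T)), estimated from segment counts.

    prob(S), prob(T): fraction of segments whose source / target side contains
    the word; prob(S, T): fraction of segment pairs containing both.
    """
    c_s = sum(s_word in src for src, _ in bitext)
    c_t = sum(t_word in tgt for _, tgt in bitext)
    c_st = sum(s_word in src and t_word in tgt for src, tgt in bitext)
    return 2 * c_st / (c_s + c_t) if c_s + c_t else 0.0


# Toy sentence-aligned bitext (hypothetical data).
BITEXT = [
    ("the computer is new".split(), "komputer itu baru".split()),
    ("a computer".split(), "sebuah komputer".split()),
    ("the book is new".split(), "buku itu baru".split()),
]

print(dice("computer", "komputer", BITEXT))  # 1.0
print(dice("computer", "baru", BITEXT))      # 0.5
```

A score near 1.0 marks a strong TPC; in practice it would be combined with the dictionary and cognate evidence listed above.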
±n Context Window Word Alignment
Find the chains for all possible TPCs of a source word. For example(4-5) there are two candidates, contoh(6-7) and contoh(28-29):
- contoh(6-7): basic(1-2) idea(2-3) of(3-4) example(4-5) -(5-6) based(6-7) parsing(7-8) ↔ bagi(2-3) penghuraian(3-4) berasaskan(4-5) -(5-6) contoh(6-7)
- contoh(28-29): basic(1-2) idea(2-3) of(3-4) example(4-5) -(5-6) based(6-7) parsing(7-8) ↔ -(27-28) contoh(28-29)
±n Context Window Word Alignment
For every chain, calculate a weight W from:
- len(seq): the length of the continuous sequence of words;
- len(gap): the length of the gaps between the words in the chain;
- len(chain): the length of the chain.
For example(4-5): contoh(6-7) has W = 1.39 and contoh(28-29) has W = 0.60, so contoh(6-7) is the preferred alignment.
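The slides do not give the formula that combines these quantities into W. The sketch below therefore only computes the three ingredients for a chain of target token positions, under an assumed reading of the definitions stated in the docstring. It does not reproduce the values 1.39 and 0.60, and `chain_features` is a hypothetical name.

```python
def chain_features(positions):
    """Return (len(seq), len(gap), len(chain)) for a chain of target positions.

    Assumed reading of the definitions:
      len(chain) = number of distinct words in the chain
      len(gap)   = total number of positions skipped between consecutive words
      len(seq)   = length of the longest run of consecutive positions
    """
    ps = sorted(set(positions))
    if not ps:
        return 0, 0, 0
    gap = sum(b - a - 1 for a, b in zip(ps, ps[1:]))
    longest = run = 1
    for a, b in zip(ps, ps[1:]):
        run = run + 1 if b == a + 1 else 1
        longest = max(longest, run)
    return longest, gap, len(ps)


# Target-side chains for example(4-5), using the token start offsets.
print(chain_features([2, 3, 4, 5, 6]))  # bagi ... contoh(6-7)  -> (5, 0, 5)
print(chain_features([27, 28]))         # -(27-28) contoh(28-29) -> (2, 0, 2)
```

On these readings the chain for contoh(6-7) is longer and equally gap-free, which is consistent with it receiving the higher weight.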
Bitext Synchronous Parsing Technique
The bitext is indexed with the same token offsets as in the word-alignment step. The first clause pair is:
S (English): The basic idea of example-based parsing is very simple
T (Malay): Idea asas bagi penghuraian berasaskan-contoh adalah mudah
Apple Pie Parser (APP)
- A bottom-up probabilistic chart parser that finds the parse tree for an input (English) text.
- Developed at New York University.
- Generates a syntactic tree in Penn Treebank bracketing.
- Free, and available for download with the source code: http://cs.nyu.edu/cs/projects/proteus/sekine
Apple Pie Parser (APP)
Input: The basic idea of example-based parsing is very simple
APP output: (S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) (VP is (ADJP very simple)))
This gives the representation structure and the POS information for the English source.
Compile the APP Output to the SSTC Structure
APP output: (S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) (VP is (ADJP very simple)))
SSTC tree (phrase nodes have an empty SNODE, Ø):
S (Ø/0-11)
  NP (Ø/0-8)
    NPL(1) (Ø/0-3)
      The basic idea (0-3/0-3)
    PP(1) (Ø/3-8)
      of (3-4/3-4)
      NPL(1) (Ø/4-8)
        Example-based parsing (4-8/4-8)
  VP (Ø/8-11)
    is (8-9/8-9)
    ADJP(1) (Ø/9-11)
      Very simple (9-11/9-11)
String: 0 the 1 basic 2 idea 3 of 4 example 5 - 6 based 7 parsing 8 is 9 very 10 simple 11
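The compilation step can be sketched as a small bracket parser plus an interval assignment. The sketch assumes, as in the tree above, that adjacent words directly under one phrase form a single leaf whose SNODE equals its STREE, while phrase nodes get an empty SNODE (Ø). The names are hypothetical, and sense indices such as NPL(1) are not modeled.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str                         # phrase label, or leaf text
    snode: Optional[tuple[int, int]]   # None for phrase nodes (Ø on the slide)
    stree: tuple[int, int]
    children: list["Node"] = field(default_factory=list)


def tokenize(text: str) -> list[str]:
    """Words and punctuation, so "example-based" -> example, -, based."""
    return re.findall(r"\w+|[^\w\s]", text)


def parse_brackets(text: str):
    """Parse Penn-style bracketing into nested (label, items) tuples."""
    toks = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        label = toks[pos + 1]          # toks[pos] is "("
        pos += 2
        items = []
        while toks[pos] != ")":
            if toks[pos] == "(":
                items.append(parse())
            else:
                items.append(toks[pos])
                pos += 1
        pos += 1                       # skip ")"
        return (label, items)

    return parse()


def to_sstc(tree, start: int = 0) -> Node:
    """Assign SSTC intervals; adjacent words inside one phrase form a single leaf."""
    label, items = tree
    children, offset, words = [], start, []

    def flush():
        nonlocal offset, words
        if words:
            text = " ".join(words)
            end = offset + len(tokenize(text))
            children.append(Node(text, (offset, end), (offset, end)))
            offset, words = end, []

    for item in items:
        if isinstance(item, str):
            words.append(item)
        else:
            flush()
            child = to_sstc(item, offset)
            children.append(child)
            offset = child.stree[1]
    flush()
    return Node(label, None, (start, offset), children)


def show(node: Node, depth: int = 0) -> None:
    snode = "Ø" if node.snode is None else f"{node.snode[0]}-{node.snode[1]}"
    print(f"{'  ' * depth}{node.label} ({snode}/{node.stree[0]}-{node.stree[1]})")
    for child in node.children:
        show(child, depth + 1)


APP = ("(S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) "
       "(VP is (ADJP very simple)))")
root = to_sstc(parse_brackets(APP))
show(root)
```

The printed tree has the same intervals as the SSTC tree above, e.g. `PP (Ø/3-8)` with leaves `of (3-4/3-4)` and `example-based parsing (4-8/4-8)`.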
Lexical Transfer
The basic idea of example-based parsing is very simple ↔ Idea asas bagi penghuraian berasaskan-contoh adalah mudah
Each node of the English SSTC receives a Malay counterpart with its own interval (English ↔ Malay):
S (Ø/0-11) ↔ (Ø/0-9)
  NP (Ø/0-8) ↔ (Ø/0-7)
    NPL(1) (Ø/0-3) ↔ (Ø/0-2)
      The basic idea (0-3/0-3) ↔ Idea asas (0-2/0-2)
    PP(1) (Ø/3-8) ↔ (Ø/2-7)
      of (3-4/3-4) ↔ bagi (2-3/2-3)
      NPL(1) (Ø/4-8) ↔ (Ø/3-7)
        Example-based parsing (4-8/4-8) ↔ Penghuraian berasaskan-contoh (3-7/3-7)
  VP (Ø/8-11) ↔ (Ø/7-9)
    is (8-9/8-9) ↔ adalah (7-8/7-8)
    ADJP(1) (Ø/9-11) ↔ (Ø/8-9)
      Very simple (9-11/9-11) ↔ mudah (8-9/8-9)
English string: 0 the 1 basic 2 idea 3 of 4 example 5 - 6 based 7 parsing 8 is 9 very 10 simple 11
Malay string: 0 Idea 1 asas 2 bagi 3 penghuraian 4 berasaskan 5 - 6 contoh 7 adalah 8 mudah 9
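Lexical transfer can be sketched under a simplifying assumption: the Malay leaves appear in the same order as the English leaves, which holds for this sentence but not in general, where the word alignment would be needed. Under that assumption, each leaf is replaced through a phrase lexicon and the intervals are recomputed bottom-up. The toy `LEXICON` contains only the pairs shown on this slide, and all names are hypothetical.

```python
import re

# Toy phrase lexicon: only the English/Malay leaf pairs shown on this slide.
LEXICON = {
    "The basic idea": "Idea asas",
    "of": "bagi",
    "example-based parsing": "penghuraian berasaskan-contoh",
    "is": "adalah",
    "very simple": "mudah",
}

# English SSTC skeleton as (label, children); leaves are plain strings.
TREE = ("S", [("NP", [("NPL", ["The basic idea"]),
                      ("PP", ["of", ("NPL", ["example-based parsing"])])]),
              ("VP", ["is", ("ADJP", ["very simple"])])])


def tokenize(text):
    """Words and punctuation, so "berasaskan-contoh" counts as 3 tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def transfer(tree, lexicon, start=0):
    """Translate leaves and recompute target-side intervals.

    Returns (node, end): a leaf becomes (text, (s, e)) with SNODE = STREE = (s, e);
    a phrase becomes (label, (s, e), children) with an empty SNODE (Ø).
    """
    label, children = tree
    out, offset = [], start
    for child in children:
        if isinstance(child, str):
            target = lexicon[child]
            end = offset + len(tokenize(target))
            out.append((target, (offset, end)))
            offset = end
        else:
            node, offset = transfer(child, lexicon, offset)
            out.append(node)
    return (label, (start, offset), out), offset


def target_string(node):
    """Read the target sentence off the leaves, left to right."""
    if len(node) == 2:          # leaf: (text, interval)
        return node[0]
    return " ".join(target_string(child) for child in node[2])


def show(node, depth=0):
    pad = "  " * depth
    if len(node) == 2:
        text, (s, e) = node
        print(f"{pad}{text} ({s}-{e}/{s}-{e})")
    else:
        label, (s, e), children = node
        print(f"{pad}{label} (Ø/{s}-{e})")
        for child in children:
            show(child, depth + 1)


malay, end = transfer(TREE, LEXICON)
show(malay)
print(target_string(malay))  # Idea asas bagi penghuraian berasaskan-contoh adalah mudah
```

The recomputed Malay intervals match the right-hand column above, e.g. `(Ø/2-7)` for the PP and `(3-7/3-7)` for "penghuraian berasaskan-contoh".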
The Synchronous SSTC Editor
Menu bar: File, Edit, Correspondences, Windows. The editor shows the English SSTC and the Malay SSTC side by side:
English:
S (Ø/0-11)
  NP (Ø/0-8)
    NPL(1) (Ø/0-3)
      The basic idea (0-3/0-3)
    PP(1) (Ø/3-8)
      of (3-4/3-4)
      NPL(1) (Ø/4-8)
        Example-based parsing (4-8/4-8)
  VP (Ø/8-11)
    is (8-9/8-9)
    ADJP(1) (Ø/9-11)
      Very simple (9-11/9-11)
String: 0 the 1 basic 2 idea 3 of 4 example 5 - 6 based 7 parsing 8 is 9 very 10 simple 11
Malay:
S (Ø/0-9)
  NP (Ø/0-7)
    NPL(1) (Ø/0-2)
      Idea asas (0-2/0-2)
    PP(1) (Ø/2-7)
      bagi (2-3/2-3)
      NPL(1) (Ø/3-7)
        Penghuraian berasaskan-contoh (3-7/3-7)
  VP (Ø/7-9)
    adalah (7-8/7-8)
    ADJP(1) (Ø/8-9)
      mudah (8-9/8-9)
String: 0 Idea 1 asas 2 bagi 3 penghuraian 4 berasaskan 5 - 6 contoh 7 adalah 8 mudah 9
Discussion
Thank you.