Multilingual Cataloguing of Product Information of Specific Domains

Multilingual Cataloguing of Product Information of Specific Domains: Case Mkbeem System
Aarno Lehtola, Jarno Tenni and Tuula Käpylä
VTT Information Technology
Contents:
• Motivation
• Mkbeem in a nutshell
• Multilingual Cataloguing Tool
• Meaning extraction
• Experiences of test users
• Future
• DEMO

VTT TIETOTEKNIIKKA
Online Language Challenges for e-Commerce
• Native English speakers make up less than 9% of the world population.
• "If I'm selling to you, I speak your language. If I'm buying, dann müssen Sie Deutsch sprechen [then you must speak German]." (Willy Brandt)
Ref: Global Reach, http://www.glreach.com/

An Answer: MKBEEM and Multilingual e-Commerce Mediation
MKBEEM Mediation System: information retrieval and trading in the customer's language
• Language adaptation via automatic human language (HL) translation and interpretation
• Natural dialogues combining HL and navigation
• Harmonised ontologies enabling localised views of products and trading contracts
• EC FP5 IST/HLT project, 2000-2002, budget 4.9 M€
• Goal: develop intelligent knowledge-based key components (HLP and KRR) for multilingual e-Commerce applications
[Figure: monolingual content/service providers (CP/SP) publish to the e-Commerce service through multilingual cataloguing ("write once, publish many"); user transactions with contract adaptation.]
• Generic solutions proved by trials in Finnish, French and English in the travel and mail-order sales domains
• More information: www.mkbeem.com

Generic Architecture of Mkbeem
[Figure: architecture diagram. Components: Customer; User Interface; User Agent; Content/Service Provider (CP); CP Interface; CP E-Commerce platform; CP Information System; CP Agent; Human Language Processing Server; Domain Ontology Server; Trading Ontology Server; Rational Agent; Manager Agent; MKBEEM System Manager Interface.]

Mkbeem: Bridging Languages via Language-Neutral Ontologies
[Figure components: meaning extraction, machine translation, multilingual product data, dialogue processing, product model, material ontology, colour ontology.]
• Extracting product properties:
  • Finnish source text: "Toppatakki. Muhkea malli, olkapäissä vahvikkeet. Painonapeilla kiinnitetty huppu, jossa joustava nyöri. Vetoketjun alla suojalista. Kaksi kannellista taskua..." [Quilted jacket. Puffy model, reinforcements on the shoulders. Hood attached with press studs, with an elastic drawstring. Protective flap under the zip. Two flap pockets...]
  • Machine translation: "Quilted jacket. Puffy model with reinforcements on the shoulder..."
  • Meaning extraction: jacket(X, quilted_jacket), model(X, puffy), part(X, Y, sleeves), property(Y, Z, reinforcement)...
• Processing a user information request (dialogue processing):
  • User: "A brown jacket made of natural material"
  • Ontological formula in CARIN: (c_colour)(X), (r_name)(X, brown), (c_product)(Y), (r_name)(Y, jacket), (c_material)(Z), (r_name)(Z, nat_mat).
  • System: "14 products found: 1. Beige winter jacket of wool 2. Ochre quilted jacket of cotton... Any further requirements?"
  • User: "One with a hood"
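The CARIN formula in the dialogue is a conjunction of concept and name atoms. The Python below is a minimal sketch of how such a formula could be matched against catalogue data. It flattens the formula into (concept, name) pairs and treats every atom as a constraint on one product. That is a simplification, because the slide's formula does not link X, Y and Z. The mini-ontologies, the `Product` record and the catalogue entries are hypothetical illustrations, not Mkbeem data or APIs.

```python
from dataclasses import dataclass

# Hypothetical mini-ontologies (illustrative assumption, not Mkbeem data):
# each general name maps to the specific values it subsumes.
ONTOLOGIES = {
    "c_product": {"jacket": {"jacket", "quilted_jacket", "winter_jacket"}},
    "c_colour": {"brown": {"brown", "beige", "ochre"}},
    "c_material": {"nat_mat": {"wool", "cotton", "linen", "silk"}},
}


@dataclass
class Product:
    title: str
    product: str
    colour: str
    material: str


def expand(concept: str, name: str) -> set:
    """Return every value that satisfies `name` under the concept's ontology."""
    return ONTOLOGIES.get(concept, {}).get(name, {name})


def matches(p: Product, formula: list) -> bool:
    """True if the product satisfies every (concept, name) pair."""
    values = {"c_product": p.product, "c_colour": p.colour, "c_material": p.material}
    return all(values[concept] in expand(concept, name) for concept, name in formula)


def search(catalogue: list, formula: list) -> list:
    return [p.title for p in catalogue if matches(p, formula)]


# Hypothetical catalogue entries echoing the example answers on the slide.
CATALOGUE = [
    Product("Beige winter jacket of wool", "winter_jacket", "beige", "wool"),
    Product("Ochre quilted jacket of cotton", "quilted_jacket", "ochre", "cotton"),
    Product("Black quilted jacket of polyester", "quilted_jacket", "black", "polyester"),
]

# (c_colour)(X), (r_name)(X, brown), (c_product)(Y), (r_name)(Y, jacket),
# (c_material)(Z), (r_name)(Z, nat_mat), flattened into (concept, name) pairs.
QUERY = [("c_colour", "brown"), ("c_product", "jacket"), ("c_material", "nat_mat")]

print(search(CATALOGUE, QUERY))
```

In this sketch, "brown" matches the beige and ochre items because the query is expanded through the ontology rather than compared by exact name.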

Mkbeem: Multilingual Cataloguing Tool
• Starting point:
  • The new product belongs to one of the supported product domains
  • A textual product description in one of the supported languages and a photograph are available
• Basic functionalities:
  • Text checking
  • Property extraction
  • Product categorisation
  • Machine translation
  • NL query processing
• Key technical challenges:
  • Formalising the relationship between ontologies and human language (HL)
  • Extracting the meaning of input HL texts, with respect to the provided ontologies, in the form of ontological formulas

Meaning Extraction: Example in Clothing Domain
Input: "Long skirt with cargo pockets" (French: "Jupe longue avec des poches battle-dress"; Finnish: "Pitkä hame, jossa reisitaskut")
Concept bindings:
(c_MKBEEM:81007:clothing.Product)(H6641), (r_name)(H6641, H6989),
(c_MKBEEM:83383:property)(H6552), (r_name)(H6552, H6889),
(c_MKBEEM:81011:part)(H6730), (r_name)(H6730, H7295),
Linguistic dependencies:
(l_dependency)(H6989, adj.Attr, H6889),
(l_dependency)(H6989, prep.Attr, H7295),
Lexical info:
(l_constituent)(H6889, 0, long, [en, long, adj, nom, sg, property]),
(l_constituent)(H6989, 1, skirt, [en, skirt, noun, nom, sg, product]),
(l_constituent)(H7295, 4, cargo#pockets, [en, cargo#pocket, noun, nom, pl, prodpart])

VTT's Implementation of HLP Services in Mkbeem
[Figure: Meaning Extractor and Webtran MT System with their software components and knowledge bases, providing embedded linguistic services.]
• Functions:
  • check.Text: text -> OK or correction
  • extract.Meaning: HL string -> ontological formula
  • translate.Text: text -> translated string
• Software: Unifier, Text Correction S/W, Webtran Dependency Parser, Verification, Cone (ontology S/W), Inference
• Linguistic KBs:
  • Domain lexica: Finnish ~4500, French ~1700, English ~1500, Fi->Fr ~1300, Fi->En ~1500
  • ALEs for MT: 965 between Finnish, French and English
  • Concept bindings
• Ontology KBs: product model, colour ontology, material ontology; altogether 307 concepts, 1050 attributes and 150 embedded ALEs

Augmented Lexical Entries
• Augmented Lexical Entry (ALE) rules (see MT Summit 99):
  • Bilingual or multilingual non-directed entries representing phrase and sentence structures and, possibly, their translation relations
  • Both surface-form entries and generalised rules
  • Multidirectional entries can be declared
  • Declarative and intuitive formalism, intended for use by translators
  • Uniform way of representing phenomena on different levels of language
  • Designed to suit automated or machine-supported language modelling (see the SMC 99 paper on learning translation grammars)
  • Can be viewed as a forest of partial dependency parse trees
  • Closely relatable to the corresponding conceptual structures (concept bindings to ontologies)
• Lexicon:
  • All the allowed words
  • Monolingual and bilingual entries

Meaning Extraction: A Product Ontology with ALEs Embedded

Syntax of ALEs (e denotes the empty alternative)
augmented_lexical_entry ::= [ entry_name pattern.. opt_message opt_repair ]
entry_name ::= name.number_index
name ::= hierarchical_name_w_dots_betw_parts
pattern ::= [ opt_language_id constituent_def.. ]
opt_message ::= e | [ message string_w_opt_binding ]
opt_repair ::= e | [ repair string_w_opt_binding ]
constituent_def ::= constituent_def*
constituent_def ::= constituent_def..
constituent_def ::= < constituent_def.. >
constituent_def ::= opt_regent_mark opt_lexeme opt_binding opt_feature_constraint
opt_language_id ::= e | ISO_std_lang_identifier | ~ISO_std_lang_identifier
ISO_std_lang_identifier ::= ee | en | fi | fr | se | Å
opt_regent_mark ::= e | ^
opt_lexeme ::= e | lexeme | tag | name
opt_binding ::= e | binding
opt_feature_constraint ::= e | { feature.. }
binding ::= ( variable_name ) | (^)
feature ::= feature_value | property_type binding
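To make the grammar concrete, here is a minimal reader in Python for a small subset of it. It is not the Mkbeem/Webtran parser, and all names in it are hypothetical. It reads bracketed entries into nested lists, then interprets flat patterns. That covers an optional language id, the "~" prefix for negative (correction) patterns, the "^" regent mark, and the message and repair parts. Feature constraints in {...} are kept as opaque tokens, and nested constituent groups are rejected.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Tokens: brackets, whole {...} feature constraints, and any other word.
TOKEN = re.compile(r"\[|\]|\{[^}]*\}|[^\s\[\]{}]+")
LANGUAGES = {"ee", "en", "fi", "fr", "se"}


def read(text: str):
    """Parse bracketed ALE text into nested lists of token strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] == "]":
            raise ValueError("unbalanced brackets in ALE text")
        tok = tokens[pos]
        pos += 1
        if tok != "[":
            return tok
        items = []
        while pos < len(tokens) and tokens[pos] != "]":
            items.append(parse())
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets in ALE text")
        pos += 1  # consume the closing "]"
        return items

    return parse()


@dataclass
class Pattern:
    lang: Optional[str]    # None for a language-neutral pattern
    negative: bool         # True for "~" (correction) patterns
    words: list[str]       # constituents with the "^" regent mark stripped
    regent: Optional[int]  # index of the "^"-marked constituent, if any


@dataclass
class Entry:
    name: str
    patterns: list[Pattern] = field(default_factory=list)
    message: Optional[str] = None
    repair: Optional[str] = None


def to_entry(tree: list) -> Entry:
    """Interpret a parsed entry; only flat bracketed patterns are handled."""
    name, *parts = tree
    entry = Entry(name)
    for part in parts:
        if not isinstance(part, list) or not all(isinstance(x, str) for x in part):
            raise ValueError("this sketch handles only flat bracketed patterns")
        head, *rest = part
        if head in ("message", "repair"):
            setattr(entry, head, " ".join(rest))
            continue
        lang, negative = head.lstrip("~"), head.startswith("~")
        if lang not in LANGUAGES:  # no language id: head is itself a constituent
            lang, negative, rest = None, False, part
        regent = next((i for i, w in enumerate(rest) if w.startswith("^")), None)
        entry.patterns.append(Pattern(lang, negative, [w.lstrip("^") for w in rest], regent))
    return entry


entry = to_entry(read("[footwear.word.27 [se ^allväderskänga] [fi ^jokasäänkenkä] [en all weather ^shoe]]"))
for p in entry.patterns:
    print(p.lang, p.words, "regent:", p.words[p.regent])
```

Separating reading from interpretation keeps the bracket syntax apart from the ALE semantics. A fuller implementation would extend `to_entry` with variable bindings, feature unification and the <...> groups without touching `read`.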

Examples of ALEs - 1/3
• Basic word correspondence definition:
  [footwear.word.27 [se ^allväderskänga] [fi ^jokasäänkenkä] [en all weather ^shoe]]
• Specific idiom correspondence:
  [price.tax.4 [se inkl. ^moms tag_price(X)] [fi sis. ^alv tag_price(X)] [en incl. ^VAT tag_price(X)]]
• Generalised ALE, e.g. "shirt of 100% cotton":
  [cloth.material.composition
    [fi ^(A){cloth.Prod} tag_percentage(X) (B){textile.Material ptv}]
    [fr ^(A){cloth.Prod} en tag_percentage(X) (B){textile.Material}]
    [en ^(A){cloth.Prod} of tag_percentage(X) (B){textile.Material}]
    [se ^(A){cloth.Prod} av tag_percentage(X) (B){textile.Material}]]
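A generalised entry such as cloth.material.composition works like a multilingual template. Once the variables A, B and X are bound, each language pattern yields a surface phrase. The Python below is a generation-only sketch under stated assumptions. The entry is transcribed by hand into dictionaries rather than parsed, `LEXICON` is a hypothetical mini-lexicon, and "ptv" is read as the Finnish partitive. The regent mark and the cloth.Prod/textile.Material semantic checks are ignored.

```python
# Hypothetical mini-lexicon (illustrative, not Mkbeem's lexica):
# concept -> surface form per language; "fi.ptv" is the Finnish partitive.
LEXICON = {
    "shirt": {"en": "shirt", "fr": "chemise", "se": "skjorta", "fi": "paita"},
    "cotton": {"en": "cotton", "fr": "coton", "se": "bomull",
               "fi": "puuvilla", "fi.ptv": "puuvillaa"},
}

# cloth.material.composition transcribed by hand: "A"/"B" are the bound
# variables (optionally followed by a required feature after "."),
# "%X" stands for tag_percentage(X), anything else is a literal word.
PATTERNS = {
    "fi": ["A", "%X", "B.ptv"],
    "fr": ["A", "en", "%X", "B"],
    "en": ["A", "of", "%X", "B"],
    "se": ["A", "av", "%X", "B"],
}


def generate(lang: str, bindings: dict, percentage: int) -> str:
    """Fill one language's pattern with the bound concepts and percentage."""
    words = []
    for slot in PATTERNS[lang]:
        var, _, feature = slot.partition(".")
        if slot == "%X":
            words.append(f"{percentage}%")
        elif var in bindings:
            forms = LEXICON[bindings[var]]
            # Prefer the feature-specific form (e.g. partitive) if the lexicon has one.
            words.append(forms.get(f"{lang}.{feature}", forms[lang]))
        else:
            words.append(slot)  # literal word such as "of", "en", "av"
    return " ".join(words)


for lang in PATTERNS:
    print(lang, generate(lang, {"A": "shirt", "B": "cotton"}, 100))
```

Because the entry is non-directed, the same four patterns could also drive analysis. This sketch exercises only the generation direction.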

Examples of ALEs - 2/3
• Semantic and grammatical restrictions, e.g. agreement in "miellyttävä pusero" or "miellyttävää puseroa" ("comfortable blouse"):
  [cloth.property.1
    [se (A){adj cloth.Prop gender(B) number(B)} ^(B){noun cloth.Prod}]
    [fi (A){adj cloth.Prop case(B) number(B)} ^(B){noun cloth.Prod}]
    [en (A){adj cloth.Prop} ^(B){noun cloth.Prod}] ]
• An iterative phrase (note the tree flattening):
  [cloth.property.2
    [se property.expr{cloth.Prop} ^(B){cloth}]
    [fi property.expr{cloth.Prop} ^(B){cloth}]
    [en property.expr{cloth.Prop} ^(B){cloth}] ]
  [property.expr.1
    [se (A){adj prop gender(^) number(^)} ]
    [fi (A){adj prop number(^) case(^)} ]
    [en (A){adj prop} ] ]
  [property.expr.2 tag_comma property.expr.3]]
  [property.expr.3 [property.expr.1 {conj.AND} property.expr.1]]
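The constraints in cloth.property.1 combine a semantic check (adjective of class cloth.Prop, noun of class cloth.Prod) with language-specific agreement. In Finnish that means case and number; in Swedish, gender and number. The Python below checks exactly these constraints for an adjective-noun pair. The morphological analyses in `MORPH` are a hypothetical hand-written stand-in for a real analyser, and the function name is illustrative.

```python
# Hypothetical mini morphological lexicon (illustrative): surface form ->
# part of speech, semantic class and inflectional features.
MORPH = {
    "miellyttävä": {"pos": "adj", "sem": "cloth.Prop", "case": "nom", "number": "sg"},
    "miellyttävää": {"pos": "adj", "sem": "cloth.Prop", "case": "ptv", "number": "sg"},
    "pusero": {"pos": "noun", "sem": "cloth.Prod", "case": "nom", "number": "sg"},
    "puseroa": {"pos": "noun", "sem": "cloth.Prod", "case": "ptv", "number": "sg"},
}

# Features the adjective (A) must share with its regent noun (B), per language,
# as listed in cloth.property.1.
AGREEMENT = {"se": ("gender", "number"), "fi": ("case", "number"), "en": ()}


def satisfies_cloth_property_1(lang: str, adjective: str, noun: str) -> bool:
    """Check the semantic classes and the agreement features of the pattern."""
    a, n = MORPH[adjective], MORPH[noun]
    semantic_ok = ((a["pos"], a["sem"]) == ("adj", "cloth.Prop")
                   and (n["pos"], n["sem"]) == ("noun", "cloth.Prod"))
    return semantic_ok and all(a[f] == n[f] for f in AGREEMENT[lang])


print(satisfies_cloth_property_1("fi", "miellyttävä", "pusero"))    # both nominative
print(satisfies_cloth_property_1("fi", "miellyttävää", "puseroa"))  # both partitive
print(satisfies_cloth_property_1("fi", "miellyttävä", "puseroa"))   # case mismatch
```

The agreement features are declared per language in one table, so the English pattern simply has nothing to check, mirroring its empty constraint list in the ALE.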

Examples of ALEs - 3/3
• Negative instances: correction ALEs
  [correct.ellos.3
    [~se kardborrstängning(A)]
    [~se kardborreförslutning(A)]
    [~se kardborrknäppning(A)]
    [~se kardborreknäppning(A)]
    [message Use the correct synonym “kardborrestängning” instead of word(A)]
    [repair kardborrestängning(A)]]
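A correction ALE lists negative (~) instances together with a message and a repair. The Python below is a minimal sketch of applying correct.ellos.3 during text checking. It assumes exact surface-form matching only. The binding (A) in the real entry presumably lets the repair carry over the matched word's features, and this sketch does not model that inflection. The sample sentence is hypothetical.

```python
import re

# correct.ellos.3 transcribed by hand: the negative (~se) variants and the repair.
FORBIDDEN = ["kardborrstängning", "kardborreförslutning",
             "kardborrknäppning", "kardborreknäppning"]
REPAIR = "kardborrestängning"
PATTERN = re.compile(r"\b(?:" + "|".join(map(re.escape, FORBIDDEN)) + r")\b")


def check(text: str):
    """Return the repaired text and one message per forbidden variant found."""
    messages = []

    def repair(match):
        messages.append(f"Use the correct synonym “{REPAIR}” instead of {match.group(0)}")
        return REPAIR

    return PATTERN.sub(repair, text), messages


fixed, notes = check("Jacka med kardborrknäppning.")
print(fixed)
print(notes)
```

Returning the messages separately from the repaired text matches the tool's "OK or correction" style of checking. The user sees why a word was flagged and can still accept or reject the repair.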

Meaning Extraction Process
Input phrase -> set of CARIN formulas, in two phases:
• Syntactico-semantic analysis:
  • Lexical analysis -> set of tokens
  • Dependence analysis -> set of syntactic dependence trees
  • Concept matching & verification -> set of approved lexical-semantic graphs with concepts identified
• Inference of CARIN formulas:
  • Refining semantics in particular themes, e.g. colours, materials, distances -> subset of refined semantic graphs
  • Syntactic translation into CARIN -> set of CARIN formulas

Meaning Extraction Process Example
Input phrase (Finnish): "musta hame, jossa halkio ja taskut" (French: "une jupe noire avec fente et poches"; English: "a black skirt with split and pockets")
After lexical analysis, dependence analysis and concept matching & verification:
(c_MKBEEM:81098:colour)(H1017), (r_name)(H1017, H641),
(c_MKBEEM:84731:clothing.Product)(H984), (r_name)(H984, H684),
(c_MKBEEM:81011:part)(H951), (r_name)(H951, H813),
(c_MKBEEM:81011:part)(H918), (r_name)(H918, H899),
(l_dependency)(H684, adj.Attr, H641),
(l_dependency)(H684, prep.Attr, H813),
(l_dependency)(H684, prep.Attr, H899),
(l_constituent)(H641, 0, musta, [fi, colour, musta, adj, nom, sg]),
(l_constituent)(H684, 1, hame, [fi, product, hame, noun, nom, sg]),
(l_constituent)(H727, 2, tag_comma, [fi]),
(l_constituent)(H770, 3, jossa, [fi, jossa, pron, ine, sg]),
(l_constituent)(H813, 4, halkio, [fi, prodpart, halkio, noun, nom, sg]),
(l_constituent)(H856, 5, ja, [fi, conj, ja, coord_c]),
(l_constituent)(H899, 6, taskut, [fi, prodpart, taskut, noun, nom, pl])
After refining semantics and syntactic translation into CARIN:
(c_product)(H1606), (r_name)(H1606, skirt),
(c_colour)(H1573), (r_name)(H1573, black),
(c_part)(H1540), (r_name)(H1540, split),
(c_part)(H1506), (r_name)(H1506, pocket)
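The last step of the example maps each concept-bearing word to a language-neutral CARIN atom pair, so the Finnish "musta" becomes (c_colour)(H), (r_name)(H, black). The Python below is a deliberately reduced sketch of only that mapping. It uses a regex tokenizer as a stand-in for lexical analysis and a hypothetical hard-coded `BINDINGS` table in place of the domain lexica and ontologies. It skips dependence analysis, verification and theme refinement, so, unlike the real output, nothing links the parts to the skirt. The variable names H1, H2, ... are illustrative.

```python
import itertools
import re

# Hypothetical concept bindings for the Finnish example (surface form ->
# CARIN concept and language-neutral name); Mkbeem derives these from its
# domain lexica and ontologies rather than from a hard-coded table.
BINDINGS = {
    "musta": ("c_colour", "black"),
    "hame": ("c_product", "skirt"),
    "halkio": ("c_part", "split"),
    "taskut": ("c_part", "pocket"),
}


def tokenize(phrase: str) -> list:
    """Stand-in for lexical analysis: words and commas as separate tokens."""
    return re.findall(r"\w+|,", phrase)


def to_carin(tokens: list) -> list:
    """Emit a (concept)(H) / (r_name)(H, name) atom pair per bound token."""
    ids = itertools.count(1)
    atoms = []
    for token in tokens:
        if token not in BINDINGS:
            continue  # "jossa", "ja" and commas carry no domain concept
        concept, name = BINDINGS[token]
        h = f"H{next(ids)}"
        atoms += [f"({concept})({h})", f"(r_name)({h}, {name})"]
    return atoms


print(", ".join(to_carin(tokenize("musta hame, jossa halkio ja taskut"))))
```

The output uses language-neutral names (black, skirt, split, pocket). That is why the French and English versions of the phrase can yield the same formulas once their own bindings are in place.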

Cataloguing Tool Testing by End-Users
• Goals:
  • Proof of concept ("Swiss army knife of a cataloguer")
  • Usability in a real working environment
• Ellos' test group consisted of 8 persons (translators, cataloguers and call-centre workers):
  • familiar with the Internet: 5 yes, 1 almost yes, 2 yes at home
  • languages used: 8 Finnish, 6 English, 4 Swedish, 1 French
  • familiar with catalogue maintenance: 6 yes, 2 no
• Schedule:
  • Short training and preliminary interviews on August 30, 2002
  • Interviews on experiences and summary of the results ready by October 14, 2002

Trial Experiences of the Ellos Test Group
• Cataloguing tool considered useful:
  • the cataloguing process as a whole was seen as an easy and efficient way of producing and classifying product information
  • each of the main features was considered good
  • very important: semi-automatic translation into target languages
  • property extraction and inference with colours and materials seen as important for bringing value-adding services to customers
  • helps in producing consistent and uniform information
  • can make the working process faster and reduce manual, repetitive routine work
  • KB management tools considered suitable for their task
• Reported difficulty:
  • occasionally long response times, which frustrated users and led, e.g., to repeated queries
  • e.g. an "hourglass" indicator or delivery of partial results could bring quick relief
  • expected to be solved eventually through continued product development

MT Part (Webtran) in Production Use at Ellos since 2000
[Figure: Ellos Sweden and Ellos Finland both work in Mac QuarkXPress; a PC server running the Webtran machine translation software performs automatic Sw -> Fi translation from the Swedish source DB into the localised Finnish DB. Roles shown: cataloguer, catalogue author, language modeller.]
• About 2000 translated catalogue pages and 10000-15000 product descriptions per year
• A benchmark by CSC Inc. reports over 30% time savings after one year of use
• Language technology solutions need to be embedded into business processes and IT infrastructure

Work Needed for Adding Domains and Languages
• The marginal cost of adding a new domain or a new language is reasonable with respect to the added value gained
• Based on experiences from modelling the vacation cottage domain in the system (fi, fr, en), we estimate that introducing a comparable new domain would require:
  • semantic lexicon: 2 man-months
  • translation and meaning extraction rules: 1 man-month
  • product models: 2-4 man-weeks
• We also estimate that adding a language to a pre-existing domain would need:
  • semantic lexicon: 1-2 man-months
  • translation and meaning extraction rules: 2-4 man-weeks
  • product models: 1 man-week
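The estimates mix man-months and man-weeks, so comparing the two scenarios needs a common unit. The Python below sums the (low, high) ranges in man-weeks. It assumes 1 man-month = 4 man-weeks; that conversion is not stated on the slide, and a different factor shifts the totals.

```python
# Assumption (not stated on the slide): 1 man-month is counted as 4 man-weeks.
WEEKS_PER_MONTH = 4


def total(items):
    """Sum (low, high) effort ranges given in man-weeks."""
    return sum(low for low, _ in items), sum(high for _, high in items)


NEW_DOMAIN = [
    (2 * WEEKS_PER_MONTH, 2 * WEEKS_PER_MONTH),  # semantic lexicon: 2 man-months
    (1 * WEEKS_PER_MONTH, 1 * WEEKS_PER_MONTH),  # rules: 1 man-month
    (2, 4),                                      # product models: 2-4 man-weeks
]
NEW_LANGUAGE = [
    (1 * WEEKS_PER_MONTH, 2 * WEEKS_PER_MONTH),  # semantic lexicon: 1-2 man-months
    (2, 4),                                      # rules: 2-4 man-weeks
    (1, 1),                                      # product models: 1 man-week
]

print("new domain:", total(NEW_DOMAIN), "man-weeks")
print("new language:", total(NEW_LANGUAGE), "man-weeks")
```

Under this assumption, a comparable new domain comes to about 14-16 man-weeks. Adding a language to an existing domain comes to about 7-13 man-weeks.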

Future Development Recommendations
• Further development could focus on the following issues:
  • information request processing dialogues:
    • question answering capabilities (e.g. qualitative questions about the goods selection)
    • proper handling of null queries (e.g. graceful relaxation of the search constraints based on the ontology models and the actual goods selection)
  • new languages for the system: Russian, Norwegian, Estonian, German...
  • more user-friendly ways to acquire and maintain language models and product models (the knowledge acquisition bottleneck), e.g. through machine learning
  • special requirements of mobile terminals (e.g. automatic text abstraction)
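One possible way to relax a null query gracefully is to generalise constraint values up an ontology until the goods selection yields hits. The Python below sketches that idea under stated assumptions. It is not a method described for Mkbeem, and the `PARENT` hierarchy and catalogue are hypothetical. The relaxation order here is simply dictionary order. The recommendation suggests a smarter strategy that uses the ontology models and the actual goods selection to choose which constraint to relax.

```python
# Hypothetical child -> parent links (illustrative, not Mkbeem's ontologies).
PARENT = {
    "beige": "brown", "ochre": "brown", "brown": "any_colour",
    "wool": "nat_mat", "cotton": "nat_mat", "nat_mat": "any_material",
}


def subsumes(general: str, specific: str) -> bool:
    """True if `general` equals `specific` or is one of its ancestors."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)
    return False


def search(catalogue: list, query: dict) -> list:
    return [item for item in catalogue
            if all(subsumes(value, item[key]) for key, value in query.items())]


def relaxed_search(catalogue: list, query: dict):
    """Generalise one constraint at a time until the query has hits."""
    query = dict(query)
    while True:
        hits = search(catalogue, query)
        if hits:
            return hits, query
        # Relax the first constraint that still has a more general parent.
        for key, value in query.items():
            if value in PARENT:
                query[key] = PARENT[value]
                break
        else:
            return [], query  # every constraint is already at its most general


CATALOGUE = [
    {"title": "Beige winter jacket of wool", "colour": "beige", "material": "wool"},
    {"title": "Ochre quilted jacket of cotton", "colour": "ochre", "material": "cotton"},
]

hits, used = relaxed_search(CATALOGUE, {"colour": "ochre", "material": "wool"})
print([h["title"] for h in hits], used)
```

Returning the relaxed query along with the hits lets a dialogue explain what was changed, for example that the colour was widened from ochre to brown.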

DEMO