Universal Networking Language
Shalini Gupta - 07305 R 02

The Problem
  • Large explosion of data
  • Linguistic barriers (multilingualism)
  • Web content is mostly in English and cannot be accessed without some proficiency in that language
  • Though India forms a large part of the world's population, its proportion of Internet access is very low
  • Need for high-speed translation into different languages

Solution: Machine Translation
Two approaches:
  • Transfer based
      - Works on specific pairs of languages
      - Some text analysis on the source language
      - Some on the target language
  • Interlingua based
      - Build a universal language
      - Convert data to the universal language
      - Deconvert it back
      - Needs only 2N converters for N languages, as opposed to N*(N-1) translators for the transfer-based approach
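The two counts can be compared with a few lines of Python. This is a minimal sketch; the function names are illustrative and not part of any UNL tool.

```python
def transfer_translators(n: int) -> int:
    """Transfer based: one translator per ordered (source, target) language pair."""
    return n * (n - 1)


def interlingua_converters(n: int) -> int:
    """Interlingua based: one converter into and one out of the interlingua per language."""
    return 2 * n


for n in (2, 3, 4, 10):
    print(n, transfer_translators(n), interlingua_converters(n))
```

The two counts are equal at N = 3; for any larger N the interlingua approach needs fewer components (for example, 20 instead of 90 for 10 languages).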

UNL: An Interlingua
  • Language-independent knowledge representation
  • Vehicle for machine translation
  • UNL solves the “Information Monopolies” problem
[Figure: the interlingua (UNL) at the center, linked to English, Hindi, French, and Chinese]

Outline
  • Introduction
  • UNL Components
  • Some Controversial Issues in UNL
  • Language Divergences between Hindi and English
  • Conclusion

Introduction to UNL
  • Proposed by the United Nations University
  • Enables computers to process information and knowledge across language barriers
  • Replicates the functions of natural languages in human communication
  • Enables distributing, receiving, and understanding multilingual information
  • Represents information sentence by sentence

UNL Graph
  • Each sentence is converted into a hypergraph
      - Concepts as nodes
      - Relations as directed arcs
  • Concepts are called Universal Words (UWs)
  • Word knowledge is represented by UWs, which are language independent
  • Conceptual knowledge is captured by relating UWs through relations

Example: John eats rice with a spoon
[Figure: UNL graph of the sentence, with callouts marking a Universal Word, an Attribute, and the Semantic Relations]

UNL Expression
John eats rice with a spoon

{unl}
agt(eat(icl>do).@entry.@present, John(iof>person))
obj(eat(icl>do).@entry.@present, rice(icl>food))
ins(eat(icl>do).@entry.@present, spoon(icl>artifact).@indef)
{/unl}
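A UNL expression of this kind can be read back into the graph of nodes and directed arcs. The sketch below assumes one binary relation per line in the form relation(uw1, uw2) between {unl} and {/unl}; the function names are hypothetical and the parser covers only this simple case, not the full UNL syntax.

```python
def parse_relation(line):
    """Split 'rel(uw1, uw2)' into (rel, uw1, uw2)."""
    line = line.strip()
    if "(" not in line or not line.endswith(")"):
        raise ValueError(f"not a binary relation: {line!r}")
    label, body = line.split("(", 1)
    body = body[:-1]  # drop the closing parenthesis of the relation itself
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # The comma outside any constraint list separates the two UWs.
            return label.strip(), body[:i].strip(), body[i + 1:].strip()
    raise ValueError(f"expected two universal words: {line!r}")


def parse_unl(text):
    """Return (nodes, arcs); arcs are (relation, source, target) triples."""
    arcs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line in ("{unl}", "{/unl}"):
            continue
        arcs.append(parse_relation(line))
    nodes = []
    for _, source, target in arcs:
        for uw in (source, target):
            if uw not in nodes:
                nodes.append(uw)
    return nodes, arcs


EXPRESSION = """{unl}
agt(eat(icl>do).@entry.@present, John(iof>person))
obj(eat(icl>do).@entry.@present, rice(icl>food))
ins(eat(icl>do).@entry.@present, spoon(icl>artifact).@indef)
{/unl}"""

nodes, arcs = parse_unl(EXPRESSION)
for label, source, target in arcs:
    print(f"{source} --{label}--> {target}")
```

All three arcs start at the same node, the entry verb, which is why the sentence graph has four nodes and three arcs.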

Universal Word

Types of Universal Word
  • Syntactic and semantic unit of UNL
  • Represents a concept
  • Represents a node in the graph of a UNL expression
  • Two classes:
      - Unit concepts: Basic UWs, Restricted UWs, Extra UWs
      - Compound concepts: Scopes

Types of Universal Words (UWs)
  • Basic UWs
      - Bare headwords with no constraint list
      - E.g.: house, drink
  • Restricted UWs
      - Headwords with a constraint list
      - Represent a more specific concept, or a subset of concepts

Types of UWs (contd.)
  • The constraint list restricts the range of the concept that a Basic UW represents
      - E.g.: state(icl>country), state(icl>abstract thing)
  • Extra UWs
      - Special type of Restricted UW
      - Denote concepts that are not present in English
      - Foreign-language words are used as headwords
      - E.g.: Bharatnatyam(icl>dance)
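The headword and constraint-list structure can be made concrete by splitting a UW string into its parts. This is a minimal sketch that assumes a constraint list is a comma-separated list of relation>value items in one pair of parentheses; parse_uw is a hypothetical helper, not a function of any UNL toolkit.

```python
def parse_uw(uw):
    """Return (headword, constraints); constraints is a list of (relation, value) pairs."""
    uw = uw.strip()
    if "(" not in uw:
        return uw, []  # Basic UW: bare headword, no constraint list
    if not uw.endswith(")"):
        raise ValueError(f"unbalanced constraint list: {uw!r}")
    headword, rest = uw.split("(", 1)
    constraints = []
    for item in rest[:-1].split(","):
        relation, _, value = item.strip().partition(">")
        constraints.append((relation, value))
    return headword.strip(), constraints


for uw in ("house", "state(icl>country)", "state(icl>abstract thing)", "Bharatnatyam(icl>dance)"):
    print(parse_uw(uw))
```

The two state entries share a headword and differ only in the constraint, which is what separates the two concepts.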

Compound Concepts
Raju said that [he had opened the window]
[Figure: UNL graph. say(icl>do).@entry.@past has agt Raju(iof>person) and obj the scope :01. Inside :01, open(icl>do).@entry.@past.@complete has agt he and obj window(icl>obj).]

Compound Concepts (contd.)
  • A set of binary relations grouped together to express a compound concept
  • Interpreted as a whole
  • Expressed by a scope in UNL expressions
  • Raju said that [he had opened the window].
      - The part of the sentence within square brackets should be grouped
      - Only when these relations are grouped together and considered as a whole unit can the correct interpretation be obtained
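One way to picture a scope is to tag every binary relation with the scope it belongs to and to let a node such as :01 stand for the whole group. The sketch below is an illustrative data layout only; the tuple format and helper names are assumptions, not the official UNL syntax.

```python
from collections import defaultdict

# Each arc is (scope_id, relation, source, target); scope_id None means top level.
ARCS = [
    (None, "agt", "say(icl>do).@entry.@past", "Raju(iof>person)"),
    (None, "obj", "say(icl>do).@entry.@past", ":01"),
    ("01", "agt", "open(icl>do).@entry.@past.@complete", "he"),
    ("01", "obj", "open(icl>do).@entry.@past.@complete", "window(icl>obj)"),
]


def group_by_scope(arcs):
    """Collect the relations of each scope under its scope ID."""
    scopes = defaultdict(list)
    for scope_id, relation, source, target in arcs:
        scopes[scope_id].append((relation, source, target))
    return dict(scopes)


def expand(node, scopes):
    """Return the grouped relations a scope node such as ':01' stands for, else []."""
    if node.startswith(":"):
        return scopes.get(node[1:], [])
    return []


scopes = group_by_scope(ARCS)
print(expand(":01", scopes))
```

The object of say is the node :01, so what Raju said is the whole group of relations about opening the window rather than any single word in it.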

Relations
  • A relation in UNL is expressed as: <relation>(<uw1>, <uw2>)
      - <relation> is one of the relations defined in UNL
      - <uw1>, <uw2> are universal words
  • E.g.: John broke the window
      - agt(break(icl>do).@entry.@past, John(iof>person))
      - obj(break(icl>do).@entry.@past, window(icl>thing))
  • 41 such relations have been defined

Attributes
  • Describe the subjectivity of the sentence
  • Enrich the description given by UWs and relations
  • E.g.: time with respect to the speaker
      - happened in the past: @past
      - happening at present: @present
      - will happen in future: @future
  • John broke the window
      - agt(break(icl>do).@entry.@past, John(iof>person))
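Attributes can be read off a UW occurrence by splitting at the ".@" markers. The sketch below assumes attributes are always appended in that form and covers only the three time attributes listed on this slide; the helper names are hypothetical.

```python
# The three time attributes from the slide, keyed without the leading '@'.
TIME_ATTRIBUTES = {
    "past": "happened in the past",
    "present": "happening at present",
    "future": "will happen in future",
}


def split_attributes(uw):
    """Return (uw_without_attributes, [attribute names without '@'])."""
    base, *attributes = uw.split(".@")
    return base, attributes


def describe_time(uw):
    """Return the descriptions of any time attributes attached to the UW."""
    _, attributes = split_attributes(uw)
    return [TIME_ATTRIBUTES[a] for a in attributes if a in TIME_ATTRIBUTES]


print(split_attributes("break(icl>do).@entry.@past"))
print(describe_time("break(icl>do).@entry.@past"))
```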

UNL Knowledge Base
  • Defines every possible relation between concepts
  • Two important roles:
      - Defines the semantics of Universal Words
      - Gives linguistic knowledge of concepts
  • E.g.: The anchor wrote the script
      - Linguistic knowledge tells us that the anchor is a person
      - Semantics tells us that only a person can write a script (the anchor of a ship cannot)
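The anchor example can be mimicked with a toy lookup. Everything in this sketch is hypothetical illustration: the sense inventory, the class labels, and the agent constraint are made up for the example and are not taken from the actual UNL Knowledge Base.

```python
# Hypothetical sense inventory: each sense of a word with its semantic class.
SENSES = {
    "anchor": {
        "anchor(icl>person)": "person",
        "anchor(icl>device)": "device",
    },
}

# Hypothetical constraint: the class an agent of the verb must belong to.
AGENT_CLASS = {"write(icl>do)": "person"}


def agent_senses(word, verb):
    """Return the senses of `word` that are allowed as the agent of `verb`."""
    required = AGENT_CLASS[verb]
    return [uw for uw, semantic_class in SENSES[word].items() if semantic_class == required]


print(agent_senses("anchor", "write(icl>do)"))
```

Only the person sense survives the check, which mirrors the reasoning that a ship's anchor cannot write a script.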

Controversial Issues
  • A meaning representation language:
      - Should provide sufficient means to express knowledge
      - Should be simple
  • The main expressive device of UNL is restrictions
  • New expressive means for describing UWs have been proposed

Semantic Restriction
  • UW: operator(icl>thing)
      - Does not effectively separate the two meanings:
      - long-distance operator: operator(icl>human)
      - addition operator: operator(icl>abstract thing)
  • Hypernymy and meronymy are mostly used for expressing restrictions
  • Synonymy and antonymy can also be used
      - E.g.: wealth(equ>richness), poor(ant>rich)

Argument Frame Restriction
  • X borrows Y from Z for W
      - All four arguments are needed to define the action of borrowing completely
  • Example:
      - John borrowed $10000 for 3 years (here "for 3 years" fills the argument W, the term of the loan)
      - John has been borrowing money for 3 years (here "for 3 years" is a non-argument duration)
  • UNL as a meaning representation language should be able to distinguish between the argument and non-argument links of predicates

Weakly Differentiated Relations
  • Some relations seem to be weakly differentiated and are therefore difficult to use consistently
      - E.g.: gol (final state) vs. plt (final place)
      - E.g.: src (initial state) vs. plf (initial place)
  • "John went to Brussels" can be described with either gol or plt
      - gol characterizes Brussels as the final state of John
      - plt characterizes Brussels as the final place of the whole event

Redundant Relations
  • Some relations seem to be based more on the semantic class of the UWs they connect
      - E.g.: mod (modification) vs. man (manner)
  • The difference between them boils down to the semantic class of the starting point of the relation
      - answered politely (man) [to answer]
      - a polite answer (mod) [an answer]
  • The relations 'man' and 'mod' can be merged
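The redundancy argument can be restated as a small function: if the label is fully determined by the class of the head UW, it carries no information of its own. The sketch assumes, for illustration only, that verbal heads are those restricted by icl>do, icl>occur, or icl>be, and the two answer UWs are hypothetical.

```python
VERBAL_RESTRICTIONS = ("(icl>do", "(icl>occur", "(icl>be")


def modifier_label(head_uw):
    """Pick 'man' for a verbal head and 'mod' otherwise."""
    if any(marker in head_uw for marker in VERBAL_RESTRICTIONS):
        return "man"
    return "mod"


print(modifier_label("answer(icl>do)"))     # answered politely
print(modifier_label("answer(icl>thing)"))  # a polite answer
```

Because the choice depends only on the head, a single merged relation would lose nothing that the head's restriction does not already state.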

Divergences between English and Hindi
  • Constituent Order Divergence
      - English: Jim (S) is playing (V) tennis (O).
      - Hindi orders the same constituents as (S) (O) (V)
  • Adjunction Divergence
      - The [living in Delhi] boy
  • Preposition-Stranding Divergence
      - Which shop did John go to?
[The Hindi text of these examples is garbled in the source and is omitted.]

Divergences (contd.)
  • Null Subject Divergence
      - going-am
  • Pleonastic Divergence
      - It is raining.
  • Conflational Divergence
      - Jim stabbed him.
  • Promotional Divergence
      - The play is on.
[The Hindi text of these examples is garbled in the source and is omitted.]

Conclusion
  • UNL is an interlingua for machine translation
  • Studied:
      - Components of UNL
      - Controversial issues in UNL
      - Divergences between English and Hindi

Thank You

UNL System

Knowledge Extraction from Hindi Text
  • EnConverter is a language-independent parser that provides a framework for analysis
  • It needs to be supplied with a lexicon and analysis rules
  • Analysis rule: (<PRE>)... <LNODE> <RNODE> (<SUF1>) (<SUF2>) (<SUF3>)... <PRI>
  • Lexicon entry: [HW] {ID} "UW" (ATTRIB1, ATTRIB2, ...) <FLG, FRE, PRI>;
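The lexicon entry layout can be illustrated with a small regular-expression parser. This is a sketch under stated assumptions: the sample entry and its attribute names are made up for illustration, FRE and PRI are assumed to be integers (frequency and priority), and the pattern handles only the single-line layout shown on the slide.

```python
import re

# [HW] {ID} "UW" (ATTRIB1, ATTRIB2, ...) <FLG, FRE, PRI>;
ENTRY = re.compile(
    r'^\[(?P<hw>[^\]]*)\]\s*'
    r'\{(?P<id>[^}]*)\}\s*'
    r'"(?P<uw>[^"]*)"\s*'
    r'\((?P<attrs>[^)]*)\)\s*'
    r'<(?P<flg>[^,>]*),(?P<fre>[^,>]*),(?P<pri>[^,>]*)>\s*;$'
)


def parse_entry(line):
    """Parse one lexicon entry line into a dictionary of its fields."""
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError(f"not a lexicon entry: {line!r}")
    attributes = [a.strip() for a in match.group("attrs").split(",") if a.strip()]
    return {
        "headword": match.group("hw").strip(),
        "id": match.group("id").strip(),
        "uw": match.group("uw"),
        "attributes": attributes,
        "flag": match.group("flg").strip(),
        "frequency": int(match.group("fre")),
        "priority": int(match.group("pri")),
    }


# Hypothetical entry, written only to exercise the parser.
SAMPLE = '[house] {} "house(icl>building)" (N, INANI) <E, 0, 0>;'
print(parse_entry(SAMPLE))
```

The UW is matched inside the double quotes, so the ">" characters in its constraint list do not collide with the angle brackets of the final field group.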

Knowledge Extraction from Hindi Text
Each step:
  • Morphological Analysis
  • Decision
  • Relation
  • Lexical Attribute
  • UNL Attribute

Verbal Concepts
Classes of predicates:
  • actions (have an active initiator, e.g. kill)
  • activities (a set of heterogeneous actions with a common goal, e.g. trade)
  • events (have no agent, e.g. the bridge broke)
  • processes (denote a situation that occupies a certain time span, e.g. the tree grows)
  • states (homogeneous, do not denote a change, e.g. hear, ache)

Classes of predicates (contd.)
  • properties (differ from states in that they are atemporal, e.g. blind, red)
  • relations (specify a relation between two or more things, e.g. love, hate)
In UNL, all verbal concepts are grouped into three classes:
  • (icl>do) contains actions and activities
  • (icl>occur) consists of events and processes
  • (icl>be) is composed of states, properties, and relations
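The grouping of the seven predicate classes into three restrictions is a plain lookup. A minimal sketch follows; the dictionary keys and the restrict helper are illustrative names, and the example headwords are the ones used on these slides.

```python
# Seven predicate classes mapped to the three UNL restrictions for verbal concepts.
VERBAL_RESTRICTION = {
    "action": "icl>do",
    "activity": "icl>do",
    "event": "icl>occur",
    "process": "icl>occur",
    "state": "icl>be",
    "property": "icl>be",
    "relation": "icl>be",
}


def restrict(headword, predicate_class):
    """Build a restricted UW from a headword and its predicate class."""
    return f"{headword}({VERBAL_RESTRICTION[predicate_class]})"


for headword, predicate_class in (("kill", "action"), ("grow", "process"), ("ache", "state")):
    print(restrict(headword, predicate_class))
```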

Adjectival Concepts
  • All adjectival concepts are divided into two classes:
      - predicative (aoj>thing)
      - restrictive (mod>thing)
  • This does not work well in some situations
  • E.g.: Wise Greeks diluted wine with water
      - Restrictive interpretation: ‘Those Greeks who were wise diluted wine with water. Silly ones didn’t.’
      - Non-restrictive (qualificative) interpretation: ‘Greeks were wise. They diluted wine with water.’
  • The relevant distinction is restrictive vs. qualificative

The restrictive vs. qualificative distinction should also be applied to other modifiers
  • The students sitting in the corner are waiting for the professor.
  • The students(,) who are sitting in the corner(,) are waiting for the professor.
  • The students in the corner are waiting for the professor.
The modifier ‘who are sitting in the corner’ can be:
  • restrictive (‘those of the students who are sitting in the corner are waiting for the professor; others are not’)
  • non-restrictive (‘the students are waiting for the professor; they are sitting in the corner’)