
Human Translation - Machine Translation
Natural Language Processing (NLP) and Translation
Anca Christine Pascu
Université de Bretagne Occidentale, Lab. STICC, Brest, France

Outline
  Cognition – Language – Translation
  Natural Language Processing (NLP) and Translation
  Modelling in Translation
  Computational Logic and Translation
  Computation and Translation
  Concepts and Objects in Translation
  The Text Structure
  The Lattice Structure of a Text
  Formal Concept Analysis and the Text Structure
  Human Translation – Machine Translation
A. P. Genova, May 2015

Cognition – Language – Translation: Some Basic Ideas

G. Frege, Nachgelassene Schriften, Hamburg, Meiner, 1969, in Desclés, J.-P. (1998), « Les Langues sont-elles des représentations du monde », Essais sur le langage, logique, et sens commun, Editions universitaires, Fribourg.
Dedicated to Evandro Agazzi

It is true that we can express the same meaning (thought) in different languages; but the psychological trappings (harness), the dress of the thought, will often be different. That is why learning foreign languages is useful for education in logic. We learn to better distinguish the verbal peel from the kernel to which it is organically linked in any language. This is how the differences between natural languages can facilitate our apprehension of that which is logical.
G. Frege, Nachgelassene Schriften (Posthumous Writings), Hamburg, Meiner, 1969, in Desclés, J.-P. (1998), « Les Langues sont-elles des représentations du monde », Essais sur le langage, logique, et sens commun, Editions universitaires, Fribourg.
Dedicated to Evandro Agazzi

Cognition – Language – Translation
Cognition: a set of processes related to knowledge:
  attention, memory: psychology
  judgement, reasoning, « computation », problem solving, decision making: logic, computer science
  comprehension and production of language: linguistics, psychology

[Diagram: Cognition at the centre, linked to attention and memory (Psychology); reasoning, judgement, computation, problem solving and decision making (Logic, CS); language comprehension and language production (Linguistics, Psychology).]

Some Questions about Language and Cognition
  Are natural languages representations of the world?
  Can each natural language project itself onto the external world?
  Can each natural language construct its own cognitive representations?
  Do natural languages refer to a universal system of mental representations?
Jean-Pierre Desclés, « Les Langues sont-elles des représentations du monde », Essais sur le langage, logique, et sens commun, Editions universitaires, Fribourg, 1998.

Three Epistemological Hypotheses
  Relativistic hypothesis: Sapir-Whorf (Whorf, 1966)
  Anti-relativistic hypothesis: Fodor (Fodor, 1975), Shaumyan (Shaumyan, 1977)
  Anti-anti-relativistic hypothesis: Desclés (Desclés, 1998)

General translation schema
  SOURCE language → Transfer → TARGET language

Vauquois Triangle

Natural Language Processing (NLP) and Translation

Linguistics – Logic
  Natural Language – Language
  Linguistics: Lexis, Morphology, Syntax, Semantics – Discourse – Text
  Logic: Hypotheses, Inferences, Conclusions – Reasoning
  Inferences: Deduction, Induction, Abduction
  Meaning Item (Unit) – Translation Item (Unit) (Ballard, 2004)
  Ordered Structure of a Text: Argumentative Structure, Descriptive Structure

NLP Fields via Linguistics
  Lexical level
    Error detection and correction
    Automatic documentation, indexing, search engines
  Morphological level
    Morphological annotation
  Syntactic level
    Grammars and parsers
  Semantic level
    Automatic processing of meaning
    Automatic text comprehension
    Machine translation

NLP Fields via Applications
  Automatic Annotation of Corpora
    Morphological annotation
    Semantic annotation
  Text Mining; Indexing
  Automatic Summarizing
  Text Generation
  Machine Translation:
    Automatic translation
    Computer-Assisted Translation

Definition
Natural Language Processing (NLP): a multidisciplinary field studying a set of:
  Theories (linguistics, mathematics, logic...);
  Methods (procedures, algorithms...);
  Computer Science Systems (languages, procedures...)
for analysis and synthesis in natural languages, solving problems related to language and natural languages.

Lexical Level
  Word Processing
  Spell Checker
  Lexical Labeling: labeling words with linguistic labels
  Concordancers: a concordancer is a computer program that searches a text for all occurrences of a word, together with their contexts (http://ecolore.leeds.ac.uk/xml/materials/overview/tools/concordancer.xml?lang=fr)
  Concordancers are used to build linguistic corpora
  The form of the word: lemma, inflected form...
  Lemmatizers: lemma – inflected form
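A concordancer as described above can be sketched in a few lines of Python. This is only an illustrative keyword-in-context search, not the tool behind the link; the function name `concordance`, the `width` parameter and the sample sentence are assumptions made for the example.

```python
import re

def concordance(text, word, width=3):
    """Return (left context, keyword, right context) for every occurrence of `word`.

    Hypothetical helper: tokenisation is a plain \\w+ split on lower-cased text,
    and `width` is the number of context tokens kept on each side.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = word.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

if __name__ == "__main__":
    sample = "The officer of the watch keeps the watch at night"
    for left, key, right in concordance(sample, "watch", width=2):
        print(f"{left:>12} [{key}] {right}")
```

A real concordancer would add lemmatisation, so that inflected forms of the same lemma are found together; that step is left out here.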

Syntactic Level
Grammars and Parsers
The techniques of analysis are almost the same as those used for Formal Languages.
  Formal Grammar = a system of rules which, starting from a vocabulary, makes it possible:
    to analyse a string
    to generate a string
  Formal Language = a set of words
  Word = a concatenated string of elements of a vocabulary.
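The two uses of a formal grammar, generating strings and analysing a string, can be illustrated with a toy grammar. The rules below are hypothetical (they are not taken from the slides) and deliberately non-recursive, so the generated language is finite and analysis can be done by simple membership.

```python
from itertools import product

# Hypothetical toy grammar: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Jean"], ["Marie"]],
    "VP": [["V", "NP"]],
    "V": [["aime"]],
}

def generate(symbol):
    """Yield every word sequence derivable from `symbol` (grammar must be non-recursive)."""
    if symbol not in GRAMMAR:          # terminal: an element of the vocabulary
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(list(generate(s)) for s in rhs)):
            yield [w for part in parts for w in part]

def accepts(sentence):
    """Analyse a string: is it one of the strings generated from S?"""
    return sentence.split() in list(generate("S"))

if __name__ == "__main__":
    for words in generate("S"):
        print(" ".join(words))
    print(accepts("Jean aime Marie"))
```

Grammars for natural language are recursive, so practical parsers use chart or shift-reduce algorithms instead of enumerating the language.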

Grammars and Parsers
Types of Formal Grammars
  Chomsky’s classification: L3 ⊂ L2 ⊂ L1 ⊂ L0
  Categorial Grammar (CG)
  Lexical Functional Grammar (LFG)
  Generalized Phrase Structure Grammar (GPSG)
  Tree-Adjoining Grammar (TAG)
  Head-driven Phrase Structure Grammar (HPSG)
  Dependency Grammar (DG)

Grammars and Parsers
The steps of a syntactic analysis:
  Segmentation (tagger);
  Lemmatisation (identifying words in their canonical form);
  Labeling (identifying the morpho-syntactic category)
The Syntax – Semantics relation:
  Surface Structure – Deep Structure
  Typing Lexical Units (Categorial grammars).

Example of CG
  Jean : N
  aime : (S\N)/N
  Marie : N
Types: N, S are basic types; (S\N)/N is a derived type

CG Rules
  Right Application (>): from OPER : T1/T2 followed by OP : T2, derive (OPER OP) : T1
  Left Application (<): from OP : T2 followed by OPER : T1\T2, derive (OPER OP) : T1

Analysis: Jean aime Marie
  Jean : N    aime : (S\N)/N    Marie : N
  aime Marie : S\N    (right application, >)
  Jean aime Marie : S    (left application, <)
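The analysis of "Jean aime Marie" can be reproduced with a small sketch of the two application rules. It assumes the usual slash notation, in which the transitive verb has type (S\N)/N; the `Slash` class, the lexicon and the greedy left-to-right reduction are illustrative choices, not a full categorial parser.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Slash:
    """Derived type: result/arg (argument on the right) or result\\arg (argument on the left)."""
    result: "Type"
    arg: "Type"
    direction: str  # "/" or "\\"

Type = Union[str, Slash]

def right_apply(oper, op):
    """OPER : T1/T2 followed by OP : T2 gives T1 (rule >); None if not applicable."""
    if isinstance(oper, Slash) and oper.direction == "/" and oper.arg == op:
        return oper.result
    return None

def left_apply(op, oper):
    """OP : T2 followed by OPER : T1\\T2 gives T1 (rule <); None if not applicable."""
    if isinstance(oper, Slash) and oper.direction == "\\" and oper.arg == op:
        return oper.result
    return None

# Hypothetical lexicon for the example sentence.
LEXICON = {
    "Jean": "N",
    "Marie": "N",
    "aime": Slash(Slash("S", "N", "\\"), "N", "/"),  # (S\N)/N
}

def analyse(sentence):
    """Greedily reduce adjacent types; return the final type or None if stuck."""
    types = [LEXICON[w] for w in sentence.split()]
    while len(types) > 1:
        for i in range(len(types) - 1):
            a, b = types[i], types[i + 1]
            reduced = right_apply(a, b)
            if reduced is None:
                reduced = left_apply(a, b)
            if reduced is not None:
                types = types[:i] + [reduced] + types[i + 2:]
                break
        else:
            return None  # no rule applies anywhere
    return types[0]

if __name__ == "__main__":
    print(analyse("Jean aime Marie"))
```

The greedy strategy is enough for this sentence; ambiguous sentences would need a search over all possible reductions.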

Computer Text Comprehension
The meaning problem: there are two main positions on the formalisation of meaning:
  Meaning is an independent linguistic level
  The linguistic level and the level of mind are interdependent (which raises the question of the degree of dependence)

Computer Text Comprehension and Automatic Processing
Semantics:
  Truth-functional (truth conditions);
  Intensional (based on corresponding concepts);
  Extensional (based on corresponding objects);
  Componential (word decomposition into primitive units of meaning);
  Procedural (an expression is a procedure containing a set of actions);
  Argumentative (the chain of speech acts).

Computer Text Comprehension and Automatic Processing
Structural Approaches to the Text
  Text Grammars (D. Rumelhart, 1975): Story = Exposition + Theme + Plot + Resolution
  Rhetorical Structure Theory (W. Mann, S. Thompson, 1987): a text is a set of units connected by relations

Computer Text Comprehension and Automatic Processing
Text Thematic Analysis:
  Analysis based on knowledge representation (semantic networks, concept maps);
  Analysis using statistical tools.

Computer Text Comprehension and Automatic Processing
  Concept maps: http://en.wikipedia.org/wiki/Concept_map
  WORDNET: http://wordnet.princeton.edu
  Ontology = a network of objects and concepts connected by relations; it is specific to a domain

Computer Text Comprehension and Automatic Processing
Argumentative Structure of a Text: the text is organised in « argumentation units »
  Hypothesis
  Conclusion
  Rules of inference
  Elements outside the text

Semantic Annotation
  Text Annotation: labeling the text according to a set of categories defined a priori.
  Semantic Annotation: the categories are semantic classes (classes of meaning based on relations):
    Causality
    Definition
    Utterance
    Quotation

Problems in Translation related to Modelling for Machine Translation

Translation Unit
  Translation Unit (TU) (Ballard, 2004): an elementary unit of meaning in the source language (Ls) which can be transferred into the target language (Lt).
  Computer Science: the form of a source file after it has passed through the C preprocessor; in this case the output is deterministic and depends only on the input and the rules.
  Translation: a pair (TUs, TUt) such that there is an « equivalence » between TUs and TUt. It depends on:
    Concepts,
    Sentence, phrase, paragraph

Concepts, concept networks, ontologies
  Concept (C): a set of specific features (more primitive than the notion), written Int C;
  The concept is expressed in a natural language by a word (W);
  Some authors call this pair a term (T). We consider it as a concept with its « language code » (the word):
  C = (Int C, W).

Concepts, concept networks, ontologies
  A concept in a language depends on that language, i.e. on the cognitive representations in this language
  Concepts are organised in networks
  They do not all have the same status (position)
  The network in one language differs from the network in another (Desclés, 2006)
  Int C as a network (Desclés, Pascu, 2011):

Two intensions of the same concept (Int s, Int c)
  French: quart, surveiller, officier → officier de quart
  English: quarter, to watch, officer → officer of the watch
  French: « Il est logique d'interpréter cette assertion par... »
  English: "It makes sense to interpret this statement by..."

Examples
  Computer Science: cloud computing – traitement des données hautement distribuées
  Mathematics: rough set – ensemble approximatif (ensemble grossier)
    E, Int E, Ext E, Fr E

The Logic of Determination of Objects (LDO)
  Concepts
  Links between concepts: global network
  Inheritance: comprehension relation

The Logic of Determination of Objects (LDO)
  Objects
  Links between objects: local network
  Determination: relation between objects (σ)

The Logic of Determination of Objects (LDO)
The link between objects and concepts
  [Diagram: a mapping f between two ordered sets, one labelled "filter" and the other labelled "ideal".]

FORMAL CONCEPT ANALYSIS (FCA)

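FCA example
  [Table: formal context with objects o1 to o4 as rows and attributes A1 to A3 as columns, a 1 marking that an object has an attribute. Every object o1 to o4 has A1; the entries for A2 and A3 are not recoverable from the extracted text.]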

FCA
  OBJ: the set of objects
  ATT: the set of attributes
  R: a binary relation between OBJ and ATT
  K = (OBJ, ATT, R): formal context
  For O ⊆ OBJ: O↑ is the set of all attributes common to all objects in O
  For A ⊆ ATT: A↓ is the set of all objects that have all attributes in A

Formal Concept: a pair (Ext, Int) such that Ext↑ = Int and Int↓ = Ext
Subconcept – superconcept: (A1, B1) ≤ (A2, B2) iff A1 ⊆ A2 (equivalently, B2 ⊆ B1)
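The derivation operators and the formal-concept condition translate directly into set operations. The sketch below uses a hypothetical context with four objects and three attributes; the incidence relation in `CONTEXT` and the function names are assumptions made for illustration.

```python
# Hypothetical formal context K = (OBJ, ATT, R): each object maps to its attributes.
CONTEXT = {
    "o1": {"A1", "A2", "A3"},
    "o2": {"A1"},
    "o3": {"A1", "A2", "A3"},
    "o4": {"A1", "A3"},
}
ATTRIBUTES = set().union(*CONTEXT.values())

def up(objects):
    """O↑: attributes common to all objects in O (all attributes if O is empty)."""
    common = set(ATTRIBUTES)
    for o in objects:
        common &= CONTEXT[o]
    return common

def down(attributes):
    """A↓: objects that have every attribute in A."""
    return {o for o, attrs in CONTEXT.items() if set(attributes) <= attrs}

def is_formal_concept(extent, intent):
    """(Ext, Int) is a formal concept iff Ext↑ = Int and Int↓ = Ext."""
    return up(extent) == set(intent) and down(intent) == set(extent)

def is_subconcept(c1, c2):
    """(A1, B1) <= (A2, B2) iff A1 is a subset of A2."""
    return set(c1[0]) <= set(c2[0])

if __name__ == "__main__":
    print(up({"o1", "o4"}))
    print(down({"A2"}))
    print(is_formal_concept({"o1", "o3"}, {"A1", "A2", "A3"}))
```

In this assumed context a pair such as ({o1, o3}, {A2}) is not a formal concept, because the objects o1 and o3 also share A1 and A3, so the intent must be closed to {A1, A2, A3}.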

Example Concepts
Formal context: (OBJ, ATT, R)
  C1 = ({o1, o3}, {A1, A2})
  C2 = ({o1, o3}, {A1, A2, A3})
  C3 = ({o1, o4}, {A1, A3})
  C4 = ({o1, o2, o3, o4}, {A1})
  C5 = ({o1, o3}, {A2})
  C6 = ({o1, o3, o4}, {A3})

Galois Lattice
  Two ordered sets: (OBJ, ≤OBJ), (ATT, ≤ATT)
  Two mappings: φ: OBJ → ATT, ψ: ATT → OBJ such that
    if o1 ≤OBJ o2 then φ(o1) ≥ATT φ(o2)
    if A1 ≤ATT A2 then ψ(A1) ≥OBJ ψ(A2)
    o ≤OBJ ψ(φ(o)) and A ≤ATT φ(ψ(A))
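The conditions of a Galois connection can be checked exhaustively on a small context. The sketch below assumes the standard FCA setting, in which the two ordered sets are the sets of subsets of OBJ and of ATT ordered by inclusion, φ maps a set of objects to their common attributes and ψ maps a set of attributes to the objects sharing them; the context itself is hypothetical.

```python
from itertools import combinations

# Hypothetical context: object -> set of attributes.
CONTEXT = {
    "o1": {"A1", "A2", "A3"},
    "o2": {"A1"},
    "o3": {"A1", "A2", "A3"},
    "o4": {"A1", "A3"},
}
ATTRIBUTES = set().union(*CONTEXT.values())

def phi(objects):
    """Attributes common to all the given objects."""
    common = set(ATTRIBUTES)
    for o in objects:
        common &= CONTEXT[o]
    return common

def psi(attributes):
    """Objects having all the given attributes."""
    return {o for o, attrs in CONTEXT.items() if set(attributes) <= attrs}

def subsets(items):
    """All subsets of `items`, as frozensets."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def is_galois_connection():
    obj_sets, att_sets = subsets(CONTEXT), subsets(ATTRIBUTES)
    # Both mappings reverse the order (antitone).
    phi_antitone = all(phi(b) <= phi(a) for a in obj_sets for b in obj_sets if a <= b)
    psi_antitone = all(psi(b) <= psi(a) for a in att_sets for b in att_sets if a <= b)
    # Both compositions are extensive: X is included in its closure.
    extensive = (all(o <= psi(phi(o)) for o in obj_sets)
                 and all(a <= phi(psi(a)) for a in att_sets))
    return phi_antitone and psi_antitone and extensive

if __name__ == "__main__":
    print(is_galois_connection())
```

The sets X with ψ(φ(X)) = X are exactly the extents of formal concepts, which is why the concept lattice is also called a Galois lattice.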

The Context Lattice
[Lattice diagram: each node pairs a set of attributes with the objects that share them.]
  ∅: o1, o2, o3, o4
  A1: o1, o2, o3, o4
  A2: o1, o3
  A1, A2: o1, o3
  A1, A3: o1, o3, o4
  A2, A3: o1, o3
  A1, A2, A3: o1, o3
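The nodes of such a lattice can be computed by closing every set of attributes. The context below is an assumption, chosen so that each attribute set is shared by the same objects as listed on the lattice slide; the brute-force enumeration is only suitable for very small contexts.

```python
from itertools import combinations

# Assumed context (object -> attributes), consistent with the attribute/object
# pairs listed for the lattice; the original table is not available.
CONTEXT = {
    "o1": {"A1", "A2", "A3"},
    "o2": {"A1"},
    "o3": {"A1", "A2", "A3"},
    "o4": {"A1", "A3"},
}
ATTRIBUTES = sorted(set().union(*CONTEXT.values()))

def down(attributes):
    """Objects having all the given attributes."""
    return frozenset(o for o, attrs in CONTEXT.items() if set(attributes) <= attrs)

def up(objects):
    """Attributes common to all the given objects."""
    common = set(ATTRIBUTES)
    for o in objects:
        common &= CONTEXT[o]
    return frozenset(common)

def all_concepts():
    """Every formal concept (extent, intent), found by closing each attribute subset."""
    concepts = set()
    for r in range(len(ATTRIBUTES) + 1):
        for attrs in combinations(ATTRIBUTES, r):
            extent = down(attrs)
            concepts.add((extent, up(extent)))
    return concepts

if __name__ == "__main__":
    # Largest extent first: from the top of the lattice to the bottom.
    for extent, intent in sorted(all_concepts(), key=lambda c: -len(c[0])):
        print(sorted(extent), sorted(intent))
```

Under this assumed context, attribute sets that are shared by the same objects collapse into a single formal concept once the intent is closed, so the lattice has fewer nodes than there are attribute sets. Tools such as Concept Explorer use more efficient algorithms than this enumeration.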

The Great Gatsby: the last paragraph

[Table: formal context for the last paragraph of The Great Gatsby, with objects O1 to O17 as rows and attributes P1 to P9 as columns; the cell entries are not recoverable from the extracted text.]

[Figure: concept lattice of the Great Gatsby context. Nodes are labelled with sets of attributes among P1 to P9 and the numbers of the objects that share them, with ∅ at the top and at the bottom; the individual node labels are not reliably recoverable from the extracted text.]

Interpretation
  No differences between the two lattices
  The idea of « the pursuit of happiness »

Applications of the FCA Model to Translation

  Objects            Attributes         Independent/Together
  Semantic classes   Segments of text   Independent
  Segments of text   Semantic classes   Together

Conclusions about FCA
  FCA gives the lattice structure of a text, depending on the choice of objects and attributes
  The lattice structure can be used to model the translation unit and to implement it in a translation engine
  The choice of objects:
    semantic classes
    style elements
  The choice of attributes:
    segments of text; type of segmentation
  To be done: apply the FCA model in an appropriate manner to a corpus of texts

Human Translation – Machine Translation

Translation Engine Types
  Rule-based: grammars
  Learning-model-based: statistics

DISCUSSION
  Modelling
    Define: Translation Unit – Meaning Unit and their Computer Model
    Transfer Rules based on these primitives
    Linguistic Architecture versus Computer Architecture: to give a degree of unification
  Architecture
    Translation Systems containing:
      Semantic Annotator
      Key Word Searcher
      Domain Ontology of Source Language – Target Language
      Appropriate Tools for Translation Data Mining

References
BALLARD M. (2004), « La théorisation comme structuration de l’action du traducteur », La Linguistique, n. 40, Linguistique et traductologie, 2004/1, pp. 51-65. http://www.cairn.info/revue-la-linguistique-2004-1-page-51.htm
BAKER M. (1992), In Other Words: A Coursebook on Translation, London/New York, Routledge.
CURRY H. B., FEYS R. (1958), Combinatory Logic, vol. 1, North Holland.

DESCLES J.-P. (2003), « La grammaire Applicative et Cognitive construit-elle des représentations universelles ? », http://linx.revues.org/226
DESCLES J.-P. (1998), « Les Langues sont-elles des représentations du monde », Essais sur le langage, logique, et sens commun, Editions universitaires, Fribourg.
ENGLAND R., HANSON S. (2008), « Technical Translation and a Role for FCA », International Conference on Advanced Language Processing and Web Information Technology, IEEE, pp. 99-103.

FODOR J. A. (1975), The Language of Thought, Harvard University Press, Cambridge, Mass.
GANTER B., STUMME G., WILLE R. (2005), Formal Concept Analysis, Foundations and Applications, Springer.
PASCU A., DESCLES J.-P. (2005), « Modélisation sémantique et logique de la catégorisation », LALICC, Paris-Sorbonne, http://lalic.paris-sorbonne.fr/AXESRECHERCHE/operation5.html
SHAUMYAN S. (1977), Applicational Grammar as a Semantic Theory of Natural Language, Chicago University Press.
WHORF B. L. (1966), Linguistique et anthropologie, Payot, Paris (Language, Thought and Reality, Wiley and Sons, New York, 1958).

FCA home page: http://www.fcahome.org.uk/fca.html
Concept Explorer (CONEXP): http://sourceforge.net/projects/conexp/

« There is as much truth in beauty as is beauty in truth. »
Fred Sommers, The Logic of Natural Languages, Oxford University Press, 1984

THANK YOU!