“ALEXANDRU IOAN CUZA” UNIVERSITY OF IAŞI, FACULTY OF COMPUTER SCIENCE
The Semantics and Pragmatics of Natural Language
Daniela GÎFU
http://profs.info.uaic.ro/~daniela.gifu/

Course 1 & 2: SPNL OVERVIEW

Who am I?
https://profs.info.uaic.ro/~daniela.gifu/

“Alexandru Ioan Cuza” University of Iași
THE HALL OF THE LOST STEPS

Faculty of Computer Science
BE AMONG THE FIRST…

What is this course about?
- Meaning and Natural Language Processing (NLP)
- Computational Semantics
- Computational Pragmatics

Familiarization with relevant terminology
- Semantics
- Pragmatics
- Natural language
- Computational Linguistics
- Natural Language Processing
…

Language: the Sapir-Whorf Hypothesis

- Simulation of human (natural) intelligence by machines
- Interdisciplinary field ~ scientific study of language from a computational perspective
- A discipline that spans theory and practice to understand computer systems and networks at a deep level

Computational Linguistics (CL) vs. Natural Language Processing (NLP)

The research domain
CL = gives the theoretical background (computational theories of language) and linguistic models.
NLP = applied CL, including:
- natural language technology (NLT)
- human language technology (HLT)

Natural language technology
- Spoken language: speech processing (from speech to text, to syntax and semantics, and back to speech)
- Written language: my area of interest
- Language in correlation with other modalities (multimodality): speech, intonation, image

Written language technologies
Document segmentation and interpretation
- cleaning (elimination of dots, enhancing contrast, etc.)
- separation of text from images, curved lines...
- recognizing printed characters, semi-uncial characters, etc.
Optical Character Recognition (OCR)
- close to 100% accuracy when scanning printed material in Latin script
Challenge in OCR

OCR Handwriting – Why?
= presents some unique particularities
= many varieties of cursive writing
see: http://www.cvisiontech.com/library/ocr/accurate-ocr/ocr-handwriting-sp-914996830.html

OCR Handwriting: very challenging
= the interpretation of physician handwriting (Rasmussen, L. V. et al., 2012; Broda, B. & Piasecki, M., 2007)
= analysis of old handwritten documents (useful for linguists, musicians, historians, etc.)
Document Image Analysis

Written language technologies
Analysis and understanding of written language – sub-syntactic processing
- lexical units
- sentence splitting
- clause borders
- part of speech and morphological information
- lemmas
- entity names
- groups (nominal, verbal, prepositional, etc.) and lexical attractions (collocations)
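
Two of the sub-syntactic steps listed above, sentence splitting and the identification of lexical units, can be sketched in a few lines. This is only a naive illustration, assuming that sentences end in ".", "!" or "?" followed by whitespace and that a token is either a run of word characters or a single punctuation mark; the function names are hypothetical, and real tools must also handle abbreviations, numbers, clitics and so on.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when whitespace follows."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Naive tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "He asked for the boss. Fred had just been sacked."
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]

for sentence, toks in zip(sentences, tokens):
    print(sentence, "->", toks)
```

Later stages (part-of-speech tagging, lemmatization, entity names) would take these token lists as input.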

Written language technologies
Language analysis and understanding – semantic and discourse processing
- semantic disambiguation → word senses
- semantic role labeling
- rhetorical structure of discourse and dialogue
- anaphora resolution
- text summarization

Mathematical Linguistics
The study of mathematical structures and methods that are of importance to linguistics.
→ Phonetics
→ Phonology
→ Morphology
→ Syntax
→ Semantics
→ and… Sociolinguistics
→ Language Acquisition
Mathematical Linguistics came before Computational Linguistics… ML ⇔ CL?

NLP = the art of solving problems that need to analyze (or generate) natural language text.
Find the metrics for a good solution to the engineering problem…
Google Translate – Don’t blame!!!!
Romanian = Luceafărul de dimineață
English = The morning gentleman (bad answer)
        = Morning star (good answer)
Let’s try! Why? … explains how human translators do their job...

NLP – a subdomain of Artificial Intelligence & Linguistics
Thematic areas
- Linguistics: mathematical linguistics, computational linguistics
- Formal Language
- Linguistic and Language Processing
- The grammatical structure of utterances: the sentence, constituents, phrases, classifications and structural rules, syntactic processing...
- Parser
- Semantics & Pragmatics

NLP = an area of Artificial Intelligence (AI) devoted to creating computers that use NL as input and/or output.
- AI-hard problem = machine reading comprehension
- = produces language as output on the basis of data input

CL = developing computational methods/models of human linguistic behavior.
- INFORMATION RETRIEVAL
- INFORMATION EXTRACTION
- MACHINE TRANSLATION
- QUESTION ANSWERING
- SUMMARIZATION
- MACHINE-READABLE DICTIONARIES
- SPELLING & GRAMMAR CHECKERS
…

CL – Applications
A discipline concerned with understanding written and spoken language from a computational perspective.
- detecting synonymy (Grigonytė et al., 2010)
- developing WordNet, including the Romanian one (Gala and Mititelu, 2013), (Iftene and Balahur, 2007)...
- WSD, word sense disambiguation (Yang, H. et al., 2010), (Lefever and Hoste, 2010), (Tufiș, 2002)...
- semantic annotation (Garcia et al., 2012)...
- reconstructing a diachronic morphology (Cristea et al., 2007/2012)
- diachronic text classification (Mihalcea and Năstase, 2012; Popescu and Strapparava, 2015), etc.
- epoch detection (Gifu, 2015/2016/2017)...
Tools developed by students…

Linguistic & Language Processing
1. Linguistics - the science of language. Includes:
- Sounds (phonology)
- Word formation (morphology)
- Sentence structure (syntax)
- Meaning (semantics) and understanding (pragmatics)…
2. Levels of linguistic analysis
- Higher level → Speech Recognition (SR)
- Lower levels → Natural Language Processing (NLP)

Levels of Linguistic Analysis
Speech Recognition:
- Phonetics – production and perception of speech
- Phonology – sound patterns of language
NLP:
- Lexicon – dictionary of words in a language
- Morphology – word formation and structure
- Syntax – sentence structure
- Semantics – intended meaning
- Pragmatics – understanding from external information
Units of analysis, from signal to meaning: acoustic signal, phonemes, letters (strings), morphemes, words, phrases & sentences, meaning out of context, meaning in context

NLP Pipeline: course purpose

MAIN CONCEPTS
1. Natural Language
- used by human beings for communication...
- sign, system, symbols, rule-set (or grammar)
2. Semantics
- literal meaning determined from a word, phrase or sentence.
3. Pragmatics
- contextual meaning {situation, speaker, etc.}

Natural or ordinary language
• A system of speech symbols → (form criterion)
Types:
a) speech (spoken language) - produced by articulated sounds.
b) signing (written language) - the representation of a spoken or gestural language.
• The most important means of human communication → (function criterion)

Natural Language…
• Multiplicity of languages

Formal language I*
1. Symbol - a character, an abstract entity that has no meaning by itself
Ex: letters, digits and special characters
2. Alphabet - a finite set of symbols, often denoted by Σ
Ex: B = {0, 1} says B is an alphabet of two symbols, 0 and 1
    C = {a, b, c} says C is an alphabet of three symbols, a, b and c
* More about formal language: http://www.its.caltech.edu/~matilde/FormalLanguageTheory.pdf

Formal language II
3. String or word - a finite sequence of symbols from an alphabet
Ex: 01110 and 111 are strings over the alphabet B above
    aaabccc and b are strings over the alphabet C above
4. Sentence - a string of words.
Ex: I saw the gentleman with the hat.
    String = a b c d e c f (the repeated word "the" maps to the same symbol, c)
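
The definitions of alphabet and string can be checked mechanically. A minimal sketch, assuming symbols are single characters and using the alphabets B and C from the previous slide; the helper name `is_string_over` is hypothetical.

```python
# Alphabets from the slides: finite sets of symbols.
B = {"0", "1"}
C = {"a", "b", "c"}

def is_string_over(s, alphabet):
    """True if every symbol of s belongs to the alphabet.

    The empty string qualifies for any alphabet, since it uses no symbols.
    """
    return all(symbol in alphabet for symbol in s)

for candidate, alphabet in [("01110", B), ("111", B), ("aaabccc", C), ("012", B)]:
    print(repr(candidate), is_string_over(candidate, alphabet))
```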

Formal language III
How do we define the possible relations of the parts of a string to each other?
A. [I] saw the gentleman [with the binoculars] = [a] b c d [e c f]
B. I saw [the gentleman with the binoculars] = a b [c d e c f]
We can represent structures with trees…
Ex: I saw the gentleman with the binoculars. (figure: two parse trees, one per reading)

Formal language IV
5. Language - a set of strings of symbols from an alphabet.
6. Natural language or ordinary language
- open-ended
- built on three different knowledge components: the sound of words (phonology), the meaning of words (semantics), and the grammatical rules according to which words are put together (syntax).
7. Formal language
- a set L of sequences/strings over some finite alphabet Σ
- described using formal grammars (sets of rules that specify its strings)
- many applications (e.g. the Prognosis wearable system)

Formal language V
Context-Free Grammars (CFG) - a finite set of grammar rules
https://www.tutorialspoint.com/automata_theory/context_free_grammar_introduction.htm
A CFG is a quadruple (N, T, P, S), where:
- N = a finite set of non-terminal symbols (characters or variables). Note! Each n ∈ N stands for a type of phrase/clause in the sentence.
- T = a finite set of terminals (the alphabet defined by the grammar), disjoint from N: N ∩ T = ∅.
- P = a finite set of (rewrite) rules or productions of the grammar, each of the form A → α with A ∈ N and α ∈ (N ∪ T)*. Note! The left-hand side of a production rule does not have any right context or left context.
- S = the start symbol (S ∈ N), used to represent the whole sentence.
* = the Kleene star, a unary operation on sets of strings or sets of symbols/characters; applied to a set V it is written V* (also used in regular expressions).
Ex: {"a", "b", "c"}* = {ε, "a", "b", "c", "aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc", "aaa", "aab", ...}
    ∅* = {ε} (the language consisting only of the empty string)
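
Both notions on this slide can be made concrete with a small sketch. The first function enumerates a finite prefix of V* (the full set is infinite). The second part encodes a toy grammar and counts parse trees with a CYK-style chart. The grammar, its category names (`Noun`, `Det`, ...) and the restriction to Chomsky normal form are assumptions made for illustration; they are not taken from the course.

```python
from collections import defaultdict
from itertools import product

def kleene_star(alphabet, max_len):
    """Yield the strings of alphabet* of length <= max_len, shortest first."""
    for n in range(max_len + 1):
        for combo in product(sorted(alphabet), repeat=n):
            yield "".join(combo)

# Hypothetical toy grammar (N, T, P, S) in Chomsky normal form.
N = {"S", "NP", "VP", "PP", "Det", "Noun", "V", "P"}
T = {"i", "saw", "the", "gentleman", "with", "binoculars"}
LEXICAL = {  # productions A -> terminal, indexed by terminal
    "i": ["NP"], "saw": ["V"], "the": ["Det"],
    "gentleman": ["Noun"], "binoculars": ["Noun"], "with": ["P"],
}
BINARY = [  # productions A -> B C
    ("S", "NP", "VP"),
    ("VP", "V", "NP"), ("VP", "VP", "PP"),
    ("NP", "Det", "Noun"), ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]
START = "S"

def count_parses(words):
    """Count the distinct parse trees rooted in START that cover all words."""
    n = len(words)
    # chart[(i, j)][A] = number of trees rooted in A spanning words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for lhs in LEXICAL.get(word, ()):
            chart[(i, i + 1)][lhs] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, left, right in BINARY:
                    ways = chart[(i, k)].get(left, 0) * chart[(k, j)].get(right, 0)
                    if ways:
                        chart[(i, j)][lhs] += ways
    return chart[(0, n)].get(START, 0)

print(list(kleene_star({"a", "b", "c"}, 2)))
print(count_parses("i saw the gentleman with the binoculars".split()))
```

Under this toy grammar the prepositional phrase can attach either to the verb phrase or to the noun phrase, which is the structural ambiguity discussed on the "Formal language III" slide.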

Do you know other grammars?

Variations of Chomsky’s hierarchy, 1956
https://commons.wikimedia.org/wiki/File:Chomsky-hierarchy.svg

Traffic Light – Visual Syntax, Semantics and Pragmatics
SEMIOTICS
see Woo, C. W. H. (2010). Visual Syntax, Semantics and Pragmatics: Structure, Meaning & Context. (PPT). Retrieved from Universiti Brunei Darussalam.

Main Concepts - Examples
1. Syntax = the proper ordering of words
- Grammars, parse trees, etc.
2. Semantics
- Semantic classes, ontologies, formal semantics, etc.
3. Pragmatics
- Pronouns, reference resolution, discourse models, etc.

Computational Semantics: NLP vs. CL
- What can semantics do for NLP?
- What can computation do for theoretical models of NL semantics?

Computational Semantics: Automating Language Comprehension
1. Automate the process of associating NL expressions with semantic representations (SRs, also known as logical forms).
2. Automate the process of interpreting those SRs and drawing inferences from them.

Computational Semantics: Challenges
1. Unlimited number of NL expressions!
* The semantic representation of each phrase = a function of the SRs of its syntactic parts.
2. Tension between expressivity, inferential power & complexity.
* No perfect solution (see Tarski). People always tailor the logic to the application.
Note: Focus on FOL (first-order logic) = formulas of predicate logic (https://www.cl.cam.ac.uk/teaching/1011/L107/semantics.pdf)

Main Concepts - II: Big challenge – Ambiguity!
A semantic scope ambiguity…
Every woman loves a man.
  ∀x(woman(x) → ∃y(man(y) ∧ loves(x, y)))
  ∃y(man(y) ∧ ∀x(woman(x) → loves(x, y)))
… and its interaction with anaphora = NP (pronoun, definite NP, proper name)
  Every student worked on a project. It was about computational semantics.
  Every politician made a speech. It was about terrorism.
Students?
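
The two readings of "Every woman loves a man" differ in truth conditions, which a small model checker makes visible. The sketch below evaluates both first-order formulas over a finite model; the individuals (`ana`, `eva`, `dan`, `tom`) and the two `loves` relations are invented for illustration.

```python
# A hypothetical finite model: a domain plus the extensions of woman and man.
women = {"ana", "eva"}
men = {"dan", "tom"}
domain = women | men

def forall_exists(loves):
    """∀x(woman(x) → ∃y(man(y) ∧ loves(x, y)))"""
    return all(
        (x not in women) or any(y in men and (x, y) in loves for y in domain)
        for x in domain
    )

def exists_forall(loves):
    """∃y(man(y) ∧ ∀x(woman(x) → loves(x, y)))"""
    return any(
        y in men and all((x not in women) or (x, y) in loves for x in domain)
        for y in domain
    )

# Each woman loves a different man: only the first reading holds.
different_men = {("ana", "dan"), ("eva", "tom")}
# Both women love the same man: both readings hold.
same_man = {("ana", "dan"), ("eva", "dan")}

for name, loves in [("different_men", different_men), ("same_man", same_man)]:
    print(name, forall_exists(loves), exists_forall(loves))
```

Any model that satisfies the second reading also satisfies the first (given at least the same man for everyone), but not the other way round, so the second reading is the logically stronger one.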

Main Concepts - II: Other challenges – Combinatorics
Constructing logical forms (LFs) directly from the NL’s syntax means that a quantifier scope ambiguity must correspond to a syntactic ambiguity.
Every woman loves a man. → 2 unintuitive parses
• 6 quantifiers
• Unsophisticated interaction with pragmatics
  - Generate all possible LFs
  - Filter out inadmissible ones

Main Concepts - II: An alternative – Underspecified Semantics
Use syntax to accumulate a set of constraints on the structure of the logical form.
- A partial description of trees such as these…
(figure: the two logical-form trees, ∀x(woman(x) → ∃y(man(y) ∧ love(x, y))) and ∃y(man(y) ∧ ∀x(woman(x) → love(x, y))))

Main Concepts - II: The Underspecified Logical Form
  l1: ∀x(woman(x) → l4)
  l2: ∃y(man(y) ∧ l5)
  l3: love(x, y)
This description is satisfied by 2 trees:
1. l4 = l2 and l5 = l3
2. l4 = l3 and l5 = l1
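
The two solutions can be found by brute force: try every way of plugging labelled fragments into the holes and keep the pluggings that form a single tree. This is a sketch in the spirit of hole semantics, assuming that hole l4 sits in the scope of l1 (the universal) and hole l5 in the scope of l2 (the existential), as the two listed solutions suggest; all function names are hypothetical.

```python
from itertools import permutations

# Labelled fragments; "{l4}" and "{l5}" mark the holes.
FRAGMENTS = {
    "l1": "∀x(woman(x) → {l4})",
    "l2": "∃y(man(y) ∧ {l5})",
    "l3": "love(x, y)",
}
HOLE_OWNER = {"l4": "l1", "l5": "l2"}  # which fragment contains which hole

def root_of(plugging):
    """Return the root label if the plugging forms one tree over all labels, else None."""
    roots = set(FRAGMENTS) - set(plugging.values())
    if len(roots) != 1:
        return None
    root = roots.pop()
    seen, stack = set(), [root]
    while stack:
        label = stack.pop()
        if label in seen:
            return None  # a fragment reached twice: not a tree
        seen.add(label)
        stack.extend(plugging[h] for h, owner in HOLE_OWNER.items() if owner == label)
    return root if seen == set(FRAGMENTS) else None

def build(label, plugging):
    """Fill the holes of a fragment recursively (only call on admissible pluggings)."""
    text = FRAGMENTS[label]
    for hole, owner in HOLE_OWNER.items():
        if owner == label:
            text = text.replace("{" + hole + "}", build(plugging[hole], plugging))
    return text

def readings():
    """Enumerate (plugging, formula) pairs for every admissible plugging."""
    holes = sorted(HOLE_OWNER)
    result = []
    for labels in permutations(sorted(FRAGMENTS), len(holes)):
        plugging = dict(zip(holes, labels))
        root = root_of(plugging)
        if root is not None:
            result.append((plugging, build(root, plugging)))
    return result

for plugging, formula in readings():
    print(plugging, formula)
```

With only three fragments the tree condition alone rules out the inadmissible pluggings; larger descriptions would also need explicit dominance constraints.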

Main Concepts - II: More Challenges
Semantic dependencies between an NL phrase and its context
Pronouns
  Robert owns a house. It is orange.
  wrong: ∃x(house(x) ∧ own(r, x)) ∧ orange(y)
  complex construction: ∃x(house(x) ∧ own(r, x) ∧ orange(x))
Time
  Eva entered the room. She lit a lantern. It was red dark.
Presuppositions
  David's son is bald.
  If baldness is hereditary, then David's son is bald.
  If David has a son, then David's son is bald.

Main Concepts - II: Dynamic Semantics
e.g. Discourse Representation Theory (DRT) ~ “the discourse context”
- The meaning of an expression/sentence depends on its context.
- An expression changes that input context into a different output one:
  - Existentials change the context by adding new entities to it for interpreting subsequent expressions.
The result of interpretation is a new context.
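
The idea that each expression maps an input context to an output context can be sketched with a very small data structure. This is a loose illustration and not an implementation of DRT: a context is just a list of discourse referents plus a list of conditions, an existential introduces a fresh referent, and a pronoun is resolved to the most recently introduced referent (a strong simplifying assumption). All names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    referents: list = field(default_factory=list)
    conditions: list = field(default_factory=list)

def existential(ctx, *templates):
    """Add a new discourse referent and the conditions that describe it."""
    ref = f"x{len(ctx.referents) + 1}"
    return Context(ctx.referents + [ref],
                   ctx.conditions + [t.format(ref) for t in templates])

def pronoun(ctx, template):
    """Predicate over the most recent referent; fail if no antecedent exists."""
    if not ctx.referents:
        raise ValueError("no antecedent available in the input context")
    return Context(ctx.referents,
                   ctx.conditions + [template.format(ctx.referents[-1])])

# "Robert owns a house. It is orange."
after_first = existential(Context(), "house({})", "own(r, {})")
after_second = pronoun(after_first, "orange({})")
print(after_second.referents, after_second.conditions)
```

Because the referent introduced by the first sentence is still present in the input context of the second, "it" can pick it up, which a static formula with a closed existential cannot express.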

Main Concepts - II: DRT* – a successful theory
Pronouns (anaphora)
  A man walks. He talks.
  Few farmers own a donkey. It's fed twice a day.
Tense ~ grammatical form
  Clarke stood up. John greeted him.
  Max entered the room. It was pitch dark.
Presuppositions (if… then…)
  If baldness is hereditary, then David's son is bald.
  If David has a son, then David's son is bald.
Propositional attitudes (belief, desire, imagination = mental states)
  Robert believes that Dana likes him.
* More about DRT: https://plato.stanford.edu/entries/discourse-representation-theory/#RepAttCom

Main Concepts - II: Problems?... Need Pragmatics!
Counterexamples
I. John can open Obama's safe. He knows the combination.
II. David fell. Max pushed him.
If Max scuba dives, he'll bring his son. vs. If Max scuba dives, he'll bring his regulator.
Note: We need to resolve semantic underspecification to pragmatically preferred values.

Main Concepts - II: Computational Pragmatics
The semantics/pragmatics interface
Pragmatics = the study of what people meant, but didn’t explicitly say.
• Linguistic form underdetermines content. Pragmatics: commonsense reasoning about the context provides more specific content:
  - Lexical content
  - World knowledge
  - Conventions of language use
  - Beliefs and intentions of dialogue participants
• The process of constructing the intended LF involves defaults.
Interaction between context and interpretation must be automated.

Syntax - Examples
Words: hit, ball, the, John
  The ball hit John.
  John hit the ball. (1) – Syntax Parse Tree
But what about this sentence?
  Colorless green ideas sleep furiously.
Note! Syntax is mapped into semantics.

Syntactic frame & Semantic (predicate arguments)
break(AGENT, INSTRUMENT, PATIENT)
Emanuel broke the window with a ball.
  (SUBJ) + (VERB) + (OBJ) + (MODIFIER)
  SUBJ = AGENT, OBJ = PATIENT, MODIFIER = INSTRUMENT
The ball broke the window.
  (SUBJ) + (VERB) + (OBJ)
  SUBJ = INSTRUMENT, OBJ = PATIENT
The window broke.
  (SUBJ) + (VERB)
  SUBJ = PATIENT
Same event (Fillmore 1968 – The Case for Case)
FrameNet (Berkeley) https://framenet.icsi.berkeley.edu/fndrupal/
Chinese FrameNet http://115.24.12.8:8080/cfn
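
The mapping from syntactic slots to semantic roles for the three "break" sentences can be written as a lookup table. This is a deliberately naive sketch that only mirrors the three examples on the slide: it keys on the syntactic frame alone, so it would mislabel a sentence such as "Emanuel broke the window" (an AGENT subject in a two-argument frame). The table and function names are hypothetical and unrelated to the FrameNet data format.

```python
# Role assignment for "break", keyed by the sequence of filled syntactic slots.
BREAK_FRAMES = {
    ("SUBJ", "OBJ", "MODIFIER"): {"SUBJ": "AGENT", "OBJ": "PATIENT", "MODIFIER": "INSTRUMENT"},
    ("SUBJ", "OBJ"): {"SUBJ": "INSTRUMENT", "OBJ": "PATIENT"},
    ("SUBJ",): {"SUBJ": "PATIENT"},
}

def roles_for_break(arguments):
    """Map each syntactic slot filler to a semantic role of break(...)."""
    frame = BREAK_FRAMES[tuple(arguments)]  # dict keys keep insertion order
    return {frame[slot]: filler for slot, filler in arguments.items()}

examples = [
    {"SUBJ": "Emanuel", "OBJ": "the window", "MODIFIER": "a ball"},
    {"SUBJ": "the ball", "OBJ": "the window"},
    {"SUBJ": "the window"},
]
for arguments in examples:
    print(roles_for_break(arguments))
```

In all three outputs "the window" receives the PATIENT role, which is the sense in which the sentences describe the same event.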

Pragmatics - Example
Have you got any cash on you?
Deep meaning: Can you lend me some money?
ADVANTAGES
- people's intended meaning
- their suppositions
- their goals
- any kind of actions

Pragmatics - Example
You have a green light.
DISADVANTAGES
- it is difficult to reach the true meaning.
- the level of understanding varies.
- every person has their own way of interpreting.
No context. No identity of the speaker. No speaker's intent.

Main Concepts - II: Pragmatics in correlation with Ambiguity
= the study of what people meant, but didn’t explicitly say...
You have a green light. ?
- the space that belongs to you has green ambient lighting?
- you are driving through a green traffic signal?
- you no longer have to wait to continue driving?
- you are permitted to proceed in a non-driving context?
- your body is cast in a greenish glow?
- you possess a light bulb that is tinted green?
Challenge!!! Interaction between context and interpretation must be automated.

Examples – for discussion with the students…
He asked for the boss. Fred had just been sacked.

Pragmatics
He asked for the boss.
1. Someone (who is male) asked for someone who is a boss.
2. We can’t say who these people are or why the first one wanted the second.
3. If we know something about the context (including the last few sentences spoken/written), we may be able to work these things out.
Fred had just been sacked.
1. From our general knowledge: bosses generally sack people, and if people want to speak to those who sacked them, it is generally to complain about it.
2. We can then really start to get at the meaning of the sentence: Fred wants to complain to his boss about getting sacked.

Semantics & pragmatics
Two stages of analysis concerned with getting at the meaning of a sentence:
- 1st – Semantics – a partial representation of the meaning, based on the possible syntactic structure(s) of the sentence and the meanings of the words in that sentence.
- 2nd – Pragmatics – the meaning based on contextual and world knowledge.

Main Concepts - II: CONCLUSIONS
Computational semantics and pragmatics:
- automatic construction of semantic representations for NL expressions (in context).
- automatic inferences over the representations.
Major issues:
- Ambiguity at various levels: lexical, syntactic, semantic, pragmatic
- Interface between the LF derived from linguistic form and the context of use (essential for modelling anaphora).
Tools used include:
- Information: syntax, world knowledge, lexical semantics, corpora…
- Inference: logic (model checkers and theorem proving), machine learning, statistics…

Semester Homework:
1. Each student has to present a paper about his/her SemEval task, which guides the final project
- https://aclweb.org/anthology/, papers published between 2016 and 2019 at:
  EMNLP (Empirical Methods in Natural Language Processing)
  ACL (Association for Computational Linguistics)
  EACL (European Chapter of the Association for Computational Linguistics)
  COLING (International Conference on Computational Linguistics)
  …

Final project: SemEval-2020
Groups of 2-4 students:
- 1-2 humanists & 1-2 computer scientists prepare a paper for SemEval-2020 based on their research, under constant supervision
- http://alt.qcri.org/semeval2020/index.php?id=tasks

Project steps – next time
1. Form a team...
2. Choose a task
3. Define the teamwork
4. Establish the modular structure
5. Edit the paper – a possible structure

2. Choose a task

5. Edit the paper – making an outline
* Choosing a title
* Abstract (executive summary) & keywords
* Introduction (the new approach; background information; research problem/question; theoretical framework)
* State of the art, SOTA (citation tracking; content alert services; evaluating sources; primary sources; secondary sources…)
* Methodology (qualitative methods; quantitative methods)
* Results
* Discussion
* Conclusions and future work
* References

Thank you…