

“ALEXANDRU IOAN CUZA” UNIVERSITY OF IAŞI
FACULTY OF COMPUTER SCIENCE
The Semantics and Pragmatics of Natural Language
Daniela GÎFU
http://profs.info.uaic.ro/~daniela.gifu/

Course 1: The General Presentation

Main Concepts
1. Natural Language
   - used by human beings for communication...
   - sign, system, symbols, ruleset (or grammar)
2. Semantics
   - word meaning, causes of word change...
3. Pragmatics
   - how language is used by a speaker (the sender) in a given context, with the intention of acting in a particular way and producing certain effects on the interlocutor...

Natural Language Processing – a subdomain of Artificial Intelligence and Linguistics
1. Thematic Areas
   - Linguistics
     - mathematical linguistics
     - computational linguistics
   - Formal Language
   - Linguistic and Language Processing
     - The grammatical structure of utterances: the sentence, constituents, phrase, classifications and structural rules, syntactic processing...
   - Parser
   - Semantics & Pragmatics

Mathematical linguistics
- the study of mathematical structures and methods that are of importance to linguistics:
  → Phonetics, → Phonology, → Morphology, → Syntax, → Semantics, and also → Sociolinguistics, → Language Acquisition.

Computational linguistics
- the scientific and engineering discipline concerned with understanding written and spoken language from a computational perspective:
  - detecting synonymy (Grigonytė et al., 2010);
  - developing WordNet (Gala et Mititelu, 2013), (Iftene and Balahur, 2007)...;
  - word sense disambiguation, WSD (Yang, H. et al., 2010), (Lefever et Hoste, 2010), (Tufiș, 2002)...;
  - semantic annotation (Garcia et al., 2012)...;
  - reconstructing a diachronic morphology (Cristea et al., 2007/2012);
  - diachronic text classification (Mihalcea and Năstase, 2012; Popescu and Strapparava, 2015), etc.

Formal language
1. Symbol
   - a character, an abstract entity that has no meaning by itself
   Ex: letters, digits and special characters
2. Alphabet
   - a finite set of symbols
   - often denoted by Σ
   Ex: B = {0, 1} says that B is an alphabet of two symbols, 0 and 1
       C = {a, b, c} says that C is an alphabet of three symbols, a, b and c

Formal language
3. String (or word)
   - a finite sequence of symbols from an alphabet
   Ex: 01110 and 111 are strings over the alphabet B above
       aaabccc and b are strings over the alphabet C above
4. Language
   - a set of strings over an alphabet
5. Formal language (or simply language)
   - a set L of strings over some finite alphabet Σ
   - described using formal grammars
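The definitions above can be tried out in a few lines of Python. This is only a minimal sketch: the alphabets B and C come from the slides, while the helper names and the example language (strings over B with an even number of 1s) are assumptions made for illustration.

```python
from itertools import product

# Alphabets from the slides
B = {"0", "1"}
C = {"a", "b", "c"}

def is_string_over(word, alphabet):
    """True if every symbol of `word` belongs to `alphabet` (the empty string qualifies)."""
    return all(symbol in alphabet for symbol in word)

def strings_up_to(alphabet, max_len):
    """All strings over `alphabet` of length 0..max_len, shortest first."""
    result = []
    for n in range(max_len + 1):
        for symbols in product(sorted(alphabet), repeat=n):
            result.append("".join(symbols))
    return result

def in_even_ones(word):
    """Hypothetical formal language over B: strings with an even number of 1s."""
    return is_string_over(word, B) and word.count("1") % 2 == 0

print(is_string_over("01110", B))    # True
print(is_string_over("aaabccc", C))  # True
print(is_string_over("abc", B))      # False
print([w for w in strings_up_to(B, 2) if in_even_ones(w)])  # ['', '0', '00', '11']
```

A language is treated here as a membership predicate rather than a stored set, because most interesting languages (including this one) contain infinitely many strings.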

Linguistic and Language Processing
1. Linguistics - the science of language. Includes:
   1. Sounds (phonology)
   2. Word formation (morphology)
   3. Sentence structure (syntax)
   4. Meaning (semantics) and understanding (pragmatics)…
2. Levels of linguistic analysis
   - Higher level → Speech Recognition (SR)
   - Lower levels → Natural Language Processing (NLP)

Levels of Linguistic Analysis
(each level of analysis turns the unit above it into the unit below it)

SR:
  Acoustic signal
    → Phonetics – production and perception of speech
  Phones
    → Phonology – sound patterns of language
NLP:
  Letters - strings
    → Lexicon – dictionary of words in a language
  Morphemes
    → Morphology – word formation and structure
  Words
    → Syntax – sentence structure
  Phrases & sentences
    → Semantics – intended meaning
  Meaning out of context
    → Pragmatics – understanding from external info
  Meaning in context

Steps of NLP
1. Morphological and Lexical Analysis
   - Lexicon
   - Morphology – identification, analysis and description of the structure of words
   - Words – the smallest units of syntax
   - Syntax – the rules / principles that govern the sentence structure of any language
   - Lexical analysis – dividing text into paragraphs, sentences and words
2. Syntactic Analysis
   - Analysis of the words in a sentence, knowing the grammatical structure of the sentence
   Ex: “Boy the go the store” – correct?
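The first two steps can be illustrated with a small sketch: lexical analysis splits text into sentences and words, and a toy syntactic check rejects “Boy the go the store”. The lexicon, the part-of-speech tags and the single sentence pattern (S → NP V [P] NP, NP → DET N) are hypothetical and far simpler than a real grammar or parser.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Lower-cased word tokens; punctuation is dropped."""
    return re.findall(r"[A-Za-z']+", sentence.lower())

# Hypothetical toy lexicon: word -> part-of-speech tag
LEXICON = {"the": "DET", "boy": "N", "store": "N", "goes": "V", "go": "V", "to": "P"}

def is_grammatical(tokens):
    """Accept only the tag sequence DET N V [P] DET N; unknown words get the tag UNK."""
    tags = " ".join(LEXICON.get(t, "UNK") for t in tokens)
    return re.fullmatch(r"DET N V( P)? DET N", tags) is not None

text = "The boy goes to the store. Boy the go the store."
for sentence in split_sentences(text):
    tokens = tokenize(sentence)
    print(tokens, is_grammatical(tokens))
# ['the', 'boy', 'goes', 'to', 'the', 'store'] True
# ['boy', 'the', 'go', 'the', 'store'] False
```

Note that the check only looks at word order; it would also accept “The boy go to the store”, because subject-verb agreement is not modelled in this sketch.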

Steps of NLP
3. Semantic Analysis
   - Derives an absolute (dictionary definition) meaning from the context
   - The structures created by the syntactic analyzer are assigned meaning. A mapping is made between the syntactic structure and objects in the task domain.
   Ex: “Colourless green ideas…” – correct?
4. Discourse Integration
   - The meaning of an individual sentence may depend on the sentences that precede it and may influence the meaning of the sentences that follow it.
   Ex: the word “it” in the sentence “you wanted it” depends on the prior discourse context.
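The “you wanted it” example can be sketched with a deliberately naive heuristic: replace “it” with the most recently mentioned noun from the preceding sentences. The noun list and the function name are hypothetical, and real discourse integration relies on much richer evidence than recency.

```python
import re

# Hypothetical mini-lexicon of nouns that "it" may refer to
NOUNS = {"bicycle", "book", "store", "letter"}

def resolve_it(previous_sentences, sentence):
    """Replace lower-case "it" with the most recently mentioned known noun, if any."""
    mentioned = [
        word
        for s in previous_sentences
        for word in re.findall(r"[a-z]+", s.lower())
        if word in NOUNS
    ]
    if not mentioned:
        return sentence  # no antecedent in the prior context; leave unresolved
    return re.sub(r"\bit\b", "the " + mentioned[-1], sentence)

print(resolve_it(["The store had a new bicycle."], "You wanted it."))
# You wanted the bicycle.
print(resolve_it([], "You wanted it."))
# You wanted it.
```

Without the prior sentence the pronoun stays unresolved, which is exactly the dependence on discourse context that the slide describes.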

Steps of NLP
5. Pragmatic Analysis
   - Derives knowledge from external commonsense information
   - Means understanding the purposeful use of language in situations, particularly those aspects of language which require world knowledge
   - What was said is reinterpreted to determine what was actually meant.
   Ex: “Do you know what time it is?” – should be interpreted as a request.

Semantics and pragmatics (S & P)
1. S & P - two stages of analysis concerned with getting at the meaning of a sentence:
   - 1st – S – a partial representation of the meaning, based on the possible syntactic structure(s) of the sentence and the meanings of the words in that sentence;
   - 2nd – P – the meaning based on contextual and world knowledge.

Semantics and pragmatics (S & P)
1. Example of the difference: “He asked for the boss”. We can work out that:
   1. Someone (who is male) asked for someone who is a boss.
   2. We can’t say who these people are or why the first person wanted the second.
   3. If we know something about the context (including the last few sentences spoken/written), we may be able to work these things out.
   4. Maybe the last sentence was: “Fred had just been sacked”.
   5. From our general knowledge, bosses generally sack people, and if people want to speak to those who sacked them, it is generally to complain about it.
   6. We could then really start to get at the meaning of the sentence: “Fred wants to complain to his boss about getting sacked”.

Homework:
1. Each student has to present a paper on text clustering that will guide the final project (https://aclweb.org/anthology/), published between 2010 and 2016.
   Venues:
   - LREC (Language Resources and Evaluation Conference)
   - ACL (Association for Computational Linguistics)
   - EACL (European Chapter of the Association for Computational Linguistics)
   - Coling (International Conference on Computational Linguistics)

Other references…
• Hamid Palangi, Li Deng, Yelong Shen, Jianfeng Gao, Xiaodong He, Jianshu Chen, Xinying Song, Rabab Ward (2015). Deep Sentence Embedding Using Long Short-Term Memory Networks: Analysis and Application to Information Retrieval. https://arxiv.org/pdf/1502.06922.pdf
• Kate Cohen, Fredrik Johansson, Lisa Kaati, and Jonas Mork (2014). Detecting Linguistic Markers for Radical Violence in Social Media. Terrorism and Political Violence 26, no. 1: 246-256.
• Joel Brynielsson, Andreas Horndahl, Fredrik Johansson, Lisa Kaati, Christian Martenson, and Pontus Svenson (2013). Harvesting and Analysis of Weak Signals for Detecting Lone-Wolf Terrorists. Security Informatics 2, no. 11, accessed May 15, 2016, http://www.securityinformatics.com/content/2/1/11
• Alexander V. Mamishev and Murray Sargent (2013). Creating Research and Scientific Documents Using Microsoft Word. Microsoft Press, Redmond, WA.
• Sean M. Gerrish and David M. Blei (2010). A language-based approach to measuring scholarly impact. In Proceedings of the International Conference on Machine Learning.

• Alexander V. Mamishev and Sean D. Williams (2010). Technical Writing for Teams: The STREAM Tools Handbook. Wiley-IEEE Press, Hoboken, NJ.
• Jonas Muller, Aditya Thyagarajan (2016). Siamese Recurrent Architectures for Learning Sentence Similarity. In Proceedings of AAAI-16.
• Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Phil Blunsom (2015). Reasoning about Entailment with Neural Attention. In Proceedings of ICLR, http://arxiv.org/abs/1509.06664
• Xiaofeng Wang, Matthew S. Gerber, and Donald E. Brown (2012). Automatic Crime Prediction using Events Extracted from Twitter Posts. SBP, LNCS 7227: 231-238.
• Yaser Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin (2012). Learning From Data. amlbook.com.
• Jiaming Xu, Peng Wang, Guanhua Tian, Bo Xu, Jun Zhao, Fangyuan Wang, Hongwei Hao (2015). Short Text Clustering via Convolutional Neural Networks. In Proceedings of NAACL-HLT 2015, 62–69.
• Trevor Hastie, Robert Tibshirani, Jerome Friedman (2008). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer.

Final project: Implementing a tool for text clustering, including a diachronic perspective (demo & Web resource) - SEMANTRIA model
1. Mixed teams (linguists + computer scientists)
   - Building the corpus:
     http://www.bbc.com/news – English
     http://www.e-ziare.ro/ – Romanian
   - NER
   - Topic Detection – LDA
   - Domain & subdomain detection
   - Sentiment Analysis
   - Automatic News Source Detection
   - Interface Construction
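As a starting point for the clustering part of the project, the sketch below groups short texts by TF-IDF cosine similarity using only the Python standard library. Everything in it is an assumption made for illustration: the stop-word list, the smoothed IDF formula, the similarity threshold, the greedy single-pass strategy and the four sample headlines. It is not the SEMANTRIA model, and it does not cover the NER, LDA or sentiment steps listed above.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stop-word list
STOPWORDS = {"the", "a", "an", "by", "were", "of", "in", "and"}

def tokenize(text):
    """Lower-cased alphabetic tokens without stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def tfidf_vectors(docs):
    """One sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.2):
    """Greedy single-pass clustering: join the first cluster whose first
    document is similar enough, otherwise start a new cluster."""
    vectors = tfidf_vectors(docs)
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

DOCS = [
    "The election results were announced by the government",
    "The football team won the match",
    "Government ministers debate the election law",
    "The team lost the football final",
]
print(cluster(DOCS))  # [[0, 2], [1, 3]]
```

The result depends on document order and on the threshold, so for a real corpus a standard algorithm such as k-means or hierarchical clustering would be a more robust choice. A diachronic view could then be obtained by clustering documents per time period and comparing the clusters.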

Thank you!