Computational Linguistics INTroduction, Lecture 2: Computers and Language

CL: Two Main Disciplines
• Linguistics
• Computer Science
CL sits where the two meet: language and computers.
Feb 2010 -- MR CLINT - Lecture 1

Language and Computers includes …
• Natural Language Processing (NLP)
  - Human Language Technology
  - Computational models of language analysis, interpretation, and generation
  - Syntax/semantics interface
  - Emphasis on large-scale performance
  - Example 1: Google search
  - Example 2: speech technology
• Computational Linguistics
  - Emphasis on mechanised linguistic theories
  - Grew out of early Machine Translation efforts

Linguistics
• Phonetics: the study of speech sounds
• Phonology: the study of sound systems
• Morphology: the study of word structure
• Syntax: the study of sentence structure
• Semantics: the study of meaning
• Pragmatics: the study of language use

Noam Chomsky
• Noam Chomsky’s work in the 1950s radically changed linguistics, making syntax central.
• Chomsky has been the dominant figure in linguistics ever since.
• Chomsky invented the generative approach to grammar.

Formal v. Natural Languages
Formal Languages
• Arithmetic: 3290 1 1010101
• Logic: ∀x (man(x) → mortal(x))
• URL: http://www.cs.um.edu.mt
Natural Languages
• English: John saw the dog
• German: Johann hat den Hund gesehen
• Maltese: Ġianni ra kelb

Ambiguity
• Morphological ambiguity
• Lexical ambiguity
• Syntactic ambiguity
• Semantic ambiguity
• Pragmatic ambiguity
The management of ambiguity is central to the success of CL.

Ambiguity
• Find at least 5 meanings of this sentence:
  - I made her duck
9/11/2021 Speech and Language Processing - Jurafsky and Martin

I made her duck
• I cooked a duck for her
• I cooked a duck belonging to her
• I created a duck for her
• I created a duck that now belongs to her
• I caused her to lower her head
• I turned her into a duck

Ambiguity
• I cooked waterfowl for her benefit (to eat)
• I cooked waterfowl belonging to her
• I created the (ceramic?) duck she owns
• I caused her to quickly lower her upper body
• I waved my magic wand and turned her into undifferentiated waterfowl

Sources of Ambiguity
• I caused her to quickly lower her head or body.
  - Lexical category (part of speech): “duck” can be a noun or a verb; here it is a verb.
• I cooked waterfowl belonging to her.
  - Lexical category: “her” can be a possessive (“of her”) or a dative (“for her”) pronoun; here it is possessive.
• I made the (ceramic) duck statue she owns.
  - Lexical semantics: “make” can mean “create” or “cook”, and about 100 other things as well; here it means “create”.

Ambiguity
• Ambiguity is a fundamental problem of computational linguistics.
• Resolving ambiguity is a crucial goal.

Computer Science
• The study of basic concepts:
  - Information
  - Data
  - Algorithm
  - Program

Information
• Information is a theoretical concept introduced by Shannon in 1948 to measure uncertainty. The unit of this measure is the bit:
  - Length is measured in metres
  - Weight is measured in kilograms
  - Information is measured in bits
• 1 bit is the amount of uncertainty inherent in a situation with exactly two equally likely outcomes. Example: for breakfast I will have either coffee or tea (nothing else). When I tell you that I am having tea, I have conveyed one bit of information.
• The greater the number of possible outcomes, the more bits of information are carried by the statement that indicates the actual outcome: with n equally likely outcomes, that statement carries log2 n bits.
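
The claim that more outcomes means more bits can be checked numerically. A minimal Python sketch, assuming the standard Shannon entropy formula H = -Σ p·log2 p, which reduces to log2 n bits when all n outcomes are equally likely; the function name `entropy` is hypothetical:

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping outcomes with p = 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Coffee or tea, equally likely: exactly one bit.
print(entropy([0.5, 0.5]))   # 1.0
# Four equally likely drinks: two bits (log2 4).
print(entropy([0.25] * 4))   # 2.0
# Two outcomes with unequal odds carry less than one bit.
print(entropy([0.9, 0.1]))
```

The last case shows why "two equally likely outcomes" matters: if tea is almost certain, hearing "tea" tells you less than one bit.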

Data
• A formalized representation of facts or concepts suitable for communication, interpretation, or processing by people or by automated means.
• Example: a telephone directory.
• Unlike information, which is abstract, data is concrete.
• Data has a certain level of structure. A telephone directory, for example, is a list of entries, each of which has a name, an address, and a number.
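
The telephone-directory structure can be written down as concrete data. A minimal Python sketch; the `Entry` class and all names, addresses, and numbers are hypothetical, made up for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    """One directory entry: every entry has the same three fields."""
    name: str
    address: str
    number: str

# The directory itself is a list of entries (hypothetical sample data).
directory = [
    Entry("Borg, Anna", "12 Main Street", "2123 4567"),
    Entry("Camilleri, Joe", "3 Church Road", "2198 7654"),
]

for entry in directory:
    print(f"{entry.name}, {entry.address}: {entry.number}")
```

The fixed fields are the "certain level of structure": a program can rely on every entry having a name, an address, and a number.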

Algorithm
A completely defined procedure for the solution of a given problem in a finite number of steps.
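
As an illustration (chosen here, not taken from the slides), binary search is a completely defined, finite procedure over telephone-directory data: it looks up a name in a directory sorted by name. The function `lookup` and the sample entries are hypothetical:

```python
def lookup(directory, name):
    """Binary search over (name, number) pairs sorted by name.

    Each step halves the range still to be searched, so the loop
    always stops after a finite number of steps.
    """
    low, high = 0, len(directory) - 1
    while low <= high:
        mid = (low + high) // 2
        entry_name, number = directory[mid]
        if entry_name == name:
            return number
        if entry_name < name:
            low = mid + 1   # the name can only be in the upper half
        else:
            high = mid - 1  # the name can only be in the lower half
    return None  # name not in the directory

# Hypothetical sample data, already sorted by name.
phone_book = [("Abela", "2100 0001"), ("Borg", "2100 0002"), ("Vella", "2100 0003")]
print(lookup(phone_book, "Borg"))    # 2100 0002
print(lookup(phone_book, "Zammit"))  # None
```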

Algorithm for Chocolate Cake

Computer Program
• A set of instructions, written in a specific programming language, which a computer follows in processing data, performing an operation, or solving a logical problem.
• A program is concrete.
• A program can implement an algorithm.
• More than one program may implement the same algorithm.
• Not all programs express good algorithms!
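
To illustrate that more than one program may implement the same algorithm, here are two Python programs for Euclid's greatest-common-divisor algorithm (an example chosen here, not taken from the slides). One uses a loop, the other recursion; both carry out the same steps and return the same result. The function names are hypothetical:

```python
def gcd_iterative(a: int, b: int) -> int:
    # Euclid's algorithm as a loop: replace (a, b) by (b, a mod b) until b is 0.
    while b != 0:
        a, b = b, a % b
    return a

def gcd_recursive(a: int, b: int) -> int:
    # The same algorithm expressed as recursion instead of a loop.
    return a if b == 0 else gcd_recursive(b, a % b)

print(gcd_iterative(48, 36), gcd_recursive(48, 36))  # 12 12
```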

Algorithms and Linguistics
• Do linguistic theories in the abstract make sense?
• Linguistic theories explain linguistic knowledge in the form of:
  - grammar rules
  - theories about grammar rules
• But performance involves processing issues.
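
To make "grammar rules" and "processing" concrete, here is a minimal sketch of a toy context-free grammar plus a recognizer that checks sentences against it. The grammar, the lexicon, and the function names are hypothetical. The recognizer is a simple top-down (recursive-descent) search, which is only one of many possible processing strategies; it would loop forever on a left-recursive rule such as NP → NP PP:

```python
# Toy grammar (hypothetical): nonterminal -> list of alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Name"], ["Det", "N"]],
    "VP": [["V", "NP"]],
}
# Lexicon (hypothetical): word -> part-of-speech category.
LEXICON = {"John": "Name", "the": "Det", "dog": "N", "saw": "V"}

def parse(symbol, words, start):
    """Return the set of positions where `symbol` can end if it starts at `start`."""
    if symbol not in GRAMMAR:  # a category: match exactly one word
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            return {start + 1}
        return set()
    ends = set()
    for rhs in GRAMMAR[symbol]:
        positions = {start}
        for part in rhs:  # thread every possible end position through the rule
            positions = {e for p in positions for e in parse(part, words, p)}
        ends |= positions
    return ends

def grammatical(sentence):
    words = sentence.split()
    return len(words) in parse("S", words, 0)

print(grammatical("John saw the dog"))  # True
print(grammatical("saw John dog the"))  # False
```

The grammar rules state what is well formed (knowledge); the `parse` function is one way of using them on actual input (performance).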

Computational Linguistics – Issues
• Can an artificial system learn a language with limited exposure to grammatical sentences?

Computers and Language: Twin Goals
• Scientific goal: contribute to linguistics by adding a computational dimension.
• Technological goal: develop machinery capable of handling human language that can support “language engineering”.

Computers and Language: Applications
• Information Retrieval/Extraction
• Document Classification
• Question Answering
• Style and Spell Checking
• Multimodal Interaction
• Machine Translation
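
For spell checking, one common approach (assumed here, not described on the slide) is to suggest the dictionary word with the smallest edit distance to the misspelling. A minimal sketch using Levenshtein distance; `edit_distance`, `suggest`, and the word list are hypothetical:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def suggest(word, vocabulary):
    """Return the vocabulary word closest to `word` (ties go to the earliest)."""
    return min(vocabulary, key=lambda v: edit_distance(word, v))

vocab = ["information", "formation", "informal"]  # hypothetical word list
print(suggest("infomation", vocab))  # information
```

A real spell checker would also weigh how common each candidate word is; this sketch uses distance alone.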

Algorithms
• Many of the algorithms that we’ll study will turn out to be transducers: algorithms that take one kind of structure as input and output another.
• Unfortunately, ambiguity makes this process difficult, because a single input can correspond to several possible outputs.
• This leads us to employ algorithms of various sorts that are designed to manage ambiguity.
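
A minimal sketch of a transducer in this sense, mapping a word sequence to part-of-speech tag sequences for “I made her duck”. The lexicon and the tag names are hypothetical. Because “her” and “duck” each have two categories, one input yields four outputs:

```python
from itertools import product

# Hypothetical lexicon: each word maps to its possible part-of-speech tags.
LEXICON = {
    "I": ["PRON"],
    "made": ["VERB"],
    "her": ["PRON-POSS", "PRON-OBJ"],  # possessive ("of her") or object/dative ("for her")
    "duck": ["NOUN", "VERB"],
}

def tag_sequences(sentence):
    """Transduce a word sequence into every tag sequence the lexicon allows."""
    options = [LEXICON[word] for word in sentence.split()]
    return [list(tags) for tags in product(*options)]

for tags in tag_sequences("I made her duck"):
    print(tags)
```

Not every output is a sensible reading (a possessive “her” followed by the verb “duck”, for example), so a real system needs some further way, such as probabilities, to choose among the candidates.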