Computational Linguistics INTroduction Lecture 1 Computers and Language
- Slides: 30
Computational Linguistics INTroduction Lecture 1 Computers and Language Feb 2010 -- MR CLINT - Lecture 1
Course Information
- Course Website: http://staff.um.edu.mt/mros1/lin2160
- Lecturers: mike.rosner@um.edu.mt, ray.fabri@um.edu.mt
- Book: Jurafsky & Martin, Speech and Language Processing, Prentice Hall 2009, ISBN 978-0-13-504196-3
- Natural Language Toolkit (NLTK): http://www.nltk.org/
CL: Two Main Disciplines
[Diagram labels: LINGUISTICS, language and computers, COMP SCI]
Language and Computers includes …
- Natural Language Processing (NLP)
  - Computational models of language analysis, interpretation, and generation
  - Syntax/semantics interface
- Human Language Technology
  - Emphasis on large-scale performance
  - Example 1: Google search
  - Example 2: speech technology
- Computational Linguistics
  - Emphasis on mechanised linguistic theories
  - Grew out of early Machine Translation efforts
Linguistics
- Phonetics: The study of speech sounds
- Phonology: The study of sound systems
- Morphology: The study of word structure
- Syntax: The study of sentence structure
- Semantics: The study of meaning
- Pragmatics: The study of language use
Noam Chomsky
- Noam Chomsky’s work in the 1950s radically changed linguistics, making syntax central.
- Chomsky has been the dominant figure in linguistics ever since.
- Chomsky invented the generative approach to grammar.
Generative Grammar: Some Key Points
- The theory of grammar includes a mathematical definition of what a grammar is.
- A language is a (possibly infinite) set of sentences, but a grammar is finite.
- A grammar generates all and only the sentences of a language.
  - Undergeneration
  - Overgeneration
[source: Sag & Wasow]
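Generative Power of a Grammar
[Diagram: three panels comparing the set generated by a grammar G with a language L]
- Undergeneration: G generates only, but not all, sentences of L
- Overgeneration: G generates all, but not only, sentences of L
- All and only: G generates exactly the sentences of L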
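The three cases can be sketched as set comparisons. This is only an illustration: the toy language and the function name `classify` are invented here, and a grammar is represented simply by the finite set of sentences it generates, which would not work for an infinite language.

```python
# Hypothetical toy language L (invented for illustration).
LANGUAGE = {"John kicks Bill", "Bill kicks John"}


def classify(generated, language):
    """Compare the set a grammar generates with the target language."""
    if generated == language:
        return "all and only"
    if generated < language:      # proper subset: only, but not all
        return "undergeneration"
    if generated > language:      # proper superset: all, but not only
        return "overgeneration"
    return "neither"              # misses some sentences and adds others
```

For example, `classify({"John kicks Bill"}, LANGUAGE)` reports undergeneration, because the grammar produces nothing outside L but misses one of its sentences.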
Formal Grammar
- Grammar is a set of rewrite rules
- Rules have the form LHS → RHS
  - LHS can be rewritten as RHS
  - LHS & RHS are sequences made of words or symbols
- Lexicon specifies words and their categories: Category → word
  - Category can be rewritten as word
A Simple Grammar/Lexicon
grammar:
- S → NP VP
- NP → N
- VP → V NP
lexicon:
- V → kicks
- N → John
- N → Bill
Parse tree for "John kicks Bill": [S [NP [N John]] [VP [V kicks] [NP [N Bill]]]]
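As a rough sketch, the rewrite rules of this grammar and lexicon can be written as Python dictionaries and expanded mechanically. The names `GRAMMAR`, `LEXICON`, `expand` and `sentences` are illustrative choices, not part of the course material, and the approach only terminates because this particular grammar has no recursive rules.

```python
from itertools import product

# Rewrite rules: each left-hand side maps to its possible right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"]],
}
# Lexical rules: each category maps to the words it can be rewritten as.
LEXICON = {
    "V": ["kicks"],
    "N": ["John", "Bill"],
}


def expand(symbol):
    """Return every word sequence that `symbol` can be rewritten as."""
    if symbol in LEXICON:
        return [[word] for word in LEXICON[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then combine the alternatives in order.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


def sentences():
    """Return all sentences generated from the start symbol S."""
    return [" ".join(words) for words in expand("S")]
```

Calling `sentences()` yields four strings, including "John kicks John". The grammar generates exactly this finite set, which makes it easy to see what "all and only" means for a very small language.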
Formal v. Natural Languages
Formal Languages
- Arithmetic: 3290 1 1010101
- Logic: ∀x man(x) → mortal(x)
- URL: http://www.cs.um.edu.mt
Natural Languages
- English: John saw the dog
- German: Johann hat den Hund gesehen
- Maltese: Ġianni ra kelb
Some Points of Similarity
- Sentences are sequences of words (or symbols).
- Rules determine which sequences are valid sentences.
- Sentences have a definite structure.
- Sentence structure is systematically related to meaning.
Structure Affects Meaning
I shot an elephant in my trousers
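One way to make the point concrete is to write down two structures for the same word string. The bracketings below are an assumed analysis, given as unlabelled nested tuples purely for illustration; the slide itself does not show them.

```python
# Reading 1 (assumed): "in my trousers" attaches to the verb phrase,
# so the speaker was wearing the trousers while shooting.
VP_ATTACHMENT = ("I", (("shot", ("an", "elephant")), ("in", ("my", "trousers"))))

# Reading 2 (assumed): "in my trousers" attaches to the noun phrase,
# so the elephant was in the trousers.
NP_ATTACHMENT = ("I", ("shot", (("an", "elephant"), ("in", ("my", "trousers")))))


def leaves(tree):
    """Flatten a nested-tuple tree into its left-to-right list of words."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree for word in leaves(child)]
```

Both trees flatten to the same seven words, yet the trees themselves differ. That difference in structure is what carries the difference in meaning.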
Points of Difference
Formal Languages
- The grammar defines the language
- Restricted application
- Non-ambiguous
Natural Languages
- The language defines the grammar
- Universal application
- Highly ambiguous
Ambiguity
- Morphological Ambiguity: en-large-ment
- Lexical Ambiguity: Iraqi Head Seeks Arms
- Syntactic Ambiguity: small animals and children laugh
- Semantic Ambiguity: every girl loves a sailor
- Pragmatic Ambiguity: can you pass the salt?
The management of ambiguity is central to the success of CL
I made her duck
- I cooked a duck for her
- I cooked a duck belonging to her
- I created a duck for her
- I created a duck that now belongs to her
- I caused her to lower her head
- I turned her into a duck
Computer Science
- The study of basic concepts
  - Information
  - Data
  - Algorithm
  - Program
- The application of these concepts to practical tasks.
- Implementation of computational models from other fields (meteorology, ..., linguistics)
Information Data Algorithm Program
- Information is a theoretical concept invented by Shannon in 1948 to measure uncertainty.
- The units of this measure are called bits.
  - Length: metres
  - Weight: kilos
  - Information: bits
- 1 bit is the amount of uncertainty inherent in a situation with exactly two possible outcomes.
- Example: for breakfast I will have coffee or I will have tea (nothing else). When I tell you that I have tea, I have conveyed one bit of information.
- The greater the number of possible outcomes, the more bits of information are involved in the statement that indicates the actual outcome.
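The relationship between outcomes and bits can be sketched numerically. The sketch assumes all outcomes are equally likely, which the slide does not state explicitly; under that assumption the measure is the base-2 logarithm of the number of outcomes. The function name `bits` is illustrative.

```python
import math


def bits(num_outcomes):
    """Bits conveyed by naming one of `num_outcomes` equally likely outcomes."""
    if num_outcomes < 1:
        raise ValueError("need at least one possible outcome")
    return math.log2(num_outcomes)
```

With two outcomes (coffee or tea), `bits(2)` gives 1.0. With eight possible breakfasts, `bits(8)` gives 3.0, and with only one possible outcome nothing is learned, so `bits(1)` gives 0.0.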
Information Data Algorithm Program
- Data is a formalized representation of facts or concepts suitable for communication, interpretation, or processing by people or automated means.
- Example: a telephone directory
- Unlike information, which is abstract, data is concrete.
- Data has a certain level of structure. In the telephone directory, for example, we have the structure of a list of entries, each of which has a name, an address, and a number.
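The telephone directory structure described here, a list of entries each with a name, an address and a number, might be sketched as follows. The class name, the helper `lookup` and the sample entries are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    address: str
    number: str


# Invented sample entries, for illustration only.
DIRECTORY = [
    Entry("A. Borg", "1 Example Street", "0000 0001"),
    Entry("B. Vella", "2 Example Street", "0000 0002"),
]


def lookup(directory, name):
    """Return the numbers of all entries with the given name."""
    return [entry.number for entry in directory if entry.name == name]
```

Because every entry has the same three fields, a simple procedure such as `lookup(DIRECTORY, "A. Borg")` can process the directory without knowing anything else about its contents.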
Information Data Algorithm Program
- An algorithm is a completely defined procedure for the solution of a given problem in a finite number of steps.
- Designed for a well-defined task.
- Finite description length.
- Guaranteed to terminate.
- Abstract
Algorithm for Chocolate Cake
Program to Add X and Y
[Flowchart]
- Read X and Y (example: X = 2, Y = 3)
- Subtract 1 from X
- Add 1 to Y
- X = 0? If no, repeat from "Subtract 1 from X"; if yes, Output Y
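The flowchart could be written in Python roughly as below. The function name `add` and the input check are additions made for this sketch: the flowchart tests X = 0 only after the first decrement, so it would never stop for X = 0 or a negative X.

```python
def add(x, y):
    """Add x to y by moving one unit at a time from x to y."""
    if not isinstance(x, int) or x < 1:
        # Assumption of this sketch: reject inputs for which the loop would not terminate.
        raise ValueError("x must be a positive integer")
    while True:
        x -= 1          # subtract 1 from X
        y += 1          # add 1 to Y
        if x == 0:      # X = 0?
            return y    # yes: output Y
```

With the example values from the slide, `add(2, 3)` passes through the loop twice and returns 5.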
Computer Program
- A set of instructions, written in a specific programming language, which a computer follows in processing data, performing an operation, or solving a logical problem.
- Concrete
- A program can implement an algorithm.
- More than one program may implement the same algorithm.
- Not all programs express good algorithms!
Instructions vs. Execution Steps
1. Read X
2. Read Y
3. X = X-1
4. Y = Y+1
5. If X = 0 then Print(Y) else goto 3
How many instructions? How many execution steps?
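The difference between the two counts can be explored with a small simulation. This is a sketch under stated assumptions: the name `run` is invented, each numbered instruction counts as one execution step each time it is carried out, the value reported at the end is taken to be the running sum Y, and X is required to be a positive integer so that the loop stops.

```python
def run(x, y):
    """Simulate the five-instruction program; return (result, steps executed)."""
    if not isinstance(x, int) or x < 1:
        raise ValueError("x must be a positive integer")
    steps = 2               # instructions 1 and 2: Read X, Read Y
    while True:
        x -= 1              # instruction 3
        y += 1              # instruction 4
        steps += 3          # instructions 3 and 4, plus the test in instruction 5
        if x == 0:          # instruction 5
            return y, steps
```

The program text always has five instructions, but the number of execution steps grows with the input: `run(2, 3)` returns `(5, 8)`, and in general the count is 2 + 3 × X.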
Algorithms and Linguistics
- Do linguistic theories in the abstract make sense?
- Linguistic theory explains linguistic knowledge in the form of
  - grammar rules
  - theories about grammar rules
- But performance involves processing issues:
Computational Linguistics – Issues
- How are a grammar and a lexicon represented?
- How is the structure of a given sentence actually discovered?
- How can we actually generate a sentence to express a particular intended meaning?
- How can linguistic theory be made concrete enough to test algorithmically?
- Can an artificial system learn a language with limited exposure to grammatical sentences?
Computers and Language: Twin Goals
- Scientific Goal: Contribute to Linguistics by adding a computational dimension.
- Technological Goal: Develop machinery capable of handling human language that can support “language engineering”
Computers and Language: Tools & Resources
- Grammar Formalisms, e.g. Definite Clause Grammars
- Parsing Algorithms: sentence → structure
- Generation Algorithms: structure → sentence
- Statistical Methods
- Linguistic Corpora
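To illustrate the sentence-to-structure direction, here is a minimal hand-written recursive-descent parser for the toy grammar from the earlier slide (S → NP VP, NP → N, VP → V NP). It is not a Definite Clause Grammar and not how NLTK does it; the function names and the nested-tuple tree format are assumptions of this sketch.

```python
# Lexicon from the toy grammar, indexed by word.
LEXICON = {"kicks": "V", "John": "N", "Bill": "N"}


def parse_np(words, i):
    """NP -> N. Return (tree, next position), or None if no NP starts at i."""
    if i < len(words) and LEXICON.get(words[i]) == "N":
        return ("NP", ("N", words[i])), i + 1
    return None


def parse_vp(words, i):
    """VP -> V NP. Return (tree, next position), or None if no VP starts at i."""
    if i < len(words) and LEXICON.get(words[i]) == "V":
        result = parse_np(words, i + 1)
        if result is not None:
            np_tree, j = result
            return ("VP", ("V", words[i]), np_tree), j
    return None


def parse(sentence):
    """S -> NP VP. Return the tree, or None if the grammar rejects the sentence."""
    words = sentence.split()
    result = parse_np(words, 0)
    if result is None:
        return None
    np_tree, i = result
    result = parse_vp(words, i)
    if result is None:
        return None
    vp_tree, j = result
    # Accept only if every word has been consumed.
    return ("S", np_tree, vp_tree) if j == len(words) else None
```

`parse("John kicks Bill")` returns a tree with the same shape as the one drawn for the simple grammar, while `parse("kicks John Bill")` returns `None` because the grammar does not generate that word order.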
Computers and Language: Applications
- Information Retrieval/Extraction
- Document Classification
- Question Answering
- Style and Spell Checking
- Multimodal Interaction
- Machine Translation
LECTURES
1. Overview
2. Chomsky Hierarchy
3. Chomsky Hierarchy
4. Chomsky Hierarchy
5. Computational Syntax
6. Agreement & Subcategorisation
7. Computational Syntax
8. Computational Syntax
9. Corpora, Tools and Techniques
10. Morphology
11. Computational Morphology
12. Computational Morphology
13. Computational Morphology
14. Revision