Postgraduate Diploma in Translation
Lecture 1: Computers and Language
Feb 2005 -- MR Diploma in Translation - Lecture 1

Course Information
• Web: http://www.cs.um.edu.mt/~mros/diptran
• Lecturers: mike.rosner@um.edu.mt, ray.fabri@um.edu.mt
• Reading:
  - D. Arnold et al. (1994). Machine Translation: an Introductory Guide. See website.
  - H. Somers (2003). Computers and Translation: a Translator’s Guide. See website.

Computers and Language
• Computational Linguistics
  - Emphasis on mechanised linguistic theories
  - Grew out of early Machine Translation efforts
• Natural Language Processing
  - Computational models of language analysis, interpretation, and generation
• Language Engineering
  - Emphasis on large-scale performance
  - Example: Google

CL: Two Main Disciplines
[Diagram: LINGUISTICS and COMP SCI]

Linguistics
• Phonetics: the study of speech sounds
• Phonology: the study of sound systems
• Morphology: the study of word structure
• Syntax: the study of sentence structure
• Semantics: the study of meaning
• Pragmatics: the study of language use

Grammar Rules: Prescriptive versus Descriptive
Prescriptive Grammar
• Rules for and against certain uses
• Proscribes forms that are in current use, e.g. “don’t end a sentence with a preposition”
• Subjective
Descriptive Grammar
• Rules characterizing what people actually say
• Goal: to characterize all and only that which speakers find acceptable
• Objective

Noam Chomsky
• Noam Chomsky’s work in the 1950s radically changed linguistics, making syntax central.
• Chomsky has been the dominant figure in linguistics ever since.
• Chomsky invented the generative approach to grammar.

Generative Grammar: Key Points
• A language is a (possibly infinite) set of sentences.
• A grammar is finite.
• The grammar of a particular language expresses linguistic knowledge of that language.
• The Theory of Grammar includes a mathematical definition of what a grammar is.
• The “Theory of Grammar” is a theory of human linguistic abilities.
[source: Sag & Wasow]
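
How a finite grammar can describe an infinite set of sentences can be sketched in Python. The grammar below is hypothetical: it is the toy grammar from the “A Simple Grammar/Lexicon” slide plus an invented recursive rule NP → N and NP. Because an NP can contain another NP, there is no longest sentence. The sketch lists all sentences up to a word limit, and the count keeps growing as the limit is raised, even though the grammar itself never changes.

```python
# Hypothetical grammar: the toy grammar plus one invented recursive NP rule.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"], ["N", "and", "NP"]],  # recursive rule (assumption)
    "VP": [["V", "NP"]],
    "N": [["John"], ["Bill"]],
    "V": [["kicks"]],
}


def expand(symbol, max_len):
    """Return every word sequence of at most max_len words derivable from symbol."""
    if symbol not in GRAMMAR:  # a word derives only itself
        return {(symbol,)} if max_len >= 1 else set()
    results = set()
    for rhs in GRAMMAR[symbol]:
        results |= expand_seq(rhs, max_len)
    return results


def expand_seq(symbols, max_len):
    """Expand a sequence of symbols left to right within the word budget."""
    if not symbols:
        return {()}
    first, rest = symbols[0], symbols[1:]
    results = set()
    # Every symbol in this grammar yields at least one word, so reserve len(rest).
    for head in expand(first, max_len - len(rest)):
        for tail in expand_seq(rest, max_len - len(head)):
            results.add(head + tail)
    return results


def language(max_len):
    """All sentences (S) of at most max_len words."""
    return expand("S", max_len)


for n in (3, 5, 7, 9):
    print(n, len(language(n)))  # counts: 4, 20, 68, 196
```

The word limit is only there so the program terminates; removing it would mean enumerating an infinite set.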

Theories of Sentence and Word Structure: Rewrite Rules
• Rules can be used to specify the sentences of a language.
• Rules have the form LHS → RHS
  - LHS may be a sequence of symbols.
  - RHS may be a sequence of symbols or words.
• A lexicon specifies words and their categories.
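
A rewrite rule is applied by replacing an occurrence of its LHS with its RHS. The Python sketch below does this step by step, always rewriting the leftmost non-terminal (a leftmost derivation). It uses the toy grammar from the “A Simple Grammar/Lexicon” slide and simplifies the general case on this slide by assuming every LHS is a single symbol; the function names are our own.

```python
# Toy grammar from the "A Simple Grammar/Lexicon" slide, as (LHS, RHS) pairs.
# Assumption: every LHS is a single symbol (context-free rules).
RULES = [
    ("S", ("NP", "VP")),  # rule 0
    ("NP", ("N",)),       # rule 1
    ("VP", ("V", "NP")),  # rule 2
    ("V", ("kicks",)),    # rule 3
    ("N", ("John",)),     # rule 4
    ("N", ("Bill",)),     # rule 5
]
NONTERMINALS = {lhs for lhs, _ in RULES}


def rewrite(form, rule):
    """Apply rule to the leftmost non-terminal of form (a tuple of symbols)."""
    lhs, rhs = rule
    for i, symbol in enumerate(form):
        if symbol in NONTERMINALS:
            if symbol != lhs:
                raise ValueError(f"leftmost non-terminal is {symbol}, not {lhs}")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no non-terminal left to rewrite")


def derive(rule_numbers, start="S"):
    """Return every intermediate form produced by applying the rules in order."""
    form = (start,)
    forms = [form]
    for n in rule_numbers:
        form = rewrite(form, RULES[n])
        forms.append(form)
    return forms


for form in derive([0, 1, 4, 2, 3, 1, 5]):
    print(" ".join(form))
```

The printed derivation runs from S through NP VP, N VP, John VP, and so on, ending in John kicks Bill.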

A Simple Grammar/Lexicon
grammar:
  S → NP VP
  NP → N
  VP → V NP
lexicon:
  V → kicks
  N → John
  N → Bill
Tree for “John kicks Bill”: [S [NP [N John]] [VP [V kicks] [NP [N Bill]]]]
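
Read as a generator, this grammar defines a small finite language. A minimal Python sketch, assuming the rules are stored as a dictionary from each symbol to its possible right-hand sides (our own representation), enumerates every sentence by expanding S in all possible ways. It terminates only because none of these rules is recursive.

```python
from itertools import product

# The grammar and lexicon from the slide, as rewrite rules.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("N",)],
    "VP": [("V", "NP")],
    "V": [("kicks",)],
    "N": [("John",), ("Bill",)],
}


def generate(symbol):
    """Yield every word sequence derivable from symbol.

    Terminates only because this grammar has no recursive rules.
    """
    if symbol not in GRAMMAR:  # a word
        yield (symbol,)
        return
    for rhs in GRAMMAR[symbol]:
        # Expand each RHS symbol, then combine one choice from each.
        for parts in product(*(list(generate(s)) for s in rhs)):
            yield tuple(word for part in parts for word in part)


for sentence in generate("S"):
    print(" ".join(sentence))
```

This prints the four sentences John kicks John, John kicks Bill, Bill kicks John and Bill kicks Bill.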

Formal v. Natural Languages
Formal languages
• Arithmetic: 3290 1 1010101
• Logic: ∀x (man(x) → mortal(x))
• URL: http://www.cs.um.edu.mt
Natural languages
• English: John saw the dog
• German: Johann hat den Hund gesehen
• Maltese: Ġianni ra kelb

Points of Similarity
• A language is considered to be a (possibly infinite) set of sentences.
• Sentences are sequences of words.
• Rules determine which sequences are valid sentences.
• Sentences have a definite structure.
• Sentence structure is related to meaning.

Points of Difference
Formal languages                    Natural languages
The grammar defines the language    The language defines the grammar
Restricted application              Universal application
Unambiguous                         Highly ambiguous

Ambiguity
• Morphological ambiguity: en-large-ment
• Lexical ambiguity: the sheep is in the pen
• Syntactic ambiguity: small animals and children laugh
• Semantic ambiguity: every girl loves a sailor
• Pragmatic ambiguity: can you pass the salt?
The management of ambiguity is central to the success of CL in general and MT in particular.
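
Syntactic ambiguity can be made concrete by counting parse trees. The Python sketch below uses the CKY chart-parsing algorithm, chosen here for illustration, with a small hypothetical grammar in binary form that we wrote for the example phrase; the grammar, and category names such as CoordNP, are assumptions, not part of the lecture. It finds two trees for “small animals and children laugh”: one where small modifies only animals, and one where it modifies animals and children.

```python
from collections import defaultdict

# Hypothetical lexicon and binary rules, written for this one example.
LEXICON = {
    "small": ["Adj"],
    "animals": ["NP"],
    "children": ["NP"],
    "and": ["Conj"],
    "laugh": ["VP"],
}
BINARY = [  # (parent, left child, right child)
    ("S", "NP", "VP"),
    ("NP", "Adj", "NP"),
    ("NP", "NP", "CoordNP"),
    ("CoordNP", "Conj", "NP"),
]


def count_parses(words, start="S"):
    """Count distinct parse trees for words using a CKY chart."""
    n = len(words)
    # chart[i][j][A] = number of trees with root A covering words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category in LEXICON[word]:
            chart[i][i + 1][category] += 1
    for length in range(2, n + 1):  # shorter spans first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point
                for parent, left, right in BINARY:
                    n_left, n_right = chart[i][k][left], chart[k][j][right]
                    if n_left and n_right:
                        chart[i][j][parent] += n_left * n_right
    return chart[0][n][start]


print(count_parses("small animals and children laugh".split()))  # 2
print(count_parses("small animals laugh".split()))  # 1
```

Counting rather than listing trees keeps the chart small; a real parser would also store back-pointers so the trees themselves can be recovered.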

Computer Science
• The study of basic concepts
  - Information
  - Data
  - Algorithm
  - Program
• The application of these concepts to practical tasks.
• Implementation of computational models from other fields.

Information
• Information is a theoretical concept introduced by Shannon in 1948 to measure uncertainty.
• The units of this measure are called bits:
  - Length: metres
  - Weight: kilos
  - Information: bits
• 1 bit is the amount of uncertainty inherent in a situation with exactly two equally likely outcomes.
  - Example: for breakfast I will have coffee or I will have tea (nothing else). When I tell you that I am having tea, I have conveyed one bit of information.
• The greater the number of (equally likely) possible outcomes, the more bits of information are conveyed by the statement that indicates the actual outcome.
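
The arithmetic behind these examples can be sketched in Python. When all n outcomes are equally likely, learning which one occurred conveys log2(n) bits, so coffee-or-tea gives 1 bit and a choice among 8 gives 3 bits. Shannon’s general measure, entropy, also covers unequal probabilities; the function names below are our own.

```python
import math


def bits_for_equally_likely(n_outcomes):
    """Information in learning which of n equally likely outcomes occurred."""
    return math.log2(n_outcomes)


def entropy(probabilities):
    """Shannon entropy in bits: the average information per outcome."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


print(bits_for_equally_likely(2))     # coffee or tea: 1.0
print(bits_for_equally_likely(8))     # eight possible outcomes: 3.0
print(entropy([0.5, 0.5]))            # equal odds again: 1.0
print(round(entropy([0.9, 0.1]), 3))  # unequal odds: 0.469
```

If tea is already very likely, being told “tea” resolves less uncertainty, which is why the unequal case gives less than 1 bit.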

Data
• A formalized representation of facts or concepts suitable for communication, interpretation, or processing by people or automated means.
  - Example: a telephone directory
• Unlike information, which is abstract, data is concrete.
• Data has a certain level of structure. In the telephone directory, for example, we have the structure of a list of entries, each of which has a name, an address, and a number.
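
The structure of the telephone directory can be written down directly as a data type. A minimal Python sketch: each entry has the same three fields (name, address, number), and the directory is a list of such entries. The entries and the lookup function are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One directory entry: the fixed structure every entry shares."""
    name: str
    address: str
    number: str


# Hypothetical sample entries (invented for illustration).
directory = [
    Entry("Abela, Maria", "1 Example Street", "555-0101"),
    Entry("Borg, Joseph", "2 Sample Road", "555-0102"),
    Entry("Camilleri, Anna", "3 Test Avenue", "555-0103"),
]


def lookup(entries, name):
    """Return the numbers listed under exactly this name."""
    return [entry.number for entry in entries if entry.name == name]


print(lookup(directory, "Borg, Joseph"))  # ['555-0102']
```

Because every entry has the same fields, a program can process the directory mechanically, for example by searching on the name field.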

Algorithm
• A well-defined procedure for the solution of a given problem in a finite number of steps
• Abstract
  - Designed to perform a well-defined task
  - Finite description length
  - Guaranteed to terminate

Algorithm for Chocolate Cake
[Figure]

Program to Add X and Y
[Flowchart]
1. Read X and Y (example: X = 2, Y = 3)
2. Subtract 1 from X
3. Add 1 to Y
4. X = 0? If no, go back to step 2; if yes, output Y
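
The flowchart can be transcribed into Python as a minimal sketch; the function name add is our own. Note that the flowchart tests X only after subtracting, so if X starts at 0 the test X = 0 is never satisfied and the loop runs forever. The sketch therefore assumes X is a positive integer and rejects anything smaller.

```python
def add(x, y):
    """Add x to y by moving 1 at a time from x to y, following the flowchart.

    Assumes x is a positive integer; the flowchart would loop forever for x = 0.
    """
    if x < 1:
        raise ValueError("the flowchart assumes X is a positive integer")
    while True:
        x -= 1       # subtract 1 from X
        y += 1       # add 1 to Y
        if x == 0:   # X = 0?  yes: output Y
            return y
        # no: go round the loop again


print(add(2, 3))  # 5
```

The guard is one possible repair; another would be to test X = 0 before subtracting, which also handles X = 0 correctly.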

Computer Program
• A set of instructions, written in a specific programming language, which a computer follows in processing data, performing an operation, or solving a logical problem.
• Concrete
  - A program can implement an algorithm.
  - More than one program may implement the same algorithm.
  - Not all programs express good algorithms!

Instructions vs. Execution Steps
1. Read X
2. Read Y
3. X = X-1
4. Y = Y+1
5. If X = 0 then Print(Y) else goto 3
How many instructions? How many execution steps?
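
One way to see the difference is to interpret the program and count each instruction as it is executed. The sketch below is a hypothetical interpreter written for this exercise, not part of the lecture; it assumes instruction 5 outputs Y, the sum, as in the flowchart on the “Program to Add X and Y” slide. The program always has 5 instructions, but the number of execution steps depends on the input, because instructions 3 to 5 are repeated once per unit of X.

```python
def run(x_input, y_input):
    """Interpret the five-instruction program; return (output, steps executed)."""
    pc = 1      # number of the next instruction to execute
    steps = 0   # execution steps so far
    x = y = None
    while True:
        steps += 1
        if pc == 1:      # 1. Read X
            x = x_input
            pc = 2
        elif pc == 2:    # 2. Read Y
            y = y_input
            pc = 3
        elif pc == 3:    # 3. X = X-1
            x -= 1
            pc = 4
        elif pc == 4:    # 4. Y = Y+1
            y += 1
            pc = 5
        elif pc == 5:    # 5. If X = 0 then Print(Y) else goto 3
            if x == 0:
                return y, steps
            pc = 3


print(run(2, 3))  # (5, 8)
```

With X = 2 the loop body runs twice, so the 5 instructions produce 8 execution steps.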

Algorithms and Linguistics
• Does linguistic theory make sense without implementing the concepts?
• Linguistic theory provides linguistic knowledge in the form of
  - grammar rules
  - theories about grammar rules
• Putting knowledge to some use involves processing issues:
  - parsing
  - generation
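
Parsing uses the grammar in the opposite direction from generation: given the words, recover the structure. A minimal sketch, assuming the toy grammar from the “A Simple Grammar/Lexicon” slide and using hand-written recursive descent, one simple parsing technique among many that the lecture does not prescribe. Each function handles one rule and returns the tree it built together with the position of the next unread word, or None on failure.

```python
# Lexicon from the "A Simple Grammar/Lexicon" slide: word -> category.
LEXICON = {"John": "N", "Bill": "N", "kicks": "V"}


def parse_np(words, i):
    """NP → N, starting at position i."""
    if i < len(words) and LEXICON.get(words[i]) == "N":
        return ("NP", ("N", words[i])), i + 1
    return None


def parse_vp(words, i):
    """VP → V NP, starting at position i."""
    if i < len(words) and LEXICON.get(words[i]) == "V":
        np = parse_np(words, i + 1)
        if np is not None:
            np_tree, j = np
            return ("VP", ("V", words[i]), np_tree), j
    return None


def parse_s(words):
    """S → NP VP; succeed only if every word is used."""
    np = parse_np(words, 0)
    if np is not None:
        np_tree, i = np
        vp = parse_vp(words, i)
        if vp is not None:
            vp_tree, j = vp
            if j == len(words):
                return ("S", np_tree, vp_tree)
    return None


print(parse_s("John kicks Bill".split()))
print(parse_s("kicks John Bill".split()))  # None: not a sentence of this grammar
```

The first call returns the same structure as the tree on the grammar slide, written as nested tuples.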

Computational Linguistics: Issues
• How are a grammar and a lexicon represented?
• How is the structure of a given sentence actually discovered?
• How can we actually generate a sentence to express a particular meaning?
• How can linguistic theory be made concrete enough to test algorithmically?
• Can an artificial system learn a language with limited exposure to grammatical sentences?

Non-computational theories can be misleading
• Representational details are omitted.
• Computer memory requirements are omitted.
• The nature of individual steps may be unclear.
• They are difficult to test.
• They are potentially unimplementable.

Example of a Non-computational Model
[Figure]

Computers and Language: Twin Goals
• Scientific Goal: contribute to Linguistics by adding a computational dimension.
• Technological Goal: develop machinery capable of handling human language that can support “language engineering”.

Computers and Language: Tools & Resources
• Grammar Formalisms, e.g. Definite Clause Grammars
• Parsing Algorithms: sentence → structure
• Generation Algorithms: structure → sentence
• Statistical Methods
• Linguistic Corpora

Computers and Language: Applications
• Information Retrieval/Extraction
• Document Classification
• Question Answering
• Style and Spell Checking
• Integrated Multimodal Tasks
• Machine Translation