CMSC 723 LING 645 Intro to Computational Linguistics

  • Slides: 47

CMSC 723 / LING 645: Intro to Computational Linguistics
September 1, 2004: Dorr
Overview, History, Goals, Problems, Techniques; Intro to MT (J&M 1, 21)
Prof. Bonnie J. Dorr
Dr. Christof Monz
TA: Adam Lee

Administrivia
http://www.umiacs.umd.edu/~christof/courses/cmsc723-fall04/
IMPORTANT:
• For Today: Chapters 1 and 21
• For Next Time: Chapter 2

Other Important Stuff
¬ This course is interdisciplinary: it cuts across different areas of expertise. Expect that a subset of the class will be learning new material at any time, while others will have to be patient! (The subsets will swap frequently!)
¬ Project 1 and Project 2 are designed differently. Be prepared for this distinction!
  – P1 will focus on the fundamentals, getting your feet wet with software. By the end, you should feel comfortable using/testing certain types of NLP software.
  – P2 will require a significantly deeper level of understanding, critique, analysis. You’ll be expected to think deeply and write a lot in the second project. What you write will be a major portion of the grade!
¬ No solutions will be handed out. Written comments will be sent to you by the TA.
¬ All email correspondence MUST HAVE “CMSC 723” in the Subject line!!!
¬ Submission format for assignments, projects: plain ASCII, PDF
¬ Assignment 1 will be posted next week.

CL vs NLP
Why “Computational Linguistics (CL)” rather than “Natural Language Processing” (NLP)?
• Computational Linguistics
  – Computers dealing with language
  – Modeling what people do
• Natural Language Processing
  – Applications on the computer side

Relation of CL to Other Disciplines
[Diagram: CL at the center, linked to the following disciplines]
¬ Artificial Intelligence (AI) (notions of representation, search, etc.)
¬ Machine Learning (particularly probabilistic or statistical ML techniques)
¬ Human Computer Interaction (HCI)
¬ Electrical Engineering (EE) (Optical Character Recognition)
¬ Linguistics (Syntax, Semantics, etc.)
¬ Psychology
¬ Philosophy of Language, Formal Logic
¬ Theory of Computation
¬ Information Retrieval

A Sampling of “Other Disciplines”
¬ Linguistics: formal grammars, abstract characterization of what is to be learned.
¬ Computer Science: algorithms for efficient learning or online deployment of these systems in automata.
¬ Engineering: stochastic techniques for characterizing regular patterns for learning and ambiguity resolution.
¬ Psychology: insights into what linguistic constructions are easy or difficult for people to learn or to use.

History: 1940s-1950s
¬ Development of formal language theory (Chomsky, Kleene, Backus).
  – Formal characterization of classes of grammar (context-free, regular)
  – Association with relevant automata
¬ Probability theory: language understanding as decoding through a noisy channel (Shannon)
  – Use of information-theoretic concepts like entropy to measure success of language models.

1957-1983: Symbolic vs. Stochastic
¬ Symbolic
  – Use of formal grammars as basis for natural language processing and learning systems. (Chomsky, Harris)
  – Use of logic and logic-based programming for characterizing syntactic or semantic inference (Kaplan, Kay, Pereira)
  – First toy natural language understanding and generation systems (Woods, Minsky, Schank, Winograd, Colmerauer)
  – Discourse Processing: Role of Intention, Focus (Grosz, Sidner, Hobbs)
¬ Stochastic Modeling
  – Probabilistic methods for early speech recognition, OCR (Bledsoe and Browning, Jelinek, Black, Mercer)

1983-1993: Return of Empiricism
¬ Use of stochastic techniques for part-of-speech tagging, parsing, word sense disambiguation, etc.
¬ Comparison of stochastic, symbolic, more or less powerful models for language understanding and learning tasks.

1993-Present
¬ Advances in software and hardware create NLP needs for information retrieval (web), machine translation, spelling and grammar checking, speech recognition and synthesis.
¬ Stochastic and symbolic methods combine for real-world applications.

Language and Intelligence: Turing Test
¬ Turing test:
  – machine, human, and human judge
¬ Judge asks questions of computer and human.
  – Machine’s job is to act like a human; human’s job is to convince the judge that he’s not the machine.
  – Machine judged “intelligent” if it can fool the judge.
¬ Judgement of “intelligence” linked to appropriate answers to questions from the system.

ELIZA
¬ Remarkably simple “Rogerian Psychologist”
¬ Uses pattern matching to carry on a limited form of conversation.
¬ Seems to “Pass the Turing Test!” (McCorduck, 1979, pp. 225-226)
¬ Eliza Demo: http://www.lpa.co.uk/pws_dem4.htm
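
The pattern-matching idea behind ELIZA can be sketched in a few lines of Python. This is only an illustration: the three rules, the reply templates, and the names `RULES`, `DEFAULT_REPLY`, and `respond` are invented here and are not taken from the original ELIZA script.

```python
import re

# Hypothetical rules for illustration only (not Weizenbaum's original script).
# Each rule pairs a pattern with a reply template; {0} is filled with the
# text captured by the first group of the pattern.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."


def respond(utterance: str) -> str:
    """Return the reply for the first rule whose pattern matches the input."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    # No rule matched: fall back to a content-free prompt.
    return DEFAULT_REPLY
```

For example, `respond("I am unhappy.")` echoes the captured text back as a question. Nothing here models meaning; the apparent understanding comes entirely from reusing the user's own words.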

What’s involved in an “intelligent” Answer?
Analysis: Decomposition of the signal (spoken or written) eventually into meaningful units. This involves …

Speech/Character Recognition
¬ Decomposition into words, segmentation of words into appropriate phones or letters
¬ Requires knowledge of phonological patterns:
  – I’m enormously proud.
  – I mean to make you proud.

Morphological Analysis
¬ Inflectional
  – duck + s = [N duck] + [plural s]
  – duck + s = [V duck] + [3rd person s]
¬ Derivational
  – kind, kindness
¬ Spelling changes
  – drop, dropping
  – hide, hiding
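
The three kinds of analysis on this slide can be illustrated with a toy analyzer. This is a minimal sketch under stated assumptions: the four-word `LEXICON`, the feature labels, and the function `analyze` are hypothetical, and real systems typically use finite-state transducers rather than hand-written string tests.

```python
# Toy lexicon: stem -> set of parts of speech (an illustrative assumption).
LEXICON = {"duck": {"N", "V"}, "drop": {"V"}, "hide": {"V"}, "kind": {"A"}}


def analyze(word):
    """Return every (stem, POS, feature) analysis licensed by the toy rules."""
    analyses = []
    # Inflectional -s: plural on nouns, 3rd person singular on verbs.
    if word.endswith("s") and word[:-1] in LEXICON:
        stem = word[:-1]
        if "N" in LEXICON[stem]:
            analyses.append((stem, "N", "plural"))
        if "V" in LEXICON[stem]:
            analyses.append((stem, "V", "3rd person singular"))
    # Derivational -ness: adjective -> noun.
    if word.endswith("ness") and "A" in LEXICON.get(word[:-4], set()):
        analyses.append((word[:-4], "A", "nominalization"))
    # -ing, undoing the two spelling changes shown on the slide.
    if word.endswith("ing"):
        base = word[:-3]
        candidates = [base, base + "e"]       # hiding -> hide (e-deletion)
        if len(base) >= 2 and base[-1] == base[-2]:
            candidates.append(base[:-1])      # dropping -> drop (consonant doubling)
        for stem in candidates:
            if "V" in LEXICON.get(stem, set()):
                analyses.append((stem, "V", "progressive"))
    return analyses
```

Note that `analyze("ducks")` returns two analyses, one nominal and one verbal: morphology alone cannot choose between them, which is why later stages need context.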

Syntactic Analysis
¬ Associate constituent structure with string
¬ Prepare for semantic interpretation
[S [NP I] [VP [V watched] [NP [Det the] [N terrapin]]]]
OR:
watch
  Subject: I
  Object: terrapin
    Det: the

Semantics
¬ A way of representing meaning
¬ Abstracts away from syntactic structure
¬ Example:
  – First-Order Logic: watch(I, terrapin)
  – Can be: “I watched the terrapin” or “The terrapin was watched by me”
¬ Real language is complex:
  – Who did I watch?
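
The point that a semantic representation abstracts away from syntax can be shown with a deliberately tiny sketch that maps the active and the passive sentence to the same first-order term. The two regular expressions, the pronoun table, and the function `to_logic` are assumptions made for this example; they cover only sentences shaped exactly like the two on the slide.

```python
import re

# Map an object-form pronoun to its subject form so both sentences
# name the same individual (illustrative table, one entry only).
PRONOUN_SUBJECT = {"me": "I"}

# Two surface patterns, assumed for illustration: "<agent> <verb>ed the <patient>"
# and "The <patient> was <verb>ed by <agent>".
ACTIVE = re.compile(r"^(\w+) (\w+)ed the (\w+)$")
PASSIVE = re.compile(r"^The (\w+) was (\w+)ed by (\w+)$")


def to_logic(sentence):
    """Return 'verb(agent, patient)' for the two toy patterns, else None."""
    s = sentence.rstrip(".")
    m = PASSIVE.match(s)
    if m:
        patient, verb, agent = m.groups()
        agent = PRONOUN_SUBJECT.get(agent, agent)
        return f"{verb}({agent}, {patient})"
    m = ACTIVE.match(s)
    if m:
        agent, verb, patient = m.groups()
        return f"{verb}({agent}, {patient})"
    return None
```

Both slide sentences come out as `watch(I, terrapin)`. A question such as “Who did I watch?” matches neither pattern and returns `None`, which is the slide's point that real language quickly outgrows such simple mappings.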

Lexical Semantics
The Terrapin is who I watched.
Watch the Terrapin is what I do best.
*Terrapin is what I watched the
I = experiencer
Watch the Terrapin = predicate
The Terrapin = patient

Compositional Semantics
¬ Association of parts of a proposition with semantic roles
¬ Scoping
Proposition
  Experiencer: I (1st pers, sg)
  Predicate: Be (perc)
    pred: saw
    patient: the Terrapin

Word-Governed Semantics
¬ Any verb can add “able” to form an adjective.
  – I taught the class. The class is teachable.
  – I rejected the idea. The idea is rejectable.
¬ Association of particular words with specific semantic forms.
  – John (masculine)
  – The boys (masculine, plural, human)

Pragmatics
¬ Real world knowledge, speaker intention, goal of utterance.
¬ Related to sociology.
¬ Example 1:
  – Could you turn in your assignments now (command)
  – Could you finish the homework? (question, command)
¬ Example 2:
  – I couldn’t decide how to catch the crook. Then I decided to spy on the crook with binoculars.
  – To my surprise, I found out he had them too. Then I knew to just follow the crook with binoculars.
  [the crook [with binoculars]]
  [the crook] [with binoculars]

Discourse Analysis
¬ Discourse: How propositions fit together in a conversation (multi-sentence processing).
  – Pronoun reference: The professor told the student to finish the assignment. He was pretty aggravated at how long it was taking to pass it in.
  – Multiple reference to same entity: George W. Bush, president of the U.S.
  – Relation between sentences: John hit the man. He had stolen his bicycle.

NLP Pipeline
speech → Phonetic Analysis
text → OCR/Tokenization
→ Morphological analysis → Syntactic analysis → Semantic Interpretation → Discourse Processing

Relation to Machine Translation
analysis (input)           generation (output)
Morphological analysis     Morphological synthesis
Syntactic analysis         Syntactic realization
Semantic Interpretation    Lexical selection
                Interlingua

Ambiguity
I made her duck
– I made duckling for her
– I made the duckling belonging to her
– I created the duck she owns
– I forced her to lower her head
– By magic, I changed her into a duck

Syntactic Disambiguation
¬ Structural ambiguity:
[S [NP I] [VP [V made] [NP her] [VP [V duck]]]]
[S [NP I] [VP [V made] [NP [det her] [N duck]]]]
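
Structural ambiguity can be made concrete by counting the parse trees a grammar assigns to “I made her duck”. The sketch below uses a CKY-style chart over a small binarized grammar. The grammar, the lexicon, the small-clause category `SC`, and the function `count_parses` are all assumptions for illustration; the slide's first tree uses a flat VP, which is binarized here.

```python
from collections import defaultdict

# Toy lexicon (assumed): "her" is a pronoun (NP) or a possessive determiner (Det);
# "duck" is a noun (N) or an intransitive verb phrase (VP).
LEXICON = {"I": {"NP"}, "made": {"V"}, "her": {"NP", "Det"}, "duck": {"N", "VP"}}

# Binary rules as (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),   # made [her duck]      (the duck she owns)
    ("VP", "V", "SC"),   # made [her] [duck]    (caused her to duck)
    ("SC", "NP", "VP"),  # small clause: her + duck
    ("NP", "Det", "N"),
]


def count_parses(words, start="S"):
    """Count distinct parse trees rooted in `start` that span all of `words`."""
    n = len(words)
    # chart[(i, j)][X] = number of trees with root X covering words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for category in LEXICON.get(word, ()):
            chart[(i, i + 1)][category] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):          # split point between the two children
                for parent, left, right in RULES:
                    chart[(i, j)][parent] += chart[(i, k)][left] * chart[(k, j)][right]
    return chart[(0, n)][start]
```

With this grammar, `count_parses("I made her duck".split())` finds two trees, matching the two bracketings on the slide. Choosing between them is the disambiguation problem.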

Part of Speech Tagging and Word Sense Disambiguation
¬ [verb Duck]! [noun Duck] is delicious for dinner
¬ I went to the bank to deposit my check.
  I went to the bank to look out at the river.
  I went to the bank of windows and chose the one dealing with last names beginning with “d”.
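
One simple way to pick a sense of “bank” is to count how many context words overlap with a set of cue words for each sense, in the spirit of the Lesk algorithm. The slide does not prescribe a method; the cue-word sets, the sense labels, and the function `disambiguate` below are invented for this sketch.

```python
# Hypothetical cue words for three senses of "bank" (not from any real dictionary).
SENSES = {
    "financial institution": {"money", "deposit", "check", "account", "loan"},
    "river edge": {"river", "water", "shore", "stream"},
    "row of objects": {"row", "windows", "switches", "series"},
}


def disambiguate(sentence):
    """Return the sense whose cue words overlap most with the sentence."""
    cleaned = sentence.lower().replace(".", " ").replace(",", " ")
    context = set(cleaned.split())
    # On a tie (including zero overlap) max() returns the first sense listed.
    return max(SENSES, key=lambda sense: len(SENSES[sense] & context))
```

On the three slide sentences this picks the financial, river, and row senses respectively. The tie-breaking behavior shows the weakness of the heuristic: with no cue word in the sentence, the answer is arbitrary.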

Resources for NLP Systems
• Dictionary
• Morphology and Spelling Rules
• Grammar Rules
• Semantic Interpretation Rules
• Discourse Interpretation
Natural language processing involves (1) learning or fashioning the rules for each component, (2) embedding the rules in the relevant automaton, and (3) using the automaton to efficiently process the input.
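
The three steps above (write the rule, embed it in an automaton, run the automaton over the input) can be sketched with a table-driven finite-state recognizer. The rule chosen here, strings of the form `baa+!`, is a stand-in picked for brevity and is not a rule from the slides; the names `TRANSITIONS`, `ACCEPTING`, and `accepts` are likewise assumptions.

```python
# Step 1 (rule): accept "b", then two or more "a", then "!".
# Step 2 (embedding): the rule compiled by hand into a transition table.
# States: 0 = start, 1 = saw "b", 2 = saw "ba", 3 = saw "baa" or more, 4 = saw "!".
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}
ACCEPTING = {4}


def accepts(text):
    """Step 3 (processing): run the automaton over the input, one symbol at a time."""
    state = 0
    for symbol in text:
        state = TRANSITIONS.get((state, symbol))
        if state is None:          # no transition defined: reject immediately
            return False
    return state in ACCEPTING
```

Each input symbol costs one table lookup, so recognition is linear in the length of the input, which is the efficiency the slide alludes to.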

Some NLP Applications
¬ Machine Translation: Babelfish (Alta Vista): http://babelfish.altavista.com/translate.dyn
¬ Question Answering: Ask Jeeves (Ask Jeeves): http://www.ask.com/
¬ Language Summarization: MEAD (U. Michigan): http://www.summarization.com/mead
¬ Spoken Language Recognition: EduSpeak (SRI): http://www.eduspeak.com/
¬ Automatic Essay Evaluation: E-Rater (ETS): http://www.ets.org/research/erater.html
¬ Information Retrieval and Extraction: NetOwl (SRA): http://www.netowl.com/extractor_summary.html

What is MT?
¬ Definition: Translation from one natural language to another by means of a computerized system
¬ Early failures
¬ Later: varying degrees of success

An Old Example
The spirit is willing but the flesh is weak
→ The vodka is good but the meat is rotten

Machine Translation History
¬ 1950s: Intensive research activity in MT
¬ 1960s: Direct word-for-word replacement
¬ 1966 (ALPAC): NRC Report on MT
¬ Conclusion: MT no longer worthy of serious scientific investigation.
¬ 1966-1975: “Recovery period”
¬ 1975-1985: Resurgence (Europe, Japan)
¬ 1985-present: Resurgence (US)
http://ourworld.compuserve.com/homepages/WJHutchins/MTS-93.htm

What happened between ALPAC and Now?
¬ Need for MT and other NLP applications confirmed
¬ Change in expectations
¬ Computers have become faster, more powerful
¬ WWW
¬ Political state of the world
¬ Maturation of Linguistics
¬ Development of hybrid statistical/symbolic approaches

Three MT Approaches: Direct, Transfer, Interlingual
[Figure: MT pyramid, with the interlingua at the apex]
Analysis side (bottom to top): Source Text, Morphological Analysis, Word Structure, Syntactic Structure, Semantic Analysis, Semantic Structure, Semantic Composition, Interlingua
Generation side (top to bottom): Interlingua, Semantic Decomposition, Semantic Structure, Semantic Generation, Syntactic Structure, Syntactic Generation, Word Structure, Morphological Generation, Target Text
Links across the two sides: Direct, Syntactic Transfer, Semantic Transfer

Examples of Three Approaches
¬ Direct:
  – I checked his answers against those of the teacher → Yo comparé sus respuestas a las de la profesora
  – Rule: [check X against Y] → [comparar X a Y]
¬ Transfer:
  – Ich habe ihn gesehen → I have seen him
  – Rule: [clause agt aux obj pred] → [clause agt aux pred ob
¬ Interlingual:
  – I like Mary → Mary me gusta a mí
  – Rep: [BeIdent (I [ATIdent (I, Mary)] Like+ingly)]
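
The transfer example can be sketched as two steps: label the source words with roles, then emit the translated words in the target role order. Everything below is an illustrative assumption: the four-entry `LEXICON`, the positional role assignment (standing in for a real German parser), and the final `obj` in `TARGET_ORDER`, which is inferred because the rule on the slide is cut off.

```python
# Word-for-word German -> English lexicon for this one sentence (assumed).
LEXICON = {"ich": "I", "habe": "have", "ihn": "him", "gesehen": "seen"}

# Source-side role order from the slide's rule.
SOURCE_ORDER = ("agt", "aux", "obj", "pred")
# Target-side role order; the last role is an assumption (the slide text is truncated).
TARGET_ORDER = ("agt", "aux", "pred", "obj")


def transfer(sentence):
    """Translate a four-word German clause by reordering roles, then substituting words."""
    words = sentence.split()
    if len(words) != len(SOURCE_ORDER):
        raise ValueError("this sketch only handles clauses with exactly four words")
    # "Analysis": assign roles by position (a stand-in for real parsing).
    roles = dict(zip(SOURCE_ORDER, words))
    # "Transfer" + "generation": reorder roles, then look each word up.
    return " ".join(LEXICON[roles[role].lower()] for role in TARGET_ORDER)
```

Calling `transfer("Ich habe ihn gesehen")` yields the English word order from the slide. The reordering rule is specific to this language pair, which is why transfer systems need a separate rule set for every pair.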

MT Systems: 1964-1990
¬ Direct: GAT [Georgetown, 1964], TAUM-METEO [Colmerauer et al. 1971]
¬ Transfer: GETA/ARIANE [Boitet, 1978], LMT [McCord, 1989], METAL [Thurmair, 1990], MiMo [Arnold & Sadler, 1990], …
¬ Interlingual: MOPTRANS [Schank, 1974], KBMT [Nirenburg et al, 1992], UNITRAN [Dorr, 1990]

Statistical MT and Hybrid Symbolic/Stats MT: 1990-Present
Candide [Brown, 1990, 1992]; Halo/Nitrogen [Langkilde and Knight, 1998], [Yamada and Knight, 2002]; GHMT [Dorr and Habash, 2002]; DUSTer [Dorr et al. 2002]

Direct MT: Pros and Cons
¬ Pros
  – Fast
  – Simple
  – Inexpensive
  – No translation rules hidden in lexicon
¬ Cons
  – Unreliable
  – Not powerful
  – Rule proliferation
  – Requires too much context
  – Major restructuring after lexical substitution

Transfer MT: Pros and Cons
¬ Pros
  – Don’t need to find language-neutral rep
  – Relatively fast
¬ Cons
  – N² sets of transfer rules: Difficult to extend
  – Proliferation of language-specific rules in lexicon and syntax
  – Cross-language generalizations lost

Interlingual MT: Pros and Cons
¬ Pros
  – Portable (avoids N² problem)
  – Lexical rules and structural transformations stated more simply on normalized representation
  – Explanatory Adequacy
¬ Cons
  – Difficult to deal with terms on primitive level: universals?
  – Must decompose and reassemble concepts
  – Useful information lost (paraphrase)
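
The N² problem can be made concrete with a quick count. This is a standard back-of-the-envelope argument rather than a figure from the slides, and it assumes one transfer rule set per ordered language pair and, for the interlingual design, one analyzer and one generator per language:

```latex
\[
  \text{transfer rule sets} = N(N-1) \approx N^{2},
  \qquad
  \text{interlingual modules}
    = \underbrace{N}_{\text{analyzers}} + \underbrace{N}_{\text{generators}}
    = 2N
\]
\[
  N = 10:\quad 10 \cdot 9 = 90 \ \text{transfer rule sets}
  \quad\text{vs.}\quad 2 \cdot 10 = 20 \ \text{interlingual modules}
\]
```

Adding an eleventh language would then cost 20 new transfer rule sets but only 2 new interlingual modules, which is the sense in which the interlingual design is portable.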

Approximate IL Approach
¬ Tap into richness of TL resources
¬ Use some, but not all, components of IL representation
¬ Generate multiple sentences that are statistically pared down

Approximating IL: Handling Divergences
¬ Primitives
¬ Semantic Relations
¬ Lexical Information

Interlingual vs. Approximate IL
¬ Interlingual MT:
  – primitives & relations
  – bi-directional lexicons
  – analysis: compose IL
  – generation: decompose IL
¬ Approximate IL
  – hybrid symbolic/statistical design
  – overgeneration with statistical ranking
  – uses dependency rep input and structural expansion for “deeper” overgeneration

Mapping from Input Dependency to English Dependency Tree
Mary le dio patadas a John → Mary kicked John
Input dependency: GIVE(V) [CAUSE GO]: Agent MARY, Theme KICK(N), Goal JOHN
English dependency: KICK(V) [CAUSE GO]: Agent MARY, Goal JOHN
Knowledge Resources in English only: (LVD; Dorr, 2001).

Statistical Extraction
Mary kicked John. [-0.670270]
Mary gave a kick at John. [-2.175831]
Mary gave the kick at John. [-3.969686]
Mary gave an kick at John. [-4.489933]
Mary gave a kick by John. [-4.803054]
Mary gave a kick to John. [-5.045810]
Mary gave a kick into John. [-5.810673]
Mary gave a kick through John. [-5.836419]
Mary gave a foot wound by John. [-6.041891]
Mary gave John a foot wound. [-6.212851]
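
The extraction step amounts to ranking the overgenerated candidates by score and keeping the best one. The sketch below copies the candidates and scores from this slide. The subject “Mary” on every candidate is an assumption (the slide text shows it only a few times), and the slide does not say what the scores are, so the code treats them only as numbers where higher is better.

```python
# (candidate, score) pairs from the slide; higher (less negative) is better.
# The subject "Mary" is assumed for every candidate.
CANDIDATES = [
    ("Mary kicked John.", -0.670270),
    ("Mary gave a kick at John.", -2.175831),
    ("Mary gave the kick at John.", -3.969686),
    ("Mary gave an kick at John.", -4.489933),
    ("Mary gave a kick by John.", -4.803054),
    ("Mary gave a kick to John.", -5.045810),
    ("Mary gave a kick into John.", -5.810673),
    ("Mary gave a kick through John.", -5.836419),
    ("Mary gave a foot wound by John.", -6.041891),
    ("Mary gave John a foot wound.", -6.212851),
]


def rank(candidates):
    """Return the candidates sorted from best score to worst."""
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# The top-ranked candidate is the system's output.
best_sentence, best_score = rank(CANDIDATES)[0]
```

The symbolic component is free to overgenerate ungrammatical strings such as “gave an kick”, because the ranking pushes them down the list.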

Benefits of Approximate IL Approach
¬ Explaining behaviors that appear to be statistical in nature
¬ “Re-sourceability”: Re-use of already existing components for MT from new languages.
¬ Application to monolingual alternations

What Resources are Required?
¬ Deep TL resources
¬ Requires SL parser and tralex
¬ TL resources are richer: LVD representations, CatVar database
¬ Constrained overgeneration