Computational Linguistics Introduction Finite State Machinery and Language
Computational Linguistics Introduction Finite State Machinery and Language Description Apr 2009 CLINT-LIN: Finite State Machinery
Acknowledgement
The material for this lecture is derived from a series of talks given by Dr. Ken Beesley (Xerox European Research Centre, Grenoble) in Malta, 2001.
Today’s Topics
• Finite State Technology
• Regular Languages and Relations
• Review of Set Theory
• Understand the mathematical operations that can be performed on such Languages.
• Understand how Languages, Relations, Regular Expressions, and Networks are interrelated.
What is Finite State Technology?
• Finite State Technology refers to a collection of techniques for applying Finite State Automata (FSA) to a range of linguistically motivated problems.
• Such techniques include:
  • Design of user languages for specifying FSA
  • Compilation of such languages into efficient transition networks
  • Development environments and runtime systems
What is Finite-State Technology Good For?
• Finite-state techniques cannot handle arbitrarily deep centre embedding:
  • the man the dog the cat bit followed ate.
• They are well suited to “lower-level” natural language processing such as:
  • Tokenization: what is the next word?
  • Spelling error detection: does the next word belong to a list?
  • Morphological/phonological analysis/generation
  • Shallow syntactic parsing and “chunking”
Tokenisation Problems
Vf.B Stuttgart scored twice in quick success-ion early in the second half on their way to a deserved 2-1 victory over Manchester United in the Champions League on Wednesday. (example from Mary Dalrymple, University of London)
• Vf.B Stuttgart, Manchester United
• succession
• 2-1
• Wednesday
• Finite state techniques provide a means to specify the language of words, thus defining what it means to be the next token.
• There are three ways to specify such languages.
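A regular expression is one finite-state notation for "the language of words". The sketch below is only an illustration: the pattern `TOKEN` and the function `tokenize` are hypothetical names, and the pattern handles scores such as 2-1 but makes no attempt at multiword names or words hyphenated across a line break.

```python
import re

# Hypothetical token language: a score such as "2-1", a run of word
# characters, or a single punctuation mark. Alternatives are tried in order,
# so "2-1" is matched as one token before "\w+" can split it.
TOKEN = re.compile(r"\d+-\d+|\w+|[^\w\s]")

def tokenize(text):
    """Scan left to right and return every substring matching TOKEN."""
    return TOKEN.findall(text)

print(tokenize("a deserved 2-1 victory over Manchester United on Wednesday."))
```

Treating "Manchester United" as one token would need a richer token language, for example a list of known multiword names.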
Languages, Notations and Machines
[Figure: diagram relating LANGUAGE (set of strings), NOTATION and MACHINE]
Languages, Notations and Machines
[Figure: diagram relating FINITE STATE LANGUAGE, FINITE STATE NOTATION and FINITE STATE AUTOMATON]
FINITE STATE AUTOMATA: preliminary definition
A finite state automaton includes:
• A finite set of states
• A finite set of labelled transitions between states
Physical Machines with Finite States: The Lightswitch Machine
[Figure: two states, OFF and ON, connected by transitions labelled UP and DOWN]
Physical Machines with Finite States: The Lightswitch Toggle Machine
[Figure: two states, OFF and ON; a transition labelled PUSH leads from each state to the other]
The Five Cent Machine
Problem:
• Assume you have one, two, and five cent pieces.
• Design a finite state automaton which accepts exactly 5 cents.
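One possible solution to the problem above, as a minimal sketch: each state records the number of cents inserted so far, so the states are 0 to 5, with 0 as the start state and 5 as the only final state. The names `COINS`, `step` and `accepts` are illustrative and not part of the lecture.

```python
# Symbols of the alphabet and the number of cents each one adds.
COINS = {"1": 1, "2": 2, "5": 5}
START, FINAL = 0, 5

def step(state, coin):
    """Follow the transition labelled `coin`; return None if there is none."""
    if coin not in COINS:
        return None
    total = state + COINS[coin]
    # There is no state beyond 5, so overpaying has no transition.
    return total if total <= FINAL else None

def accepts(coins):
    """True if the coin sequence leads from the start state to the final state."""
    state = START
    for coin in coins:
        state = step(state, coin)
        if state is None:
            return False
    return state == FINAL

print(accepts("221"), accepts("22"), accepts("51"))
```

The machine rejects both underpayment (the input ends in a non-final state) and overpayment (no transition is available).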
The Cola Machine
• Need to enter 25 cents (USA) to get a drink
• Accepts the following coins:
  • Nickel = 5 cents
  • Dime = 10 cents
  • Quarter = 25 cents
• For simplicity, our machine needs exact change
• We will model only the coin-accepting mechanism
Physical Machines with Finite States: The Cola Machine
[Figure: transition network with states 0 (Start State), 5, 10, 15, 20 and 25 (Final State); arcs labelled N advance by 5 cents, arcs labelled D by 10 cents, and the arc labelled Q leads from 0 to 25]
The Cola Machine Language
• List of all the sequences of coins accepted:
  • { Q, DDN, DND, NDD, DNNN, NDNN, NNDN, NNND, NNNNN }
• Think of the coins as SYMBOLS or CHARACTERS
• The set of symbols accepted is the ALPHABET of the machine
• Think of sequences of coins as WORDS or “strings”
• The set of words accepted by the machine is its LANGUAGE
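The language of the cola machine can be generated by following every path from the start state to the final state. The sketch below assumes the states are the running totals 0, 5, ..., 25; the names `VALUES`, `TARGET` and `cola_language` are illustrative.

```python
# Alphabet of the machine and the value of each coin in cents.
VALUES = {"N": 5, "D": 10, "Q": 25}
TARGET = 25  # the single final state

def cola_language(state=0, prefix=""):
    """Yield every coin sequence that leads from `state` to the final state."""
    if state == TARGET:
        yield prefix
        return
    for coin, value in VALUES.items():
        # A transition exists only if it does not overshoot 25 cents.
        if state + value <= TARGET:
            yield from cola_language(state + value, prefix + coin)

words = sorted(cola_language(), key=lambda w: (len(w), w))
print(len(words), words)
```

Because the machine demands exact change, the language is finite: there are nine accepted words.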
FINITE STATE AUTOMATA: better definition
A finite state automaton includes:
• A finite set of states
  • Initial State
  • Final State(s)
• A finite set of labelled transitions between states
  • Labels are symbols from an alphabet
• Recognises a language
• Generates a language as well!
A Network that Accepts a One Word Language
[Figure: a chain of states joined by arcs labelled c, a, n, t, o, accepting the word “canto”]
A Network that Accepts a Three Word Language
[Figure: a network whose paths spell the words “canto”, “tigre” and “mesa”]
Scaling Up the Network
• Imagine the same network expanded to handle three million words, all of them corresponding to valid words of a given language.
• We supply a word and ‘apply’ it to the network. If it is accepted by the network, then it is a valid word. Otherwise it does not belong to the language.
• This is the basis for a Spanish spelling error detector.
Looking Up a Word
[Figure: the input m e s a is “applied” to the three-word network and matched arc by arc]
Lookup Failure
Lookup succeeds when all input is consumed and a final state is reached. Lookup can fail because:
• Not all input is consumed ("libro", "tigra")
• Input is fully consumed but the state is not final ("cant")
• A final state is reached but there is still unconsumed input ("mesas")
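The failure cases above can be reproduced with a small letter network. This is a sketch under assumptions: the network is built as a simple trie over the three example words, and the names `build_network`, `lookup` and `NETWORK` are illustrative rather than part of any finite-state toolkit.

```python
WORDS = ["canto", "tigre", "mesa"]

def build_network(words):
    """Build a letter network: each state has a final flag and outgoing arcs."""
    start = {"final": False, "arcs": {}}
    for word in words:
        state = start
        for symbol in word:
            state = state["arcs"].setdefault(symbol, {"final": False, "arcs": {}})
        state["final"] = True
    return start

def lookup(network, word):
    """Apply `word` to the network and report success or the reason for failure."""
    state = network
    for i, symbol in enumerate(word):
        if symbol not in state["arcs"]:
            return f"fail: input left unconsumed at {word[i:]!r}"
        state = state["arcs"][symbol]
    return "ok" if state["final"] else "fail: input consumed but state is not final"

NETWORK = build_network(WORDS)
for word in ["mesa", "libro", "tigra", "cant", "mesas"]:
    print(word, "->", lookup(NETWORK, word))
```

In this sketch "mesas" fails at the final state of "mesa" because no arc labelled s leaves it, which is the third failure case on the slide.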
Shared Structure
[Figure: a network in which several words share states and arcs; arc labels c, l, e, a, v, e, r, e]
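Transducers
[Figure: a transducer path with paired arc labels m:m, e:e, s:s, a:a, +Noun:0, +Fem:0, +Pl:s. The upper side reads mesa+Noun+Fem+Pl and the lower side reads mesas. “Lookdown” maps the upper string to the lower string; “Lookup” maps the lower string to the upper string.]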
A Morphological Analyzer
[Figure: a Transducer relating the upper string dog+n+pl to the lower string dogs]
A Morphological Analyzer
[Figure: a Transducer relating the Lexical Language (upper side) to the Surface Language (lower side)]
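A minimal sketch of lookup and lookdown, assuming each transducer path is stored as a list of (upper, lower) symbol pairs and that the empty string stands for the 0 (epsilon) of the slides. The names `PATHS`, `upper`, `lower`, `lookup` and `lookdown` are illustrative, and the arc alignment for dog+n+pl is an assumption. A real transducer is traversed arc by arc as a network; enumerating whole paths is done here only to keep the example short.

```python
# Each path is a sequence of (upper symbol, lower symbol) pairs; "" plays the role of 0.
PATHS = [
    [("m", "m"), ("e", "e"), ("s", "s"), ("a", "a"), ("+Noun", ""), ("+Fem", ""), ("+Pl", "s")],
    [("d", "d"), ("o", "o"), ("g", "g"), ("+n", ""), ("+pl", "s")],  # assumed alignment
]

def upper(path):
    """Concatenate the upper (lexical) symbols of a path."""
    return "".join(u for u, _ in path)

def lower(path):
    """Concatenate the lower (surface) symbols of a path."""
    return "".join(l for _, l in path)

def lookup(paths, surface):
    """Analysis: map a surface string to every matching lexical string."""
    return [upper(p) for p in paths if lower(p) == surface]

def lookdown(paths, lexical):
    """Generation: map a lexical string to every matching surface string."""
    return [lower(p) for p in paths if upper(p) == lexical]

print(lookup(PATHS, "mesas"))
print(lookdown(PATHS, "dog+n+pl"))
```

The same data serves both directions, which is the sense in which a transducer is bidirectional.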
A Quick Review of Set Theory
A set is a collection of objects. [Figure: a set containing B, A, D, E]
We can enumerate the “members” or “elements” of finite sets: { A, D, B, E }.
There is no significant order in a set, so { A, D, B, E } is the same set as { E, A, D, B }, etc.
Uniqueness of Elements
You cannot have two or more ‘A’ elements in the same set. [Figure: a set containing B, A, D, E]
{ A, A, D, B, E } is just a redundant specification of the set { A, D, B, E }.
Cardinality of Sets
• The Empty Set: [Figure: a set with no elements]
• A Finite Set: { Norway, Denmark, Sweden }
• An Infinite Set: e.g. the set of all positive integers
Simple Operations on Sets: Union
[Figure: Set 1 and Set 2, two sets with no shared elements that contain A, B, C, D, E between them]
Union of Set 1 and Set 2: { A, B, C, D, E }
Simple Operations on Sets (2): Union
Set 1: { A, B, C }
Set 2: { C, D }
Union of Set 1 and Set 2: { A, B, C, D }
Simple Operations on Sets (3): Intersection
Set 1: { A, B, C }
Set 2: { C, D }
Intersection of Set 1 and Set 2: { C }
Simple Operations on Sets (4): Subtraction
Set 1: { A, B, C }
Set 2: { C, D }
Set 1 minus Set 2: { A, B }
Formal Languages
Very Important Concept in Formal Language Theory: A Language is just a Set of Words.
• We use the terms “word” and “string” interchangeably.
• A Language can be empty, have finite cardinality, or be infinite in size.
• You can union, intersect and subtract languages, just like any other sets.
Union of Languages (Sets)
[Figure: Language 1 and Language 2, two languages with no shared words that contain dog, cat, rat, elephant, mouse between them]
Union of Language 1 and Language 2: { dog, cat, rat, elephant, mouse }
Intersection of Languages (Sets)
[Figure: Language 1 and Language 2, two languages with no shared words that contain dog, cat, rat, elephant, mouse between them]
Intersection of Language 1 and Language 2: the empty language
Intersection of Languages (Sets)
Language 1: { dog, cat, rat }
Language 2: { rat, mouse }
Intersection of Language 1 and Language 2: { rat }
Subtraction of Languages (Sets)
Language 1: { dog, cat, rat }
Language 2: { rat, mouse }
Language 1 minus Language 2: { dog, cat }
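Since a language is just a set of words, the three operations can be tried directly with Python's built-in `set` type. This sketch assumes the overlapping example, in which both languages contain "rat"; the variable names are illustrative.

```python
language_1 = {"dog", "cat", "rat"}
language_2 = {"rat", "mouse"}

union = language_1 | language_2         # words in either language
intersection = language_1 & language_2  # words in both languages
difference = language_1 - language_2    # words in Language 1 but not in Language 2

# Sets are unordered, so sort before printing to get a stable display.
print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

These finite examples only illustrate the definitions. Infinite languages cannot be listed this way and are represented by networks instead.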
Languages
• A language is a set of words (= strings).
• Words (strings) are composed of symbols (letters) that are “concatenated” together.
• At another level, words are composed of “morphemes”.
• In most natural languages, we concatenate morphemes together to form whole words.
For sets consisting of words (i.e. for Languages), the operation of concatenation is very important.
Concatenation of Languages
Root Language: work, talk, walk
Suffix Language: 0, ing, ed, s
The concatenation of the Suffix language after the Root language: working, worked, works, talking, talked, talks, walking, walked, walks
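Concatenating two languages means joining every word of the first with every word of the second. The sketch below assumes that the 0 in the Suffix Language denotes the empty string; the function name `concatenate` is illustrative.

```python
from itertools import product

roots = {"work", "talk", "walk"}
suffixes = {"", "ing", "ed", "s"}  # "" stands for the empty suffix written 0

def concatenate(first, second):
    """Return the set of all words a + b with a in `first` and b in `second`."""
    return {a + b for a, b in product(first, second)}

words = concatenate(roots, suffixes)
print(len(words), sorted(words))
```

With the empty suffix included, the result also contains the bare roots work, talk and walk, giving twelve words in total.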
Languages and Networks
[Figure: Network/Language 1 and Network/Language 2 drawn as letter networks, followed by the single network that is the concatenation of Network 1 and Network 2]
Why is “Finite State” Computing so Interesting?
• Finite-state systems are mathematically elegant, easily manipulated and modifiable.
• Computationally efficient. Usually very compact.
• The programming we linguists do is declarative. We describe the facts of our natural language; i.e. we write grammars. We do not hack ad hoc code.
• The runtime code, which applies our systems to linguistic input, is already written and it is completely language-independent.
• Finite-state systems are inherently bidirectional: we can use the same system to analyze and to generate.
Languages, Notations and Machines
[Figure: diagram relating FINITE STATE LANGUAGE, FINITE STATE NOTATION and FINITE STATE MACHINE]