● Enumerated representation of tokens

    typedef enum {
        /* book-keeping tokens */
        ENDFILE, ERROR,
        /* reserved words */
        IF, THEN, ELSE, END, REPEAT, UNTIL, READ, WRITE,
        /* multicharacter tokens */
        ID, NUM,
        /* special symbols */
        ASSIGN, EQ, LT, PLUS, MINUS, TIMES, OVER, LPAREN, RPAREN, SEMI
    } TokenType;

● Representation of each token

    typedef struct {
        TokenType tokenval;
        char *stringval;
        int numval;
    } TokenRecord;

11/1/2020, Department of Computer Science, College of Information Science and Technology, Beijing University of Chemical Technology
3.2 Input Buffering

Example: the C source code
    a[index] = 4 + 2

The scanner (lexical analysis) is called through
    TokenType getToken(void)
and reads the input buffer one character at a time:
    a [ i n d e x ] = 4 + 2
3.3.3 Definition of Regular Expressions

1. Basic regular expressions
◆ A single character from the alphabet: the expression a matches the character a. L(a) = {a}
◆ The empty string (ε): the string that contains no characters. L(ε) = {ε}
◆ {} or Φ: matches no string at all; its language is the empty set. L(Φ) = { }
2. Regular expression operations
● | : choice among alternatives
    L(r|s) = L(r) ∪ L(s)
    L(a|b|c|d) = {a, b, c, d}
● • : concatenation (the operator symbol may be omitted)
    L(rs) = L(r)L(s)
    Note: in general rs ≠ sr, while εr = rε = r.
● * : repetition, or "closure"
    S* = {ε} ∪ S ∪ SS ∪ SSS ∪ ...
    L(r*) = L(r)*

Example: if S = a|bb, what are (a|bb)* and L((a|bb)*)?
    (a|bb)* matches ε, a, bb, aa, abb, bba, bbbb, aaa, aabb, ...
    L((a|bb)*) = L(a|bb)* = {a, bb}* = {ε, a, bb, aa, abb, bba, bbbb, aaa, aabb, abba, abbbb, bbaa, ...}
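The closure example can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function name `in_a_or_bb_star` is a hypothetical helper that tests whether a string belongs to L((a|bb)*) by repeatedly consuming either one `a` or the pair `bb`.

```c
#include <stdbool.h>

/* Hypothetical helper (not from the slides): returns true when s is in
 * L((a|bb)*), i.e. s is a concatenation of zero or more pieces, each
 * piece being either "a" or "bb". */
bool in_a_or_bb_star(const char *s) {
    while (*s != '\0') {
        if (s[0] == 'a') {
            s += 1;                 /* consume one "a" */
        } else if (s[0] == 'b' && s[1] == 'b') {
            s += 2;                 /* consume one "bb" */
        } else {
            return false;           /* a lone b or a foreign character */
        }
    }
    return true;                    /* the empty string is accepted too */
}
```

For instance, `in_a_or_bb_star("abba")` holds, while a string with an odd run of b's such as `"abbb"` is rejected.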
● Precedence of operations and use of parentheses
    Precedence: * first, • second, | third
    Examples:
        a|bc* ≡ a|(b(c*))
        ab|c*d ≡ (ab)|((c*)d)
● Names for regular expressions
    (0|1|2|3|...|9)* can be written using a name:
        digit = 0|1|2|3|4|...|9
        digit*
3. Examples
1) ∑ = {a, b, c}: the set of all strings over this alphabet that contain exactly one b.
    (a|c)*b(a|c)*
2) ∑ = {a, b, c}: the set of all strings that contain at most one b.
    (a|c)*|(a|c)*b(a|c)*
    or, equivalently, (a|c)*(b|ε)(a|c)*
3) ∑ = {a, b}: the set of strings S consisting of a single b surrounded by the same number of a's.
    S = {b, aba, aabaa, aaabaaa, ...} = { a^n b a^n | n ≥ 0 }
    This set cannot be described by a regular expression: "regular expressions can't count".
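The contrast between examples 1) and 3) can be sketched in code. This is an illustration under assumptions, not the slides' own material: `exactly_one_b` and `is_an_b_an` are hypothetical names. The first function needs only a fixed, finite amount of state, which is what a regular expression can express; the second has to compare two unbounded counts.

```c
#include <stdbool.h>
#include <stddef.h>

/* (a|c)*b(a|c)* : two live states are enough
 * (0 = no b seen yet, 1 = exactly one b seen). */
bool exactly_one_b(const char *s) {
    int state = 0;
    for (; *s != '\0'; s++) {
        if (*s == 'b') {
            if (state == 1) return false;   /* a second b */
            state = 1;
        } else if (*s != 'a' && *s != 'c') {
            return false;                   /* not in the alphabet */
        }
    }
    return state == 1;
}

/* a^n b a^n : the number of a's before the b must be remembered and
 * compared with the number after it, so no fixed set of states suffices. */
bool is_an_b_an(const char *s) {
    size_t before = 0, after = 0;
    while (*s == 'a') { before++; s++; }
    if (*s != 'b') return false;
    s++;
    while (*s == 'a') { after++; s++; }
    return *s == '\0' && before == after;
}
```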
3. Examples (continued)
4) ∑ = {a, b, c}: the strings that contain no two consecutive b's.
    With notb = a|c:
        (notb | b notb)* (b|ε)
    Expanding notb gives
        (a | c | ba | bc)* (b|ε)
    or, equivalently,
        (b|ε) (a | c | ab | cb)*
5) ∑ = {a, b, c}: given the regular expression ((b|c)* a (b|c)* a)* (b|c)*, determine a concise English description of its language.
    With nota = b|c:
        ((b|c)* a (b|c)* a)* (b|c)* ≡ (nota* a nota* a)* nota*
    The language is the set of strings that contain an even number of a's.
3.3.4 Extensions to Regular Expressions
1. One or more repetitions (positive closure): r+
2. Any character: the period "."
3. A range of characters: [0-9], [a-zA-Z]
4. Any character not in a given set: a character that is not a, b, or c (the complement of (a|b|c)) is written [^abc] in Lex
5. Optional subexpressions: r? means that the strings matched by r are optional
Regular Expressions for Programming Language Tokens
1. Numbers
    nat = [0-9]+
    signedNat = (+|-)? nat
    number = signedNat ("." nat)? (E signedNat)?
2. Reserved words and identifiers
    reserved = if | while | do | ...
    letter = [a-zA-Z]
    digit = [0-9]
    identifier = letter (letter|digit)*
3. Comments
    /* this is a C comment */
    { this is a Pascal comment }
4. Ambiguity, white space, and lookahead
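The definition of number can be turned into a hand-written recognizer almost rule by rule. The following is a minimal sketch under assumptions: the names `skip_nat`, `skip_signed_nat`, and `is_number` are hypothetical, and the whole string is required to match.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* nat = [0-9]+ : returns the position just after the digits,
 * or NULL when s does not start with a digit. */
static const char *skip_nat(const char *s) {
    if (!isdigit((unsigned char)*s)) return NULL;
    while (isdigit((unsigned char)*s)) s++;
    return s;
}

/* signedNat = (+|-)? nat */
static const char *skip_signed_nat(const char *s) {
    if (*s == '+' || *s == '-') s++;
    return skip_nat(s);
}

/* number = signedNat ("." nat)? (E signedNat)? */
bool is_number(const char *s) {
    s = skip_signed_nat(s);
    if (s == NULL) return false;
    if (*s == '.') {                    /* optional fraction */
        s = skip_nat(s + 1);
        if (s == NULL) return false;
    }
    if (*s == 'E') {                    /* optional exponent */
        s = skip_signed_nat(s + 1);
        if (s == NULL) return false;
    }
    return *s == '\0';                  /* nothing may follow */
}
```

Each optional part `( ... )?` becomes an `if` that is entered only when its first character is present; once entered, the rest of that part is mandatory.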
3.4.2 Finite Automata

Finite automata (finite-state machines) are a mathematical way of describing particular kinds of algorithms.
● Definition of the deterministic finite automaton (DFA)
● The nondeterministic finite automaton (NFA)
3.4.2.1 Definition of the Deterministic Finite Automaton (DFA)

● A DFA (deterministic finite automaton) M consists of:
◆ an alphabet ∑ (the input alphabet, i.e. the set of terminal symbols)
◆ a finite set of states S (the set of nonterminal symbols)
◆ a transition function T : S × ∑ → S ("deterministic" means that the transition function is single-valued)
◆ a start state s_0 ∈ S (the unique initial state)
◆ a set of accepting states A ⊆ S

The language accepted by M:
    L(M) = { c_1 c_2 c_3 ... c_n | c_i ∈ ∑, s_1 = T(s_0, c_1), s_2 = T(s_1, c_2), ..., s_n = T(s_{n-1}, c_n), s_n ∈ A }
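The definition of L(M) translates directly into a loop that applies T once per input character and then checks membership in A. The sketch below is illustrative: the `DFA` struct, `dfa_accepts`, and the example machine `EVEN_A` (strings over {a, b} with an even number of a's) are assumptions chosen for the demonstration, not taken from the slides.

```c
#include <stdbool.h>

#define NUM_STATES  2
#define NUM_SYMBOLS 2            /* alphabet {a, b}, mapped to 0 and 1 */

typedef struct {
    int  start;                          /* s_0 */
    bool accepting[NUM_STATES];          /* membership in A */
    int  T[NUM_STATES][NUM_SYMBOLS];     /* transition function */
} DFA;

/* Computes s_i = T(s_{i-1}, c_i) for every character and accepts
 * when the final state s_n is in A. */
bool dfa_accepts(const DFA *m, const char *input) {
    int s = m->start;
    for (; *input != '\0'; input++) {
        int c;
        if (*input == 'a') c = 0;
        else if (*input == 'b') c = 1;
        else return false;               /* character outside the alphabet */
        s = m->T[s][c];
    }
    return m->accepting[s];
}

/* Example machine (an assumption for illustration):
 * state 0 = even number of a's so far, state 1 = odd. */
const DFA EVEN_A = {
    0,
    { true, false },
    { { 1, 0 },      /* from state 0: a -> 1, b -> 0 */
      { 0, 1 } }     /* from state 1: a -> 0, b -> 1 */
};
```

Because T is single-valued, the loop never has to choose between alternatives; that is exactly what "deterministic" buys in an implementation.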
● Example 3)
    digit = [0-9]
    nat = digit+
    signedNat = (+|-)? nat
    number = signedNat ("." nat)? (E signedNat)?

[Figures: a DFA for nat, a DFA for signedNat, and a DFA for number. The transition diagrams are not recoverable from the extracted text; the surviving edge labels are digit, +, -, and E.]
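[Figure: separate transition diagrams for := (return ASSIGN), <= (return LE), and = (return EQ), followed by a single DFA that combines the three. A second diagram shows an NFA with separate paths starting on < for <= (return LE), <> (return NE), and < alone (return LT).]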
[Figure: the NFA with three separate < paths (return LE, return NE, return LT), and the equivalent DFA, in which a single < transition is followed by = (return LE), > (return NE), or other (return LT).]
3.4.2.3 Implementation of Finite Automata in Code

● A DFA that recognizes identifiers (method 1):

[Figure: state 1 moves to state 2 on letter; state 2 loops on letter and digit; state 2 moves to state 3 on other.]

    { starting in state 1 }
    if the next character is a letter then
        advance the input;
        { now in state 2 }
        while the next character is a letter or a digit do
            advance the input; { stay in state 2 }
        end while;
        { go to state 3 without advancing the input }
        accept;
    else
        { error or other cases }
    end if;
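Method 1 maps onto C almost line for line, because the current state is encoded by the position in the code. The following is a minimal sketch, assuming the input is a NUL-terminated string; `scan_identifier` is a hypothetical name, and returning the identifier length (0 for no match) is a design choice made for this illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the identifier at the start of input,
 * or 0 when input does not start with a letter. */
size_t scan_identifier(const char *input) {
    size_t pos = 0;
    /* state 1 */
    if (isalpha((unsigned char)input[pos])) {
        pos++;                                   /* now in state 2 */
        while (isalnum((unsigned char)input[pos])) {
            pos++;                               /* stay in state 2 */
        }
        /* state 3: accept without consuming the "other" character */
        return pos;
    }
    return 0;                                    /* error or other cases */
}
```

The drawback of this style is that every DFA needs its own hand-written control flow, which is the motivation for the table-driven method.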
● A DFA that recognizes identifiers (method 2):

Transition table T and accepting states:

    state   letter   digit   other   Accepting
    1       2                        No
    2       2        2       3       No
    3                                Yes

    state := 1;
    ch := next input character;
    while not Accept[state] and not error(state) do
        newstate := T[state, ch];
        if Advance[state, ch] then ch := next input character;
        state := newstate;
    end while;
    if Accept[state] then accept;
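The table-driven loop can be sketched in C as follows. Several details are assumptions made for this illustration: state 0 stands for the error state, characters are grouped into three classes (letter, digit, other), and the `Advance` table marks the transition on other as non-consuming. The name `scan_identifier_table` is hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Character classes: 0 = letter, 1 = digit, 2 = other. */
static int char_class(int ch) {
    if (isalpha(ch)) return 0;
    if (isdigit(ch)) return 1;
    return 2;
}

/* Rows are states 0..3; state 0 is the error state (an assumption). */
static const int T[4][3] = {
    { 0, 0, 0 },    /* 0: error */
    { 2, 0, 0 },    /* 1: letter -> 2 */
    { 2, 2, 3 },    /* 2: letter/digit -> 2, other -> 3 */
    { 0, 0, 0 }     /* 3: accepting, no outgoing moves */
};
static const bool Advance[4][3] = {
    { false, false, false },
    { true,  false, false },
    { true,  true,  false },   /* do not consume the "other" character */
    { false, false, false }
};
static const bool Accept[4] = { false, false, false, true };

/* Returns the identifier length at the start of input, or 0 on error. */
size_t scan_identifier_table(const char *input) {
    int state = 1;
    size_t pos = 0;
    int ch = (unsigned char)input[pos];
    while (!Accept[state] && state != 0) {
        int cls = char_class(ch);
        int newstate = T[state][cls];
        if (Advance[state][cls]) {
            pos++;
            ch = (unsigned char)input[pos];
        }
        state = newstate;
    }
    return Accept[state] ? pos : 0;
}
```

The driver loop is independent of the particular DFA: recognizing a different token class only requires different tables.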
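3.5 From Regular Expressions to DFAs

The algorithm for translating a regular expression into a DFA proceeds in stages:
    regular expression → NFA → DFA → program

3.5.1 From a regular expression to an NFA
1. (a) For the regular expression φ, the constructed NFA has a start state x and an accepting state y with no transition between them.
   (b) For the regular expression ε, the constructed NFA has an ε-transition from x to y.
   (c) For the regular expression a, a ∈ Σ, the constructed NFA has a transition labeled a from x to y.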
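Example: given the NFA M, find the DFA M'.

[Figure: NFA M with states 1 to 4, start state 1, and transitions labeled a, b, c, and ε. The diagram is not recoverable from the extracted text.]

Initial state: I = ε-closure({1}) = {1, 4}

    Ia = ε-closure(T(1, a) ∪ T(4, a)) = ε-closure({2, 3} ∪ φ) = ε-closure({2, 3}) = {2, 3}
    Ib = ε-closure(T(1, b) ∪ T(4, b)) = ε-closure(φ) = φ
    Ic = ε-closure(T(1, c) ∪ T(4, c)) = φ
    For I = {2, 3}: Ia = {2}, Ib = {4}, Ic = {3, 4}, ...

    I        Ia       Ib     Ic
    {1, 4}   {2, 3}   φ      φ
    {2, 3}   {2}      {4}    {3, 4}

(The rows for {2}, {4}, and {3, 4} are not recoverable from the extracted text.)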
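The two operations used in this example, ε-closure and the move on one input symbol, can be sketched with state sets stored as bit masks. This is an illustration under assumptions: the `NFA` struct and function names are hypothetical, and `slide_fragment` encodes only the two facts that the worked computation states explicitly (an ε-move from state 1 to state 4, and T(1, a) = {2, 3}); the remaining transitions of M are not recoverable and are left empty.

```c
#include <stdint.h>

#define N_STATES 5      /* states 1..4; index 0 is unused */
#define N_SYMS   3      /* a = 0, b = 1, c = 2 */

typedef struct {
    uint32_t eps[N_STATES];             /* targets of one ε-move */
    uint32_t delta[N_STATES][N_SYMS];   /* targets of T(s, symbol) */
} NFA;

static uint32_t bit(int s) { return (uint32_t)1 << s; }

/* ε-closure: keep adding ε-successors until nothing changes. */
uint32_t eps_closure(const NFA *m, uint32_t set) {
    uint32_t closure = set, prev;
    do {
        prev = closure;
        for (int s = 0; s < N_STATES; s++)
            if (closure & bit(s)) closure |= m->eps[s];
    } while (closure != prev);
    return closure;
}

/* I_sym = ε-closure of the union of T(s, sym) over all s in the set. */
uint32_t subset_move(const NFA *m, uint32_t set, int sym) {
    uint32_t target = 0;
    for (int s = 0; s < N_STATES; s++)
        if (set & bit(s)) target |= m->delta[s][sym];
    return eps_closure(m, target);
}

/* Partial NFA: only the transitions the worked example states. */
const NFA *slide_fragment(void) {
    static NFA m;                       /* static storage starts zeroed */
    m.eps[1] = bit(4);                  /* 1 --ε--> 4 */
    m.delta[1][0] = bit(2) | bit(3);    /* T(1, a) = {2, 3} */
    return &m;
}
```

With this fragment, `eps_closure(slide_fragment(), bit(1))` yields the set {1, 4}, and `subset_move` from {1, 4} on a yields {2, 3}, matching the first row of the table. Repeating `subset_move` for each newly discovered set until no new sets appear gives the full subset construction.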
[Figure and tables: DFA minimization by partition refinement. The transition table over the inputs a and b for states 1 to 7 is annotated with the block number of each target state and refined step by step. Replacing the state numbers with block numbers then gives the minimized DFA with states 1 to 5. The table entries are not recoverable from the extracted text.]
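Example 3: find the minimized DFA equivalent to the NFA shown in the figure.

[Figure: an NFA with states 1 to 5 over the inputs 0 and 1, and the transition table and diagram of the minimized DFA, whose subset states include 1, 23, 34, 235, and 345. The diagram and table entries are not recoverable from the extracted text.]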
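Converting the NFA to a DFA:

[Table: subset construction over the inputs a and b. The subset states that appear are 1234, 24, 34568, 568, 345678, 68, and 7. The individual table entries are not recoverable from the extracted text.]

Note: the NFA states are labeled 1 to 8.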
3.6 Implementation of a TINY Scanner

● The tokens and token classes of TINY

    Reserved Words   Special Symbols   Other
    if               +                 number
    then             -                 (1 or more digits)
    else             *
    end              /
    repeat           =
    until            <                 identifier
    read             (                 (1 or more letters)
    write            )
                     ;
                     :=

    Comments: {...}
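● DFA 1, for the special symbols except assignment
[Figure: from the start state, + leads to "return PLUS", - leads to "return MINUS", and so on through ; which leads to "return SEMI".]

● DFA 2 (numbers and identifiers)
[Figure: Start moves to Innum on digit; Innum loops on digit and moves to Done on [other]. Start moves to Inid on letter; Inid loops on letter and moves to Done on [other]. Start moves directly to Done on + - * / = < ( ) ;]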
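● DFA 3 (comments, white space, assignment)
[Figure: the complete scanner DFA. Start loops on white space. Start moves to Innum on digit (loop on digit, [other] to Done), to Inid on letter (loop on letter, [other] to Done), to Inassign on ":" ("=" to Done, [other] to Done), and to Incomment on "{" (loop on other, "}" back to Start). Any other character moves from Start to Done.]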
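The combined DFA can be coded as a loop over a state variable with one `switch` case per state. The following is a simplified sketch, not the course's actual scanner: `getToken` here takes a pointer to a string cursor instead of reading a file (a change of interface assumed for the example), reserved-word lookup is omitted so that every word is returned as ID, and `nth_token` is a hypothetical helper for trying the scanner out.

```c
#include <ctype.h>

typedef enum {
    ENDFILE, ERROR,
    IF, THEN, ELSE, END, REPEAT, UNTIL, READ, WRITE,
    ID, NUM,
    ASSIGN, EQ, LT, PLUS, MINUS, TIMES, OVER, LPAREN, RPAREN, SEMI
} TokenType;

typedef enum { START, INASSIGN, INCOMMENT, INNUM, INID, DONE } StateType;

/* Scans one token starting at *src and leaves *src just after it.
 * [other] transitions do not consume the character (consume = 0). */
TokenType getToken(const char **src) {
    StateType state = START;
    TokenType tok = ERROR;
    while (state != DONE) {
        int c = (unsigned char)**src;
        int consume = 1;
        switch (state) {
        case START:
            if (isdigit(c)) state = INNUM;
            else if (isalpha(c)) state = INID;
            else if (c == ':') state = INASSIGN;
            else if (c == '{') state = INCOMMENT;
            else if (c == ' ' || c == '\t' || c == '\n') { /* skip */ }
            else {
                state = DONE;
                switch (c) {
                case '\0': tok = ENDFILE; consume = 0; break;
                case '=': tok = EQ; break;
                case '<': tok = LT; break;
                case '+': tok = PLUS; break;
                case '-': tok = MINUS; break;
                case '*': tok = TIMES; break;
                case '/': tok = OVER; break;
                case '(': tok = LPAREN; break;
                case ')': tok = RPAREN; break;
                case ';': tok = SEMI; break;
                default:  tok = ERROR; break;
                }
            }
            break;
        case INCOMMENT:
            if (c == '}') state = START;
            else if (c == '\0') { state = DONE; tok = ENDFILE; consume = 0; }
            break;
        case INASSIGN:
            state = DONE;
            if (c == '=') tok = ASSIGN;
            else { tok = ERROR; consume = 0; }
            break;
        case INNUM:
            if (!isdigit(c)) { state = DONE; tok = NUM; consume = 0; }
            break;
        case INID:
            if (!isalpha(c)) { state = DONE; tok = ID; consume = 0; }
            break;
        default:
            state = DONE; tok = ERROR; consume = 0;
            break;
        }
        if (consume) (*src)++;
    }
    return tok;
}

/* Hypothetical helper: the n-th token (counting from 0) of text. */
TokenType nth_token(const char *text, int n) {
    TokenType tok = getToken(&text);
    while (n-- > 0 && tok != ENDFILE) tok = getToken(&text);
    return tok;
}
```

For example, scanning `fact := fact * x;` yields ID, ASSIGN, ID, TIMES, ID, SEMI, and then ENDFILE. A full scanner would additionally look each ID up in a reserved-word table and record the lexeme.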
● Sample program in the TINY language

    { Sample program
      in TINY language -
      computes factorial
    }
    read x; { input an integer }
    if 0 < x then { don't compute if x <= 0 }
      fact := 1;
      repeat
        fact := fact * x;
        x := x - 1
      until x = 0;
      write fact { output factorial of x }
    end
● Output of scanner

    TINY COMPILATION: sample.tny
       1: { Sample program
       2:   in TINY language -
       3:   computes factorial
       4: }
       5: read x; { input an integer }
            5: reserved word: read
            5: id, name= x
            5: ;
       6: if 0 < x then { don't compute if x <= 0 }
            6: reserved word: if
            6: num, val= 0
            6: <
            6: id, name= x
            6: reserved word: then
       7: fact := 1;
            7: id, name= fact
            7: :=
            7: num, val= 1
            7: ;
       8: repeat
            8: reserved word: repeat
       9: fact := fact * x;
            9: id, name= fact
            9: :=
            9: id, name= fact
            9: *
            9: id, name= x
            9: ;
● Output of scanner (continued)

      10: x := x - 1
            10: id, name= x
            10: :=
            10: id, name= x
            10: -
            10: num, val= 1
      11: until x = 0;
            11: reserved word: until
            11: id, name= x
            11: =
            11: num, val= 0
            11: ;
      12: write fact { output factorial of x }
            12: reserved word: write
            12: id, name= fact
      13: end
            13: reserved word: end
      14: EOF
Thank you.
Beijing University of Chemical Technology, Beijing, P. R. China