Shell CSCE 314 TAMU CSCE 314 Programming Languages

  • Slides: 36
Download presentation
Shell CSCE 314 TAMU CSCE 314: Programming Languages Dr. Dylan Shell Functional Parsers 1

Shell CSCE 314 TAMU CSCE 314: Programming Languages Dr. Dylan Shell Functional Parsers 1

Shell CSCE 314 TAMU What is a Parser? A parser is a program that

Shell CSCE 314 TAMU What is a Parser? A parser is a program that takes a text (set of tokens) and determines its syntactic structure. String or [Token] 2∗ 3+4 Parser syntactic structure Means Structure is made explicit in our data structure 2

Shell CSCE 314 TAMU The Parser Type In a functional language such as Haskell,

Shell CSCE 314 TAMU The Parser Type In a functional language such as Haskell, parsers can naturally be viewed as functions. type Parser = String → Tree A parser is a function that takes a string and returns some form of tree. However, a parser might not require all of its input string, so we also return any unused input: type Parser = String → (Tree, String) A string might be parsable in many ways, including none, so we generalize to a list of results: type Parser = String → [(Tree, String)] 3

Shell CSCE 314 TAMU Furthermore, a parser might not always produce a tree, so

Shell CSCE 314 TAMU Furthermore, a parser might not always produce a tree, so we generalize the result to a value of any type: type Parser res = String → [(res, String)] Finally, a parser might take token streams of symbols instead of character streams: type Token. Parser symb res = [symb] → [(res, [symb])] Note: For simplicity, we will only consider parsers that either fail and return the empty list of results, or succeed and return a singleton list. 4

Shell CSCE 314 TAMU Approach and Rationale We are going produce code that will

Shell CSCE 314 TAMU Approach and Rationale We are going produce code that will parse complex expressions. The code will be built (like most functional programming) by building from the bottom upwards. This requires a temporary shift in view because we look and think of parsed output from the top down. 5

Shell CSCE 314 TAMU Approach and Rationale We are going produce code that will

Shell CSCE 314 TAMU Approach and Rationale We are going produce code that will parse complex expressions. The code will be built (like most functional programming) by building from the bottom upwards. This requires a temporary shift in view because we look and think of parsed output from the top down. 6

Shell CSCE 314 TAMU Approach and Rationale 7

Shell CSCE 314 TAMU Approach and Rationale 7

Shell CSCE 314 TAMU �protocol�: : = http|https|ftp �host-domain-name�: : = �port�: : =

Shell CSCE 314 TAMU �protocol�: : = http|https|ftp �host-domain-name�: : = �port�: : = 0|1|2|⋯|65535 �host�. �sub-domain�� domain� �host�: : = �char�� char�* ⋮ �sub-domain�: : = �� char�*. �� sub-domain�|”” �domain�: : = com|edu|gov|⋯ 8

Shell CSCE 314 TAMU Approach and Rationale We are going produce code that will

Shell CSCE 314 TAMU Approach and Rationale We are going produce code that will parse complex expressions. The code will be built (like most functional programming) by building from the bottom upwards. This requires a temporary shift in view because we look and think of parsed output from the top down. ● ● ● We start with very simple parsers When we will glue these together in two ways: ○ One after the other: first an X then a Y afterwards ○ Choice: either an X or a Y. Then this process can be applied recursively. �protocol�, �domain�, etc. �sub-domain�� domain� 2|25|2555 9

Shell CSCE 314 TAMU Approach and Rationale We are going produce code that will

Shell CSCE 314 TAMU Approach and Rationale We are going produce code that will parse complex expressions. The code will be built (like most functional programming) by building from the bottom upwards. This requires a temporary shift in view because we look and think of parsed output from the top down. We Level start with very simple parsers High Overview ● When we will glue these together in two ways: We want to be able to write a parser ○ One after the other: first an X then a Y that closely resembles the grammatical afterwards structure the defines the language we’re ○ Choice: either described an X or a next, Y. is parsing. The approach ● involved, Then this butprocess it allowscan us tobedoapplied that! recursively. ● �protocol�, �domain�, etc. �sub-domain�� domain� 2|25|2555 1

Shell CSCE 314 TAMU Basic Parsers (Building Blocks) The parser symbol. Dot consumes the

Shell CSCE 314 TAMU Basic Parsers (Building Blocks) The parser symbol. Dot consumes the first character and succeeds if it symbol. Dot : : Parser Char is a dot, and fails otherwise: : : String -> [(Char, String)] : : [Char] -> [(Char, [Char])] symbol. Dot (x: xs) | (x == ‘. ’) = [(x, xs)] | otherwise = [] Example: *Main> symbol. Dot “. com” [(‘. ’, “com”)] 1

Shell CSCE 314 TAMU Basic Parsers (Building Blocks) The parser item fails if the

Shell CSCE 314 TAMU Basic Parsers (Building Blocks) The parser item fails if the input is empty, and consumes the first character otherwise: item : : : item = Example: Parser Char String -> [(Char, String)] [Char] -> [(Char, [Char])] inp -> case inp of [] -> [] (x: xs) -> [(x, xs)] *Main> item "parse this" [('p', "arse this")] 1

Shell CSCE 314 TAMU The parser return v always succeeds, returning the value v

Shell CSCE 314 TAMU The parser return v always succeeds, returning the value v without consuming any input: return : : a -> Parser a return v = inp -> [(v, inp)] The parser failure always fails: failure : : Parser a failure = inp -> [] Example: *Main> Main. return 7 "parse this" [(7, "parse this")] *Main> failure "parse this" [] 1

Shell CSCE 314 TAMU We can make it more explicit by letting the function

Shell CSCE 314 TAMU We can make it more explicit by letting the function parse apply a parser to a string: parse : : Parser a → String → [(a, String)] parse p inp = p inp -- essentially id function Example: *Main> parse item "parse this" [('p', "arse this")] 1

Shell CSCE 314 TAMU Choice What if we have to backtrack? First try to

Shell CSCE 314 TAMU Choice What if we have to backtrack? First try to parse p, then q? The parser p +++ q behaves as the parser p if p succeeds, and as the parser q otherwise. (+++) : : Parser a -> Parser p +++ q = inp -> case p inp [] [(v, out)] Example: a -> Parser a of -> parse q inp -> [(v, out)] *Main> parse failure "abc" [] *Main> parse (failure +++ item) "abc" [('a', "bc")] 1

Shell CSCE 314 TAMU Examples > parse item "" [] > parse item "abc"

Shell CSCE 314 TAMU Examples > parse item "" [] > parse item "abc" [('a', "bc")] > parse failure "abc" [] > parse (return 1) "abc" [(1, "abc")] > parse (item +++ return 'd') "abc" [('a', "bc")] > parse (failure +++ return 'd') "abc" [('d', "abc")] 1

Shell CSCE 314 TAMU Note: The library file Parsing. hs is available on the

Shell CSCE 314 TAMU Note: The library file Parsing. hs is available on the course webpage. The Parser type is a monad, a mathematical structure that has proved useful for modeling many different kinds of computations. 1

Sequencing Shell CSCE 314 TAMU Often we want to sequence parsers, e. g. ,

Sequencing Shell CSCE 314 TAMU Often we want to sequence parsers, e. g. , the following grammar: �if-stmt�: : = if (�expr�) then �stmt� First parse if, then (, then �expr�, … A sequence of parsers can be combined as a single composite parser using the keyword do. For example: p : : Parser (Char, Char) p = do x ← item y ← item return (x, y) Meaning: “The value of x is generated by the item parser. ” 1

Shell CSCE 314 TAMU Note: ● ● ● Each parser must begin in precisely

Shell CSCE 314 TAMU Note: ● ● ● Each parser must begin in precisely the same column. That is, the layout rule applies. The values returned by intermediate parsers are discarded by default, but if required can be named using the ← operator. The value returned by the last parser is the value returned by the sequence as a whole. 1

Shell CSCE 314 TAMU ● If any parser in a sequence of parsers fails,

Shell CSCE 314 TAMU ● If any parser in a sequence of parsers fails, then the sequence as a whole fails. For example: > parse p "abcdef" [((’a’, ’c’), "def")] > parse p "ab" [] ● The do notation is not specific to the Parser type, but can be used with any monadic type. 2

The “Monadic” Way Shell CSCE 314 TAMU Parser sequencing operator (>>=) : : Parser

The “Monadic” Way Shell CSCE 314 TAMU Parser sequencing operator (>>=) : : Parser a -> (a -> Parser b) -> Parser b p >>= f = inp -> case parse p inp of [] -> [] [(v, out)] -> parse (f v) out p >>= f fails if p fails ● otherwise applies f to the result of p ● this results in a new parser, which is then applied Example ● > parse ((failure +++ item) >>= (_ -> item)) "abc" [('b', "c")] 2

Shell CSCE 314 TAMU Sequencing Typical parser structure p 1 >>= p 2 >>=.

Shell CSCE 314 TAMU Sequencing Typical parser structure p 1 >>= p 2 >>=. . . pn >>= return v 1 -> v 2 -> vn -> (f v 1 v 2. . . vn) Using do notation do v 1 <- p 1 v 2 <- p 2. . . vn <- pn return (f v 1 v 2. . . vn) If some vi is not needed, vi <- pi can be written as pi, which corresponds to pi >>= _ ->. . . 2

Shell CSCE 314 TAMU Example Typical parser structure rev 3 = item >>= v

Shell CSCE 314 TAMU Example Typical parser structure rev 3 = item >>= v 1 -> item >>= v 2 -> item >>= _ -> item >>= v 3 -> return $ reverse (v 1: v 2: v 3: []) > rev 3 “abcdef” [(“dba”, ”ef”)] Using do notation rev 3 = do v 1 <- item v 2 <- item v 3 <- item return $ reverse (v 1: v 2: v 3: []) > (rev 3 >>= (_ -> item)) “abcde” [(‘e’, ””)] > (rev 3 >>= (_ -> item)) “abcd” [] 2

Shell CSCE 314 TAMU Key benefit: The result of first parse is available for

Shell CSCE 314 TAMU Key benefit: The result of first parse is available for the subsequent parsers parse (item >>= (x -> item >>= (y -> return (y: [x])))) “ab” [(“ba”, ””)] 2

Shell CSCE 314 TAMU Derived Primitives Parsing a character that satisfies a predicate: sat

Shell CSCE 314 TAMU Derived Primitives Parsing a character that satisfies a predicate: sat : : (Char -> Bool) -> Parser Char sat p = do x <- item if p x then return x else failure Examples > parse (sat [(‘a’, ”bc”)] > parse (sat [] (==‘a’)) “abc” (==‘b’)) “abc” is. Lower) “abc” is. Upper) “abc” 2

Shell CSCE 314 TAMU Derived Parsers from sat digit, letter, alphanum : : Parser

Shell CSCE 314 TAMU Derived Parsers from sat digit, letter, alphanum : : Parser Char digit = sat is. Digit letter = sat is. Alpha alphanum = sat is. Alpha. Num lower, upper : : Parser Char lower = sat is. Lower upper = sat is. Upper char : : Char → Parser Char char x = sat (== x) 2

To accept a particular string Shell CSCE 314 TAMU Use sequencing recursively: string :

To accept a particular string Shell CSCE 314 TAMU Use sequencing recursively: string : : String -> Parser String string [] = return [] string (x: xs) = do char x string xs return (x: xs) Entire parse fails if any of the recursive calls fail > parse (string "if [") "if (a<b) return; " [] > parse (string "if (") "if (a<b) return; " [("if (", "a<b) return; ")] 2

Shell CSCE 314 TAMU many applies the same parser many times Examples many :

Shell CSCE 314 TAMU many applies the same parser many times Examples many : : Parser a -> Parser [a] many p = many 1 p +++ return [] many 1 : : Parser a -> Parser [a] many 1 p = do v <- p vs <- many p return (v: vs) > parse (many digit) "123 ab" [("123", "ab")] > parse (many digit) "ab 123 ab" [("", "ab 123 ab")] > parse (many alphanum) "ab 123 ab" [("ab 123 ab", "")] 2

Example Shell CSCE 314 TAMU We can now define a parser that consumes a

Example Shell CSCE 314 TAMU We can now define a parser that consumes a list of one or more digits of correct format from a string: p : : Parser String p = do char '[' d ← digit ds ← many (do char ', ' digit) char ']' return (d: ds) > parse p "[1, 2, 3, 4]" [("1234", "")] > parse p "[1, 2, 3, 4" [] Note: More sophisticated parsing libraries can indicate and/or recover from errors in the input string. 2

Shell CSCE 314 TAMU Arithmetic Expressions Consider a simple form of expressions built up

Shell CSCE 314 TAMU Arithmetic Expressions Consider a simple form of expressions built up from single digits using the operations of addition + and multiplication *, together with parentheses. We also assume that: * and + associate to the right. * has higher priority than +. 3

Example: Parsing a token Shell CSCE 314 TAMU space : : Parser () space

Example: Parsing a token Shell CSCE 314 TAMU space : : Parser () space = many (sat is. Space) >> return () token : : Parser a -> Parser a token p = space >> p >>= v -> space >> return v identifier : : Parser String identifier = token ident : : Parser String ident = sat is. Lower >>= x -> many (sat is. Alpha. Num) >>= xs -> return (x: xs) 3

Shell CSCE 314 TAMU Formally, the syntax of such expressions is defined by the

Shell CSCE 314 TAMU Formally, the syntax of such expressions is defined by the following context free grammar: expr → term '+' expr | term → factor '*' term | factor → digit | '(' expr ')' digit → '0' | '1' | … | '9' 3

Shell CSCE 314 TAMU However, for reasons of efficiency, it is important to factorize

Shell CSCE 314 TAMU However, for reasons of efficiency, it is important to factorize the rules for expr and term: expr → term ('+' expr | ε) term → factor ('*' term | ε) Note: The symbol ε denotes the empty string. 3

Shell CSCE 314 TAMU It is now easy to translate the grammar into a

Shell CSCE 314 TAMU It is now easy to translate the grammar into a parser that evaluates expressions, by simply rewriting the grammar rules using the parsing primitives. That is, we have: expr : : Parser Int expr = do t ← term do char '+' e ← expr return (t + e) +++ return t expr → term ('+' expr | ε) term → factor ('*' term | ε) 3

Shell CSCE 314 TAMU term : : Parser Int term = do f ←

Shell CSCE 314 TAMU term : : Parser Int term = do f ← factor do char '*' t ← term return (f * t) +++ return f expr → term ('+' expr | ε) term → factor ('*' term | ε) factor : : Parser Int factor = do d ← digit return (digit. To. Int d) +++ do char '(' e ← expr char ')' return e 3

Shell CSCE 314 TAMU Finally, if we define eval : : String → Int

Shell CSCE 314 TAMU Finally, if we define eval : : String → Int eval xs = fst (head (parse expr xs)) then we try out some examples: > eval "2*3+4" 10 > eval "2*(3+4)" 14 > eval "2+5 -" 7 > eval "+5 -" *** Exception: Prelude. head: empty list 3