Parsing XML Grammars, PDAs, Lexical Analysis, Recursive Descent


Recipe Book Markup Language
• Why markup languages?
  – Give structure to contents, which aids in interpreting the semantics of content, storing it in a database, etc.
• Why XML?
  – Human readable (sort of)
  – Widely accepted and used for data interchange
• Why RBML?
  – Don’t reinvent the wheel: use existing stuff IAAP
  – Simplest of the recipe XML formats I found

Formal Languages
• What is a Formal Language?
  – Mathematically defined subset of the strings over a finite alphabet
• Regular Languages
  – Very simple, can be recognized by an FSM (finite-state machine)
  – Still very powerful
• Context-Free Languages
  – Pretty simple, can be recognized by a PDA (pushdown automaton)
  – Esp. useful for programming languages
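
The gap between the two classes shows up with nested markup: an FSM has no unbounded memory, so it cannot check that arbitrarily deep tags close in the right order, while a PDA can use its stack for that. A minimal sketch, assuming a simplified tag syntax without attributes; the name `tags_balanced` is hypothetical and chosen only for illustration:

```python
import re

# Simplified tag pattern (an assumption): <name> or </name>, no attributes.
TAG = re.compile(r"<(/?)([A-Za-z]+)>")

def tags_balanced(text: str) -> bool:
    """Return True if every opening tag is closed in properly nested order."""
    stack = []  # plays the role of the PDA's stack
    for match in TAG.finditer(text):
        closing, name = match.group(1), match.group(2)
        if not closing:
            stack.append(name)              # push on an opening tag
        elif not stack or stack.pop() != name:
            return False                    # closing tag does not match the top
    return not stack                        # accept only if nothing is left open

print(tags_balanced("<unit>cup</unit>"))                          # True
print(tags_balanced("<step><equipment>bowl</step></equipment>"))  # False
```

The second call fails because `</step>` arrives while `equipment` is still on top of the stack.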

Regular Expressions/Languages
• Alphabet, Σ = finite set of symbols
• String, σ = sequence of 0 or more symbols from Σ, i.e. an element of Σ*
• Regular Expressions
  – Ø is an RE and denotes the empty set
  – The empty string, ε, is an RE and denotes {ε}
  – For all a in Σ, a is an RE and denotes {a}
  – If r and s are REs denoting the languages R and S, resp., then (r+s), (rs), and (r*) are REs that denote R ∪ S, RS, and R*, resp.
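
The three operators above map directly onto Python's `re` syntax, with one notational difference: the textbook union (r+s) is written `r|s`. The sketch below tests membership in the language of (a+b)*abb over the alphabet {a, b}; the example expression and the helper name `in_language` are assumptions chosen for illustration. Python's `re` module supports more than these three operators, but only union, concatenation, and star are used here.

```python
import re

# The RE (a+b)*abb: any string of a's and b's that ends in "abb".
PATTERN = re.compile(r"(a|b)*abb")

def in_language(s: str) -> bool:
    """Return True if the whole string s is denoted by the expression."""
    return PATTERN.fullmatch(s) is not None

for s in ["abb", "aababb", "ab", ""]:
    print(repr(s), in_language(s))
```

`fullmatch` is used rather than `search` because language membership is about the entire string, not a substring.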

Context-Free Languages
• Context-Free Grammar G = <V, T, P, S>
  – V = variables
  – T = terminals (alphabet characters)
  – P = productions
  – S = start symbol in V
• Productions
  – Replace a variable with a string from (V ∪ T)*
  – Example: E -> E + E | E * E | (E) | id
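
To see how productions replace variables, the example grammar can be stored as a table and applied step by step. This is a minimal sketch; the names `PRODUCTIONS`, `expand_leftmost`, and `derive` are hypothetical, and the target string id + id * id is chosen only for illustration.

```python
# Productions for E -> E + E | E * E | (E) | id, each as a list of symbols.
PRODUCTIONS = {
    "E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["id"]],
}

def expand_leftmost(form, choice):
    """Replace the leftmost variable in `form` using production number `choice`."""
    for i, symbol in enumerate(form):
        if symbol in PRODUCTIONS:
            return form[:i] + PRODUCTIONS[symbol][choice] + form[i + 1:]
    raise ValueError("no variable left to expand")

def derive(choices, start="E"):
    """Apply a sequence of production choices; return every sentential form."""
    forms = [[start]]
    for choice in choices:
        forms.append(expand_leftmost(forms[-1], choice))
    return forms

# A leftmost derivation of: id + id * id
for form in derive([0, 3, 1, 3, 3]):
    print(" ".join(form))
```

Each printed line is one sentential form, a string from (V ∪ T)*; the last line contains terminals only.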

RBML Grammar
cookbook -> “<cookbook>” title (section | recipe)+ “</cookbook>”
title -> “<title>” pcdata “</title>”
section -> “<section>” title recipe+ “</section>”
recipe -> “<recipe>” title recipeinfo ingredientlist preparation serving notes “</recipe>”

RBML Grammar
recipeinfo -> <recipeinfo> (author | blurb | effort | genre | preptime | source | yield)* </recipeinfo>
ingredientlist -> <ingredientlist> (ingredient)* </ingredientlist>
preparation -> <preparation> (pcdata | equipment | step | hyperlink)* </preparation>
serving -> <serving> (pcdata | hyperlink)* </serving>
notes -> <notes> (pcdata | hyperlink)* </notes>

RBML Grammar
equipment -> <equipment> (pcdata | hyperlink)* </equipment>
step -> <step> (pcdata | equipment | hyperlink)* </step>
ingredient -> <ingredient> (pcdata | quantity | unit | fooditem)* </ingredient>
quantity -> <quantity> (number | number "or" number | number "and" number) </quantity>
number -> integer | fraction | integer " " fraction
fraction -> integer "/" integer
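
The number and fraction productions are small enough to check by hand. The sketch below reads a number as an integer ("2"), a fraction ("3/4"), or an integer, a space, and a fraction ("1 1/2"). It assumes that integer means one or more decimal digits, which the grammar does not spell out; the name `parse_number` is hypothetical.

```python
import re
from fractions import Fraction

# Alternatives of the number production, longest form first.
# Assumption: "integer" is one or more decimal digits.
NUMBER = re.compile(r"(\d+) (\d+)/(\d+)|(\d+)/(\d+)|(\d+)")

def parse_number(text: str) -> Fraction:
    """Return the value of a number, or raise ValueError if it does not match."""
    match = NUMBER.fullmatch(text)
    if match is None:
        raise ValueError(f"not a number: {text!r}")
    whole, num, den, f_num, f_den, integer = match.groups()
    if whole is not None:                       # integer " " fraction
        return int(whole) + Fraction(int(num), int(den))
    if f_num is not None:                       # fraction
        return Fraction(int(f_num), int(f_den))
    return Fraction(int(integer))               # integer

print(parse_number("1 1/2"))   # 3/2
print(parse_number("3/4"))     # 3/4
```

A regular expression suffices here because these two productions involve no nesting.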

Recipe Book Markup Language
unit -> <unit> pcdata </unit>
fooditem -> <fooditem> pcdata </fooditem>
blurb -> <blurb> pcdata </blurb>
effort -> <effort> pcdata </effort>
genre -> <genre> pcdata </genre>

Recipe Book Markup Language
preptime -> <preptime> pcdata </preptime>
source -> <source> (pcdata | hyperlink)* </source>
yield -> <yield> pcdata </yield>
hyperlink -> pcdata url

Recursive Descent Parsing
• Match required (literal) symbols
• Call procedure to match variable
  – May itself call similar procedures
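
The two rules above can be sketched for a small fragment of the RBML grammar: `expect` matches a required symbol, and each variable gets its own method. This is an illustration under stated assumptions, not a full RBML parser: it covers only ingredient, unit, and fooditem (quantity is left out), tags carry no attributes, and the names `tokenize`, `Parser`, `expect`, and `pcdata_element` are hypothetical.

```python
import re

TOKEN_RE = re.compile(r"<(/?)([A-Za-z]+)>|([^<]+)")

def tokenize(text):
    """Split input into ('OPEN' | 'CLOSE' | 'PCDATA', value) pairs.
    Simplification: characters that match nothing are silently skipped."""
    tokens = []
    for slash, name, pcdata in TOKEN_RE.findall(text):
        if name:
            tokens.append(("CLOSE" if slash else "OPEN", name))
        elif pcdata.strip():                 # drop whitespace-only text
            tokens.append(("PCDATA", pcdata.strip()))
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", "")

    def expect(self, kind, value=None):
        """Match a required symbol and return its value, or fail."""
        tok_kind, tok_value = self.peek()
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(
                f"expected {kind} {value or ''}, got {tok_kind} {tok_value}")
        self.pos += 1
        return tok_value

    def pcdata_element(self, name):
        """Productions of the form: name -> <name> pcdata </name>"""
        self.expect("OPEN", name)
        text = self.expect("PCDATA")
        self.expect("CLOSE", name)
        return (name, text)

    def ingredient(self):
        """ingredient -> <ingredient> (pcdata | unit | fooditem)* </ingredient>"""
        self.expect("OPEN", "ingredient")
        children = []
        while True:
            kind, value = self.peek()
            if kind == "PCDATA":
                children.append(self.expect("PCDATA"))
            elif kind == "OPEN" and value in ("unit", "fooditem"):
                # Call the procedure for the variable named by the tag.
                children.append(self.pcdata_element(value))
            else:
                break
        self.expect("CLOSE", "ingredient")
        return ("ingredient", children)

source = "<ingredient>2 <unit>cups</unit> <fooditem>flour</fooditem></ingredient>"
print(Parser(tokenize(source)).ingredient())
```

The `*` repetition in the production becomes the `while` loop, and each alternative inside the parentheses becomes one branch selected by looking at the next token.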

Lexical Analysis
• Helps prepare for parsing
• Uses regular expressions to
  – Organize input into multi-symbol chunks (tokens)
  – Give each chunk a meaning for the parser
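
As a rough illustration of the chunking step, the sketch below uses one regular expression to split RBML-like input into opening tags, closing tags, and pcdata. The token kinds, the `Token` type, and the `tokenize` name are assumptions made for this example, and tags are assumed to have no attributes.

```python
import re
from typing import Iterator, NamedTuple

class Token(NamedTuple):
    kind: str    # "OPEN", "CLOSE", or "PCDATA": the meaning for the parser
    value: str   # tag name, or the text itself for PCDATA

# One alternative per kind of chunk: closing tag, opening tag, text.
TOKEN_RE = re.compile(r"</([A-Za-z]+)>|<([A-Za-z]+)>|([^<]+)")

def tokenize(text: str) -> Iterator[Token]:
    """Yield tokens in input order; raise SyntaxError on unrecognized input."""
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        close_name, open_name, pcdata = match.groups()
        if close_name:
            yield Token("CLOSE", close_name)
        elif open_name:
            yield Token("OPEN", open_name)
        elif pcdata.strip():              # skip whitespace-only text between tags
            yield Token("PCDATA", pcdata.strip())
        pos = match.end()

for token in tokenize("<unit> cup </unit>"):
    print(token)
```

A parser can then work on whole tokens such as `Token("OPEN", "unit")` instead of individual characters.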