Syntax and Semantics
• Syntax gives the structure of statements in a language
  – Allowed ordering, nesting, repetition, omission of symbols
  – Can automate the process of checking correct syntax
• Semantics give meaning to the (structured) symbols
  – E.g., kinds of labels, types of variables, layout of classes
  – For example, what does 11 mean? (at least 3 good answers)
• Separating syntactic and semantic evaluation helps
  – Can isolate the problem of syntactic recognition in an engine
  – Can use the structure produced by the engine directly
  – Sometimes called syntax-directed (compiler in charge)
CSE 425: Syntax I

Syntax and Lexical Structure
• Syntax gives the structure of statements in a language
  – E.g., the format of tokens and how they can be arranged
  – Lexical structure also describes how to recognize them
• Scanning obtains tokens from a stream of characters
  – E.g., whitespace-delimited vs. regular-expression-based
  – Tokens include keywords, constants, symbols, identifiers
  – Usually based on the assumption of taking the longest substring
• Parsing recognizes more complex expressions
  – E.g., well-formed statements in logic, arithmetic, etc.
  – Free-format languages ignore indentation, etc., while fixed-format languages have specific restrictions/requirements
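
The longest-substring rule above can be sketched as a small hand-written scanner. This is only an illustration under assumptions that are not in the slides: the token kinds (Identifier, Number, Symbol), the Token struct, and the scan function are hypothetical names, and every non-alphanumeric, non-space character is treated as a one-character symbol.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch (not taken from the slides).
enum class Kind { Identifier, Number, Symbol };

struct Token {
    Kind kind;
    std::string text;
};

// Scan `input` into tokens, always taking the longest run of characters
// that fits the current token kind. Whitespace only separates tokens.
std::vector<Token> scan(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) { ++i; continue; }
        std::size_t start = i;
        if (std::isalpha(c)) {
            // Identifier: a letter followed by as many letters/digits as possible.
            while (i < input.size() &&
                   std::isalnum(static_cast<unsigned char>(input[i]))) ++i;
            tokens.push_back({Kind::Identifier, input.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            // Number: as many consecutive digits as possible.
            while (i < input.size() &&
                   std::isdigit(static_cast<unsigned char>(input[i]))) ++i;
            tokens.push_back({Kind::Number, input.substr(start, i - start)});
        } else {
            // Anything else is a single-character symbol in this sketch.
            ++i;
            tokens.push_back({Kind::Symbol, input.substr(start, 1)});
        }
    }
    return tokens;
}
```

Calling scan("count1=42+x") yields five tokens even though the input has no whitespace: the identifier count1 is consumed as one token because the scanner keeps extending it while letters or digits follow.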

Scanning vs. Parsing Roles
• It is often possible to simplify a grammar’s structure by making its tokens more sophisticated
  – For example, scanning for the terminal token NUMBER vs. parsing for the non-terminal number → nonzerodigit*
• Such simplification delegates work to a scanner
  – Often this is a good separation of concerns, especially since scanning may appropriately specialize its logic, etc.
  – E.g., a fairly general scanner built from classification functions (which look for all digits, all alphabetic, etc.) can be re-used or refactored easily for scanning different grammars
  – E.g., the C++11 <regex> library is worth studying and using
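
As a rough sketch of delegating NUMBER recognition to the scanner, the C++11 <regex> library can match a whole lexeme in one call, so the grammar only ever sees NUMBER as a terminal. The pattern "[0-9]+" (one or more decimal digits) and the function names is_number_token and count_number_tokens are assumptions made for this example; the slides do not fix a definition of NUMBER.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// True if `text` is, in its entirety, a NUMBER token under this sketch's
// assumed definition: one or more decimal digits.
bool is_number_token(const std::string& text) {
    static const std::regex number_pattern("[0-9]+");
    return std::regex_match(text, number_pattern);
}

// Count the maximal digit runs found anywhere in `text`.
std::size_t count_number_tokens(const std::string& text) {
    static const std::regex number_pattern("[0-9]+");
    auto first = std::sregex_iterator(text.begin(), text.end(), number_pattern);
    auto last = std::sregex_iterator();
    return static_cast<std::size_t>(std::distance(first, last));
}
```

Note that std::regex_match requires the whole string to match, while std::sregex_iterator searches for successive matches inside a longer string; a scanner would normally use the second form.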

Regular Expressions, DFAs, NDFAs
• Regular expressions capture lexical structure of symbols that can be built using 3 composition rules
  – Concatenation (ab), selection (a | b), repetition (b*)
• Finite automata can recognize the strings that regular expressions describe
  – Deterministic finite automata (DFAs) associate a unique state with each sequence generated by a regular expression
  – Non-deterministic finite automata (NDFAs) allow multiple states to be reached by the same input sequence (adding “choice”)
• Can generate a unique (minimal) DFA in 3 steps
  – Generate NDFA from the regular expression (Scott p. 56)
  – Convert NDFA to (possibly larger) DFA (Scott pp. 56-58)
  – Minimize the DFA (Scott p. 59) to get a unique automaton
• C++11 <regex> library handles regular-expression matching for you, so you need not build the automaton by hand
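
To make the link between a regular expression and a DFA concrete, here is a minimal hand-coded DFA for the example expression a(b|c)*, which uses all three composition rules. The expression, the state names, and the functions dfa_accepts and regex_accepts are chosen for illustration only; the DFA was written by hand rather than derived through the three-step construction described above.

```cpp
#include <regex>
#include <string>

// Hand-coded DFA for the example regular expression a(b|c)*.
// Start  --a-->      Accept
// Accept --b or c--> Accept
// Any other input leads to Dead, which has no way out.
bool dfa_accepts(const std::string& input) {
    enum State { Start, Accept, Dead };
    State state = Start;
    for (char c : input) {
        switch (state) {
        case Start:
            state = (c == 'a') ? Accept : Dead;
            break;
        case Accept:
            state = (c == 'b' || c == 'c') ? Accept : Dead;
            break;
        case Dead:
            break;  // stay in the dead state
        }
    }
    return state == Accept;
}

// The same language, recognized through the C++11 <regex> library.
bool regex_accepts(const std::string& input) {
    static const std::regex pattern("a(b|c)*");
    return std::regex_match(input, pattern);
}
```

Both functions should agree on every input: for example "a" and "abcb" are accepted, while the empty string, "ba", and "abd" are rejected. Because the automaton is deterministic, each input sequence leaves it in exactly one state.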

Today’s Studio Exercises
• We’ll code up some ideas from Scott Chapter 2.1-2.2
  – Looking at mechanisms for recognizing tokens and for parsing basic CFGs with straightforward recursion
  – Next studio we’ll look at more complicated variations
• Today’s exercises are all in C++
  – We’ll write our own code, but check out the <regex> library too, since you’ll be allowed to use it for lab assignments!
  – Please take advantage of the on-line tutorial and reference manual pages that are linked on the course web site
  – As always, please ask us for help as needed
• When done, email your answers to the course account with “Syntax Studio I” in the subject line
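
Parsing a basic CFG with straightforward recursion, as mentioned above, might look like the following sketch, with one function per non-terminal. The grammar, the Parser class, and the is_well_formed helper are all assumptions made for this example and are not the studio's actual exercise; tokens are single characters to keep the scanner out of the picture.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical grammar for this sketch (not taken from the slides):
//   expr -> term ('+' term)*
//   term -> DIGIT | '(' expr ')'
class Parser {
public:
    explicit Parser(const std::string& text) : text_(text), pos_(0) {}

    // True if the whole input is exactly one well-formed expr.
    bool parse() { return expr() && pos_ == text_.size(); }

private:
    // expr -> term ('+' term)*
    bool expr() {
        if (!term()) return false;
        while (peek() == '+') {
            ++pos_;
            if (!term()) return false;
        }
        return true;
    }

    // term -> DIGIT | '(' expr ')'
    bool term() {
        if (std::isdigit(static_cast<unsigned char>(peek()))) {
            ++pos_;
            return true;
        }
        if (peek() == '(') {
            ++pos_;
            if (!expr()) return false;
            if (peek() != ')') return false;
            ++pos_;
            return true;
        }
        return false;
    }

    // Current character, or '\0' once the input is exhausted.
    char peek() const { return pos_ < text_.size() ? text_[pos_] : '\0'; }

    std::string text_;
    std::size_t pos_;
};

bool is_well_formed(const std::string& text) {
    return Parser(text).parse();
}
```

With this grammar, is_well_formed("1+(2+3)") succeeds, while "1+", "(1", and "1+2)" are rejected: the first two fail inside term(), and the last fails the final check that all input was consumed.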