LANGUAGE AND GRAMMARS COMP 319 University of Liverpool

LANGUAGE AND GRAMMARS COMP 319 © University of Liverpool slide 1

Contents
• Languages and Grammars
• Formal languages
• Formal grammars
• Generative grammars
• Analytic grammars
• Context-free grammars
• LL parsers
• LR parsers
• Rewrite systems
• L-systems
COMP 319 © University of Liverpool slide 2

Software Engineering Foundation
Software engineering may be summarised by saying that it concerns the construction of programs to solve problems, and that there are three parts:
- Construction/engineering, and methods
- Problems, and problem solving, and
- Programs
COMP 319 © University of Liverpool slide 3

Languages and grammar
• Languages are spoken and written (linguistics)
• To be effective they must be based on a shared set of rules: a grammar
• Grammars are introspective: they are based on and couched in language
• Natural language grammars are constantly shifting and locally negotiated
• A grammar is a formal language in which the rules of discourse are both the subject of discussion and the aim
COMP 319 © University of Liverpool slide 4

Formal language concepts
• The concept emerges from the need to define rules (for language)
• Formally, a formal language is a collection of words, each composed of smaller, atomic units
• Issues of concern are
  - the number and nature of the atomic units,
  - the precision level required,
  - the completeness of the formalism
COMP 319 © University of Liverpool slide 5

Examples of formal languages
• The set of all words over {a, b}
• The set {a^n : n is a prime number}
• The set of syntactically correct programs in a given computer programming language
• The set of inputs upon which a certain Turing machine halts
COMP 319 © University of Liverpool slide 6

Formal language specification
There are many ways in which a formal language can be specified, e.g.
• strings produced in a formal grammar
• strings produced by regular expressions
• the strings accepted by automata
• logic and other formalisms
COMP 319 © University of Liverpool slide 7
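
As a small illustration of two of these specification styles, the sketch below describes the same language, (ab)*, once as a regular expression and once as a deterministic automaton. The language, the state numbering and the function names are assumptions chosen for illustration; they do not come from the slides.

```python
import re

# Specification 1: a regular expression for the language (ab)*.
PATTERN = re.compile(r"(ab)*")


def accepts_regex(word):
    """Return True if the whole word matches (ab)*."""
    return PATTERN.fullmatch(word) is not None


# Specification 2: a deterministic finite automaton for the same language.
# State 0 is both the start state and the only accepting state.
# Missing transitions mean the word is rejected.
DFA = {(0, "a"): 1, (1, "b"): 0}


def accepts_dfa(word):
    """Run the automaton over the word and report acceptance."""
    state = 0
    for ch in word:
        state = DFA.get((state, ch))
        if state is None:
            return False
    return state == 0
```

Both functions should agree on every input, which is what it means for the two specifications to define the same language.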

Language Production Operations
• Concatenation of strings drawn from the two languages
• Intersection (strings common to both languages) or union (strings in either language)
• Complement of one language
• Right quotient of one by the other
• Kleene star operation on one language
• Reverse of a language
• Shuffle combination of languages
COMP 319 © University of Liverpool slide 8
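
The operations above can be tried out on small finite languages represented as Python sets of strings. This is only a sketch: real formal languages are often infinite, so the Kleene star and the complement are bounded here by an assumed max_len parameter, and all function names are illustrative. Union and intersection are simply the built-in set operators | and &.

```python
from itertools import product


def concat(l1, l2):
    """All strings uv with u in l1 and v in l2."""
    return {u + v for u in l1 for v in l2}


def reverse(lang):
    """Every word of the language written backwards."""
    return {w[::-1] for w in lang}


def right_quotient(l1, l2):
    """All x such that xy is in l1 for some y in l2."""
    return {w[: len(w) - len(v)] for w in l1 for v in l2 if w.endswith(v)}


def kleene_star(lang, max_len):
    """Zero or more concatenated words of lang, cut off at max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        longer = {u + v for u in frontier for v in lang if len(u + v) <= max_len}
        frontier = longer - result
        result |= frontier
    return result


def all_words(alphabet, max_len):
    """Every word over the alphabet up to max_len (the bounded universe)."""
    return {"".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n)}


def complement(lang, alphabet, max_len):
    """Words over the alphabet (up to max_len) that are not in lang."""
    return all_words(alphabet, max_len) - lang


def shuffle(u, v):
    """All interleavings of u and v that keep each word's letter order."""
    if not u or not v:
        return {u + v}
    return {u[0] + w for w in shuffle(u[1:], v)} | {v[0] + w for w in shuffle(u, v[1:])}
```

For example, concat({"a", "ab"}, {"b"}) gives {"ab", "abb"}, and shuffle("ab", "c") gives {"abc", "acb", "cab"}.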

Formal Grammars
• Noam Chomsky
  - Linguist, philosopher at MIT
  - 1956, papers on information and grammar
• Types of formal grammar
  - Generative grammar
  - Analytic grammar
COMP 319 © University of Liverpool slide 9

Generative formal grammars
• Generative grammars: a set of rules by which all possible strings in the language to be described can be generated by successively rewriting strings, starting from a designated start symbol.
• In effect, a generative grammar formalises an algorithm that generates strings in the language.
COMP 319 © University of Liverpool slide 10

Analytic formal grammars
• Analytic grammars: a set of rules that takes an arbitrary string as input and successively reduces or analyses that string to yield a final boolean “yes/no” that indicates whether the string is a member of the language described by the grammar.
• In effect, an analytic grammar is a parser or recogniser for a language.
COMP 319 © University of Liverpool slide 11

Generative grammar components
Chomsky’s definition, essentially for linguistics but well suited to formal computing grammars, consists of the following components:
- A finite set N of nonterminal symbols
- A finite set Σ of terminal symbols, disjoint from N
- A finite set P of production rules, where a rule is of the form: string in (Σ ∪ N)* → string in (Σ ∪ N)*
- A symbol S in N that is identified as the start symbol
COMP 319 © University of Liverpool slide 12

Generative grammar definition
• The language of a formal grammar G = (N, Σ, P, S) is denoted by L(G)
• It is defined as all those strings over Σ that can be generated by starting from the symbol S and then applying the rules in P until no more nonterminal symbols are present
COMP 319 © University of Liverpool slide 13

A generative formal grammar
• Given the terminals {a, b} and nonterminals {S, A, B}, where S is the special start symbol
• Productions:
  S → ABS
  S → ε (the empty string)
  BA → AB
  BS → b
  Bb → bb
  Ab → ab
  Aa → aa
• Defines all the words of the form a^n b^n (i.e. n copies of a followed by n copies of b)
COMP 319 © University of Liverpool slide 14
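
One way to check this grammar by experiment is to rewrite strings mechanically, starting from S and trying every production at every position. The sketch below does a breadth-first search over sentential forms. The length bound max_len is an assumption added to keep the search finite, and the names PRODUCTIONS, NONTERMINALS and generate are illustrative.

```python
from collections import deque

# The seven productions from the slide, as (left-hand side, right-hand side).
PRODUCTIONS = [
    ("S", "ABS"),
    ("S", ""),      # S -> empty string
    ("BA", "AB"),
    ("BS", "b"),
    ("Bb", "bb"),
    ("Ab", "ab"),
    ("Aa", "aa"),
]
NONTERMINALS = set("SAB")


def generate(max_len):
    """Return all terminal words of length <= max_len derivable from S."""
    seen = {"S"}
    queue = deque(["S"])
    words = set()
    while queue:
        form = queue.popleft()
        if not any(ch in NONTERMINALS for ch in form):
            words.add(form)          # no nonterminals left: a word of L(G)
            continue
        for lhs, rhs in PRODUCTIONS:
            start = form.find(lhs)
            while start != -1:       # try the rule at every matching position
                new = form[:start] + rhs + form[start + len(lhs):]
                # A word of length n needs forms of length at most n + 1
                # (the trailing S), so longer forms can be discarded.
                if len(new) <= max_len + 1 and new not in seen:
                    seen.add(new)
                    queue.append(new)
                start = form.find(lhs, start + 1)
    return sorted(words, key=lambda w: (len(w), w))


print(generate(6))  # ['', 'ab', 'aabb', 'aaabbb']
```

Forms such as AB, reached by applying S → ε too early, contain a B that no rule can remove, so they never produce a word.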

Context Free Grammars
• Theoretical basis of most programming languages.
• Easy to generate a parser using a compiler-compiler (parser generator).
• Two main approaches exist: top-down parsing, e.g. LL parsers, and bottom-up parsing, e.g. LR parsers.
COMP 319 © University of Liverpool slide 15

LL parser
• Table-based, top-down parser for a subset of the context-free grammars (LL grammars).
• Parsing is Left to right, and constructs a Leftmost derivation of the sentence.
• LL(k) parsers use k tokens of look-ahead to parse a sentence of an LL(k) grammar.
• LL(1) grammars are popular and fast because only the next token is considered in parsing decisions.
COMP 319 © University of Liverpool slide 16

Table based LL parsing
• Consider the grammar
  1. S → F
  2. S → ( S + F )
  3. F → 1
• This has the parsing table

        (    )    1    +    $
   S    2    -    1    -    -
   F    -    -    3    -    -

  e.g. input 1 with S on top of the stack selects rule 1, i.e. S on the stack is replaced with F and the rule number 1 is output
  Stack and input same = delete
  Stack and input different = error
• Architecture

   Input buffer: <null>
                     |
                +--------+
   Stack        |        |
     S   <------| Parser |------> Output
     $          |        |
                +--------+
                     ^
                     |
                +---------+
                | Parsing |
                | table   |
                +---------+

• Example input (1+1)$
COMP 319 © University of Liverpool slide 17

Table based LL parsing
• Consider the grammar
  1. S → F
  2. S → ( S + F )
  3. F → 1
• This has the parsing table

        (    )    1    +    $
   S    2    -    1    -    -
   F    -    -    3    -    -

  e.g. 1 and S implies rule 1, i.e. stack S is replaced with F and 1 is output
  Stack and input same = delete
  Stack and input different = error
• Example input (1+1)$

   input   stack        action            output
   (       S$           parse ( S : 2     2
   (       (S + F)$     ( ( delete        2
   1       S + F)$      parse 1 S : 1     21
   1       F + F)$      parse 1 F : 3     213
   1       1 + F)$      1 1 delete        213
   +       + F)$        + + delete        213
   1       F)$          parse 1 F : 3     2133
   1       1)$          1 1 delete        2133
   )       )$           ) ) delete        2133
   $       $            $ $ stop          2133

COMP 319 © University of Liverpool slide 18
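
The trace above can be reproduced with a few lines of Python. This is a minimal sketch of a table-driven LL(1) loop for this one grammar. The names RULES, TABLE and ll1_parse are illustrative, and the input is assumed to be a list of single-character tokens ending in $.

```python
# Grammar rules by number: (left-hand side, right-hand side symbols).
RULES = {
    1: ("S", ["F"]),
    2: ("S", ["(", "S", "+", "F", ")"]),
    3: ("F", ["1"]),
}

# Parsing table: (nonterminal on the stack, next input token) -> rule number.
TABLE = {("S", "("): 2, ("S", "1"): 1, ("F", "1"): 3}


def ll1_parse(tokens):
    """Return the list of rule numbers applied, or raise SyntaxError."""
    stack = ["$", "S"]            # top of the stack is the end of the list
    output = []
    pos = 0
    while stack:
        if pos >= len(tokens):
            raise SyntaxError("input ended before the stack was empty")
        top = stack.pop()
        look = tokens[pos]
        if top == look:
            pos += 1              # stack and input same: delete both
        elif (top, look) in TABLE:
            rule = TABLE[(top, look)]
            output.append(rule)
            # Push the right-hand side so its first symbol ends up on top.
            stack.extend(reversed(RULES[rule][1]))
        else:
            raise SyntaxError(f"unexpected {look!r} with {top!r} on the stack")
    return output


print(ll1_parse(list("(1+1)$")))  # [2, 1, 3, 3]
```

The printed rule numbers match the output column of the trace, 2133.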

Parse Tree

Left Right Parser
• Bottom-up parser for context-free grammars, used by many programming language compilers.
• Parsing is Left to right, and produces a Rightmost derivation (in reverse).
• LR(k) parsers use k tokens of look-ahead.
• LR(1) is the most common type of parser, used by many programming languages.
• Almost always generated using a parser generator, which constructs the parsing table; e.g. Simple LR (SLR), Look-Ahead LR (LALR, e.g. Yacc), Canonical LR.
COMP 319 © University of Liverpool slide 20

Left Right parser example. . • • • Rules. . . 1) E →

Left Right parser example. . • • • Rules. . . 1) E → E * B (2) E → E + B (3) E → B (4) B → 0 (5) B → 1 COMP 319 © University of Liverpool slide 21

Left Right parser example COMP 319 © University of Liverpool slide 22

Re-writing
• Rewriting is a general process involving strings and alphabets. Rewrite systems are classified according to what is rewritten, e.g. strings, terms, graphs, etc.
• A rewrite system is a set of equations that characterises a system of computation. Based on the use of rewrite rules, it provides one method of automating theorem proving.
• Examples of practical systems that use this approach include the software Mathematica.
COMP 319 © University of Liverpool slide 23

Re-writing logic example
• !!A = A // eliminate double negative
• !(A AND B) = !A OR !B // De Morgan
COMP 319 © University of Liverpool slide 24
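
The two rules can be applied mechanically, reading each equation left to right. In the sketch below, formulas are nested tuples such as ("not", ("and", "A", "B")). This representation and the name rewrite are assumptions made for illustration, not the notation of any particular tool.

```python
def rewrite(term):
    """Rewrite a formula bottom-up until neither rule applies.

    A formula is either a variable name (a string) or a tuple:
    ("not", x), ("and", x, y) or ("or", x, y).
    """
    if isinstance(term, str):
        return term
    op, *args = term
    args = [rewrite(a) for a in args]      # normalise the sub-formulas first
    if op == "not":
        (inner,) = args
        if isinstance(inner, tuple) and inner[0] == "not":
            return inner[1]                # !!A = A
        if isinstance(inner, tuple) and inner[0] == "and":
            # !(A AND B) = !A OR !B; the new negations may rewrite further.
            return ("or", rewrite(("not", inner[1])), rewrite(("not", inner[2])))
    return (op, *args)


print(rewrite(("not", ("and", ("not", "A"), "B"))))  # ('or', 'A', ('not', 'B'))
```

Here !(!A AND B) first becomes !!A OR !B by De Morgan, and the double negative is then eliminated to give A OR !B.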

Re-writing in Mathematica (Wolfram) COMP 319 © University of Liverpool slide 25

L-systems
• Named after Aristid Lindenmayer (1925-1989), a Hungarian theoretical biologist and botanist who worked at the University of Utrecht (Netherlands)
• A formal grammar used to model the growth and morphology of plants and animals
• In plant and animal modelling a special form, the parametric L-system, is used; it is based on rewriting.
• Because of their recursive, parallel, and unlimited nature, they lead to concepts of self-similarity, fractional dimension and fractal-like forms.
COMP 319 © University of Liverpool slide 26

L-system structure
• The basic system is identical to formal grammars: G = {V, S, Ω, P}
• where
  - G is the grammar being defined
  - V (the alphabet) is a set of symbols that can be replaced (variables)
  - S is a set of symbols that remain fixed (constants)
  - Ω (start, axiom or initiator) is a string from V, the initial state
  - P is a set of rules or productions defining the ways variables can be replaced by constants and other variables. Each rule consists of a LHS (predecessor) and a RHS (successor)
COMP 319 © University of Liverpool slide 27

Example 1: Fibonacci numbers
• V: A B
• C (constants): none
• Ω: A
• P:
  p1: A → B
  p2: B → AB
N=0 A
N=1 → B
N=2 → AB
N=3 → BAB
N=4 → ABBAB
N=5 → BABABBAB
N=6 → ABBABBABABBAB
N=7 → BABABBABABBABBABABBAB
...
Counting lengths we get: 1, 1, 2, 3, 5, 8, 13, 21, ...
The Fibonacci numbers
COMP 319 © University of Liverpool slide 28
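
The derivation above differs from ordinary grammar rewriting in that every symbol is replaced at once in each step. A minimal sketch of that parallel rewriting is shown below. The function name lsystem and the dictionary representation of the productions are assumptions for illustration.

```python
def lsystem(axiom, rules, steps):
    """Return the list of strings for N = 0 .. steps.

    Every symbol is rewritten simultaneously in each step; symbols
    without a rule (constants) are copied unchanged.
    """
    generations = [axiom]
    for _ in range(steps):
        current = generations[-1]
        generations.append("".join(rules.get(sym, sym) for sym in current))
    return generations


# Example 1 from the slide: p1: A -> B, p2: B -> AB, axiom A.
fibonacci = lsystem("A", {"A": "B", "B": "AB"}, 7)
print([len(s) for s in fibonacci])  # [1, 1, 2, 3, 5, 8, 13, 21]
```

The same function works for any set of productions given as a dictionary from a symbol to its replacement string.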

Example 2: Algal growth
• V: A B
• C (constants): none
• Ω: A
• P:
  p1: A → AB
  p2: B → A
N=0 A
N=1 → AB
N=2 → ABA
N=3 → ABAAB
N=4 → ABAABABA
COMP 319 © University of Liverpool slide 29

Example 3: Koch snowflake
• V: F
• C (constants): + -
• Ω: F
• P:
  p1: F → F+F-F-F+F
N=0 F
N=1 → F+F-F-F+F
N=2 → F+F-F-F+F+F...
N=3 etc
COMP 319 Software Engineering II © University of Liverpool slide 30
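
To turn such a string into a shape, the symbols are usually read as drawing commands. The slide does not state the interpretation, so the sketch below assumes the common one for this rule: F draws one unit forward, + turns left by 90 degrees and - turns right by 90 degrees. It computes the vertices of the curve rather than drawing it. The names expand and trace are illustrative.

```python
def expand(axiom, rules, steps):
    """Apply the productions to every symbol in parallel, `steps` times."""
    s = axiom
    for _ in range(steps):
        s = "".join(rules.get(c, c) for c in s)
    return s


def trace(commands):
    """Follow the commands on an integer grid and return the vertices.

    Assumed meaning: F = one unit forward, + = turn left 90 degrees,
    - = turn right 90 degrees. The walk starts at (0, 0) facing east.
    """
    x, y = 0, 0
    dx, dy = 1, 0
    points = [(x, y)]
    for c in commands:
        if c == "F":
            x, y = x + dx, y + dy
            points.append((x, y))
        elif c == "+":
            dx, dy = -dy, dx      # rotate the heading left
        elif c == "-":
            dx, dy = dy, -dx      # rotate the heading right
    return points


KOCH = {"F": "F+F-F-F+F"}
print(trace(expand("F", KOCH, 1)))
# [(0, 0), (1, 0), (1, 1), (2, 1), (2, 0), (3, 0)]
```

Each application of the rule replaces one segment with five and triples the distance between the end points, so after N steps the curve has 5^N segments and ends at (3^N, 0).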

Example 4: 3 D Hilbert curve COMP 319 © University of Liverpool slide 31

Example 5: Branching COMP 319 © University of Liverpool slide 32
