Compiler Structures 242-437, Semester 2, 2020-2021


1.5 Regular Expressions (REs)

Objectives
• what is a regular expression?
• give examples of REs used in the grep command

1. Regular Expressions
• A regular expression (RE or regex) is a pattern used to match against text when searching inside a file.
• Regexes are used everywhere in Linux:
  • Editors: ed, ex, vi
  • Utilities: grep, egrep, sed, and awk

String Regex
regex pattern: cks
text: UNIX Tools rocks.     match
text: UNIX Tools sucks.     match
text: UNIX Tools is okay.   no match
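The three cases above can be sketched in Python. This assumes Python's re module as a stand-in for the Linux tools (the slides themselves do not use Python). re.search looks for the pattern anywhere in the text.

```python
import re

# The literal pattern from the slide: the characters c, k, s in a row.
pattern = re.compile(r"cks")

texts = [
    "UNIX Tools rocks.",
    "UNIX Tools sucks.",
    "UNIX Tools is okay.",
]

# re.search returns a match object, or None when nothing matches.
results = {text: pattern.search(text) is not None for text in texts}

for text, matched in results.items():
    print(f"{text!r}: {'match' if matched else 'no match'}")
```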

Multiple Matches
• A regex pattern can match text in more than one place.
regex pattern: apple
text: Scrapple from the apple.
match 1: the "apple" inside "Scrapple"; match 2: the word "apple"
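A short Python sketch of finding every match, again assuming the re module as a stand-in for the tools on the slides. re.finditer yields one match object per non-overlapping occurrence, and span() gives its start and end positions.

```python
import re

text = "Scrapple from the apple."

# Collect the (start, end) position of every occurrence of the pattern.
spans = [m.span() for m in re.finditer(r"apple", text)]

for start, end in spans:
    print(start, end, text[start:end])
```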

The . (dot) Regex
• The . regex pattern matches any single character in the text.
regex pattern: o.
text: For me to poop on.
match 1, match 2

The Character Class Regex
• A character class [] matches any one character from the set listed inside the brackets.
regex pattern: b[eor]at
text: beat a brat on a boat
match 1: beat; match 2: brat; match 3: boat
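The character class example can be checked with a few lines of Python (an illustration using the re module, not part of the original slides). re.findall returns the matched substrings in order.

```python
import re

text = "beat a brat on a boat"

# [eor] stands for exactly one character: e, o, or r.
found = re.findall(r"b[eor]at", text)
print(found)
```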

Character Class Examples

Repetition Regex: * (star)
• The * matches zero or more copies of the character (or other item) before it.
regex pattern: ya*y
text: I got mail, yaaaaay!     match: yaaaaay

regex pattern: oa*o
text: I like the zoo.     match: oo (zero copies of a)

regex pattern: h.*o
text: Say hello Andrew.     match: hello

regex pattern: h.*o
text: Say hello to Andrew.     match: hello to
• Regexes are greedy: they match as much of the text as they can.
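The star and greedy examples can be sketched in Python (the re module is an assumption; the slides target the Linux tools). The last line uses *?, Python's lazy form of *, which is not on the slides and is shown only for contrast with greedy matching.

```python
import re

# ya*y: a 'y', then zero or more 'a', then another 'y'.
star = re.search(r"ya*y", "I got mail, yaaaaay!").group()

# oa*o also matches with zero copies of 'a': the "oo" in "zoo".
zero_copies = re.search(r"oa*o", "I like the zoo.").group()

# h.*o is greedy: .* stretches to the last 'o' that still allows a match.
greedy = re.search(r"h.*o", "Say hello to Andrew.").group()

# Python-specific extra (not on the slides): *? stops at the first 'o'.
lazy = re.search(r"h.*?o", "Say hello to Andrew.").group()

print(star, zero_copies, greedy, lazy, sep=" | ")
```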

Anchors: ^ $
regex pattern: ^b[eor]at
text: beat a brat on a boat     match: beat
• ^ matches the beginning of the text line

regex pattern: b[eor]at$
text: beat a brat on a boat     match: boat
• $ matches the end of the text line
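A small Python check of the two anchors, using the re module as an illustrative stand-in. The third search shows that an anchored pattern fails when the word is not at the start of the line.

```python
import re

text = "beat a brat on a boat"

at_start = re.search(r"^b[eor]at", text)   # only "beat" is at the start
at_end = re.search(r"b[eor]at$", text)     # only "boat" is at the end
not_at_start = re.search(r"^brat", text)   # "brat" is in the middle

print(at_start.group(), at_end.group(), not_at_start)
```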

More Anchors

The | (or) Regex

More Repetition Regexes: * + ?

More Regex Operations
• See the regular expressions "cheat-sheet" at the course website
  • over 80 operators!

2. grep
• "grep" uses a regex pattern to search a text file
  • all the lines containing a match (or matches) are printed
• Usage: grep "regex pattern" filename (the regex pattern goes in "...")
• Examples:
  % grep "root" test1
  % grep "r..t" test1
  % grep "ro*t" test1
  % grep "r[a-z]*t" test1

The Grep Family
• grep: the usual version
• egrep: extended REs (| + ? don't need a backslash)
• fgrep: fixed strings only, no regex operators, so it is faster

Common "grep" Options
-c   Print a count of matched lines.
-i   Ignore uppercase/lowercase differences.
-l   List the filenames that contain matches.
-n   Print matched lines with their line numbers.
-s   Suppress error messages about nonexistent or unreadable files.
-v   Print lines that do not match the pattern.
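To make the effect of -c, -i, -n and -v concrete, here is a rough Python emulation. It is a sketch, not grep itself: the function name grep, its keyword arguments, and the sample lines are all invented for this illustration.

```python
import re

def grep(pattern, lines, ignore_case=False, invert=False):
    """Return (line_number, line) pairs for the lines grep would select."""
    flags = re.IGNORECASE if ignore_case else 0   # mimics -i
    regex = re.compile(pattern, flags)
    selected = []
    for number, line in enumerate(lines, start=1):
        found = regex.search(line) is not None
        if found != invert:                       # invert=True mimics -v
            selected.append((number, line))
    return selected

# Invented sample input.
lines = ["Root access", "the root user", "no match here"]

numbered = grep("root", lines)                    # like -n
count = len(grep("root", lines))                  # like -c
any_case = grep("root", lines, ignore_case=True)  # like -i
non_matching = grep("root", lines, invert=True)   # like -v
```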

Some Simple Examples
• grep searches input lines, a line at a time.
• If the line contains a string that matches grep's RE (pattern), then the line is output.

input lines (e.g. from a file) → grep "RE" → output matching lines (e.g. to a file)

Example input lines:
  hello andy
  my name is andy
  my bye byhe

Examples

grep "and"
input:
  hello andy
  my name is andy
  my bye byhe
output:
  hello andy
  my name is andy

grep -E "an|my"
input:
  hello andy
  my name is andy
  my bye byhe
output: all three lines, since each contains "an" or "my"
• "|" means "or"

grep "hel*"
input:
  hello andy
  my name is andy
  my bye byhe
output:
  hello andy
  my bye byhe
• "*" means "0 or more" of the preceding character, so "hel*" also matches the "he" in "byhe"
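The three grep examples can be reproduced with a line filter in Python. This is an approximation using the re module rather than grep; the helper name matching_lines is made up here.

```python
import re

lines = ["hello andy", "my name is andy", "my bye byhe"]

def matching_lines(pattern):
    """Keep the lines that contain a match, as grep does by default."""
    return [line for line in lines if re.search(pattern, line)]

and_lines = matching_lines(r"and")     # like: grep "and"
or_lines = matching_lines(r"an|my")    # like: grep -E "an|my"
star_lines = matching_lines(r"hel*")   # like: grep "hel*"

print(and_lines, or_lines, star_lines, sep="\n")
```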

grep with \< \>
• \< and \> match the beginning and end of a word
• Look for the word "north"

grep with a\|b
• a or b
• egrep doesn't need the backslash

grep with \+
• one or more
• egrep doesn't need the backslash

grep with .
• any character
• egrep doesn't need backslash

grep with ^ and $
• begin and end of line

grep with [ ]
• set of chars

Fun with a Linux Dictionary
• Find the location of the words file
• List all the words containing "hh"

• Look for "niether" or "neither"
• Look for words with three "u"s
• Count the words with three "a"s
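The dictionary searches can be tried on a tiny in-memory word list. Everything here is an assumption made for illustration: the word list is invented, a real run would read the system words file, and "three" is interpreted as "at least three".

```python
import re

# Invented sample; a real run would read the system words file instead.
words = ["withhold", "neither", "unusual", "banana", "apple", "hitchhiker"]

def grep_words(pattern):
    """Return the words that contain a match for the pattern."""
    return [w for w in words if re.search(pattern, w)]

double_h = grep_words(r"hh")
either_spelling = grep_words(r"n(ie|ei)ther")
three_u = grep_words(r"u.*u.*u")             # at least three "u"s
count_three_a = len(grep_words(r"a.*a.*a"))  # counting, like grep -c

print(double_h, either_spelling, three_u, count_three_a)
```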

Complex Regex Examples
• Variable names in C: [a-zA-Z_][a-zA-Z_0-9]*
• Dollar amount with optional cents: \$[0-9]+(\.[0-9][0-9])?
• Time of day: (1[012]|[1-9]):[0-5][0-9] (am|pm)
• HTML headers <h1> <H1> <h2> …: <[hH][1-4]>
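These four patterns can be exercised in Python. Some assumptions apply: the dollar pattern is written with the $ and . escaped and with two digits for the cents, the test strings are invented, and re.fullmatch is used so that the whole string must fit the pattern.

```python
import re

c_name = r"[a-zA-Z_][a-zA-Z_0-9]*"
dollars = r"\$[0-9]+(\.[0-9][0-9])?"   # assumed: escaped $ and ., two cents digits
time_of_day = r"(1[012]|[1-9]):[0-5][0-9] (am|pm)"
html_header = r"<[hH][1-4]>"

def fits(pattern, string):
    """True when the entire string matches the pattern."""
    return re.fullmatch(pattern, string) is not None

checks = {
    "count_1": fits(c_name, "count_1"),
    "2fast": fits(c_name, "2fast"),
    "$12.50": fits(dollars, "$12.50"),
    "$12.5": fits(dollars, "$12.5"),
    "9:05 am": fits(time_of_day, "9:05 am"),
    "13:00 pm": fits(time_of_day, "13:00 pm"),
    "<H4>": fits(html_header, "<H4>"),
    "<h5>": fits(html_header, "<h5>"),
}
print(checks)
```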

3. The RE Language
• A RE can be defined as a pattern language (operands and operators) which matches on text strings.

Some Possible RE Operands
• text characters (e.g. 'a', '1', '(')
• the symbol ε (means the empty string '')
  • in code just use ""
• variables, which can be assigned a RE
  • variable = RE

The Basic RE Operators
• There are three basic operators:
  • union '|'
  • concatenation
  • closure '*'

Union
• S|T
  • use S or T to match strings
• Example REs: a|b|c

Concatenation
• ST
  • use S followed by T to match against strings
• Example REs:
  ab          matches the string "ab"
  w | (a b)   matches the strings "w" or "ab"

Closure
• S*
  • use S 0 or more times to match against strings
• Example RE:
  a*   matches the strings: ε (the empty string), a, aaa, aaaaa, ...
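The three basic operators can be tried out in Python. This is a sketch using the re module, where the empty string ε is written "" as the slides suggest; the helper name matches is invented. re.fullmatch requires the whole string to be matched.

```python
import re

def matches(regex, string):
    """True when the RE matches the entire string."""
    return re.fullmatch(regex, string) is not None

union = [matches(r"a|b|c", s) for s in ("a", "b", "c", "ab")]
concatenation = [matches(r"ab", s) for s in ("ab", "a", "ba")]
mixed = [matches(r"w|(ab)", s) for s in ("w", "ab", "wab")]
closure = [matches(r"a*", s) for s in ("", "a", "aaa", "ab")]

print(union, concatenation, mixed, closure, sep="\n")
```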

3.1. REs for C Identifiers
• We define two RE variables, letter and digit:
  letter = A | B | C | D ... Z | a | b | c | d ... z
  digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• id is defined using letter and digit:
  id = letter ( letter | digit )*

• Strings matched by id include: ab345, w, h5g
• Strings not matched: 2, $abc, ****
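The id definition can be built the same way in Python, composing the RE from the two variables. The character classes [A-Za-z] and [0-9] stand in for the long unions, and the test strings are chosen here for illustration.

```python
import re

# RE variables, written as character classes instead of long unions.
letter = "[A-Za-z]"
digit = "[0-9]"

# id = letter ( letter | digit )*
identifier = f"{letter}({letter}|{digit})*"

def is_id(string):
    """True when the whole string is matched by the id RE."""
    return re.fullmatch(identifier, string) is not None

accepted = [s for s in ("ab345", "w", "h5g", "2", "$abc", "****") if is_id(s)]
print(accepted)
```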

3.2. REs for Integers and Floats
• We redefine digit:
  digit = 0|1|2|3|4|5|6|7|8|9   or   digit = [0-9]
• int and float:
  int = {digit}+
  float = {digit}+ "." {digit}+

• Integers and floats with exponents:
  number = {digit}+ ('.' {digit}+)? ('E' ('+'|'-')? {digit}+)?
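A Python sketch of the int, float, and number definitions. It assumes the number RE reads as digits, then an optional fraction, then an optional exponent; the test strings are invented.

```python
import re

digit = "[0-9]"
integer = f"{digit}+"
floating = rf"{digit}+\.{digit}+"
# digits, optional fraction, optional exponent with optional sign
number = rf"{digit}+(\.{digit}+)?(E[+-]?{digit}+)?"

def fits(pattern, string):
    """True when the entire string matches the pattern."""
    return re.fullmatch(pattern, string) is not None

good = [s for s in ("42", "3.14", "6E23", "1.5E-10") if fits(number, s)]
bad = [s for s in ("E5", "1.5E", ".5", "3.") if fits(number, s)]
print(good, bad)
```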

4. More on REs
• See the RE summary on the course website: regular_expressions_cheat_sheet.pdf
• I have the standard RE book:
  Mastering Regular Expressions, Jeffrey E. F. Friedl, O'Reilly & Associates

• There are many websites that explain REs:
  http://etext.lib.virginia.edu/services/helpsheets/unix/regex.html
  http://www.zytrax.com/tech/web/regex.htm
  http://www.regular-expressions.info