Regular Expression: Pattern Matching
Web Programming
Introduction
■ What is a regular expression?
  ► A grammar (rule) for matching patterns in strings
    • pattern = a sequence of characters
■ Pattern matching
  ► Syntax
    • $target =~ /$pattern/;
      → searches $target for $pattern
      → true if found, false otherwise
      - ("prog1.pl" =~ /pl/) : true
      - ("prog1.pl" =~ /PL/) : false
      → any pattern delimiter character can be used with m
      - e.g. $target =~ m!$pattern! (or m|$pattern|)
    • $target !~ /$pattern/;
      → true if NOT found, false otherwise
Example script
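The match operators above can be tried directly. This is a minimal sketch; the variable names are illustrative and not part of the original slides.

```perl
use strict;
use warnings;

my $target = "prog1.pl";

# =~ is true when the pattern occurs anywhere in the target string.
my $found_lower = ($target =~ /pl/)  ? 1 : 0;   # "pl" occurs in "prog1.pl"
my $found_upper = ($target =~ /PL/)  ? 1 : 0;   # matching is case-sensitive by default

# With m, another delimiter can be chosen (handy when the pattern contains "/").
my $found_alt   = ($target =~ m!pl!) ? 1 : 0;

# !~ is the negated test: true when the pattern is NOT found.
my $not_found   = ($target !~ /PL/)  ? 1 : 0;

print "lower=$found_lower upper=$found_upper alt=$found_alt not=$not_found\n";
```

Run as a script, this should print `lower=1 upper=0 alt=1 not=1`.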
Matching Repeats
■ +      Match one or more of the preceding character
    • e.g. /ab+c/ matches 'abc', 'abbbc', etc.
■ *      Match zero or more of the preceding character
    • e.g. /ab*c/ matches 'ac', 'abbc', 'abbbc', etc.
■ ?      Match zero or one of the preceding character
    • e.g. /ab?c/ matches 'ac' or 'abc'.
■ {n,m}  Match n to m of the preceding character
    • e.g. /ab{1,3}c/ matches 'abc', 'abbc', or 'abbbc'.
           /ab{3}c/ matches 'abbbc'.
           /ab{2,}c/ matches 'abbc', 'abbbbc', etc.
Regular Expression Checker: Perl script, HTML Form, CGI
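The quantifiers can be compared side by side by running each pattern over the same candidate strings. A minimal sketch: the helper `matching` and the candidate list are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;

my @candidates = qw(ac abc abbc abbbc abbbbc);

# Hypothetical helper: return (comma-joined) the candidates a pattern matches.
sub matching {
    my ($re) = @_;
    return join ",", grep { $_ =~ $re } @candidates;
}

my $plus  = matching(qr/ab+c/);       # one or more b
my $star  = matching(qr/ab*c/);       # zero or more b
my $opt   = matching(qr/ab?c/);       # zero or one b
my $range = matching(qr/ab{1,3}c/);   # one to three b
my $exact = matching(qr/ab{3}c/);     # exactly three b
my $least = matching(qr/ab{2,}c/);    # two or more b

print "+     : $plus\n*     : $star\n?     : $opt\n";
print "{1,3} : $range\n{3}   : $exact\n{2,}  : $least\n";
```

Only `*` and `?` accept 'ac', because they are the only quantifiers here that allow zero occurrences.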
Special Characters
■ [$pattern]   Match any one character listed in $pattern
    • e.g. /[Rr]ed/ matches 'bredd', 'bRedd', 'red', 'Red', etc.
           /[0-9]/ matches any digit, /[a-zA-Z]/ matches any letter
           /[0-9a-zA-Z]/ matches any alphanumeric character
■ [^$pattern]  Match any one character except those in $pattern
    • e.g. /[^0-9]/ matches any non-digit character.
■ .            Match any character
    • e.g. /b.a/ matches 'abba', 'b1a', etc.
           /b.*a/ matches 'aba', 'abba', 'banana', etc.
    • '.' does not match newline (i.e. \n)
      → /b.*a/ does not match "ab\na", "abb\na", etc.
■ \            Escape for special characters (e.g. *, ?, +, ^, $)
    • e.g. /ab\*c/ matches 'ab*c'
■ |            /$pattern1|$pattern2/: match $pattern1 or $pattern2
    • e.g. /abc|def/ matches 'abcd', 'cdef', etc.
Example: HTML Form, CGI
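The following sketch checks one case for each special character described above. The test strings and variable names are illustrative assumptions.

```perl
use strict;
use warnings;

my $set     = ("bRedd"  =~ /[Rr]ed/)  ? 1 : 0;  # character set: R or r
my $digit   = ("room 7" =~ /[0-9]/)   ? 1 : 0;  # range: any digit
my $nondig  = ("2002"   =~ /[^0-9]/)  ? 1 : 0;  # negated set: no non-digit present
my $dot     = ("b1a"    =~ /b.a/)     ? 1 : 0;  # . matches any single character
my $newline = ("ab\na"  =~ /b.*a/)    ? 1 : 0;  # . does not match \n by default
my $escaped = ("ab*c"   =~ /ab\*c/)   ? 1 : 0;  # \* matches a literal asterisk
my $literal = ("abbc"   =~ /ab\*c/)   ? 1 : 0;  # escaped * is no longer a quantifier
my $either  = ("cdef"   =~ /abc|def/) ? 1 : 0;  # alternation

print "set=$set digit=$digit nondig=$nondig dot=$dot ",
      "newline=$newline escaped=$escaped literal=$literal either=$either\n";
```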
Anchoring Patterns
■ ^   Match at the beginning of the target string
    • e.g. /red/ matches 'red', 'redd', 'bredd', etc.
           /^red/ matches 'red', 'redd'
■ $   Match at the end of the target string
    • e.g. /red$/ matches 'red', 'bred'
           /^red$/ matches 'red'
■ \b  Match at a word boundary
  ► a "word" consists of alphanumeric and underscore characters (i.e. [0-9a-zA-Z_]).
    • e.g. /\bred\b/ matches 'is red', 'red rose', '$red'
■ \B  Match inside a word (i.e. where there is no word boundary)
    • e.g. /\Bred\B/ matches 'bredd'
S517 Session 7, IU-SLIS
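A short sketch of the anchors, pairing each one with a string it accepts and a string it rejects. The variable names are illustrative; note that '$red' is single-quoted so Perl does not try to interpolate a variable.

```perl
use strict;
use warnings;

my $start_ok  = ("redd"     =~ /^red/)     ? 1 : 0;  # starts with "red"
my $start_no  = ("bredd"    =~ /^red/)     ? 1 : 0;  # "red" is not at the start
my $end_ok    = ("bred"     =~ /red$/)     ? 1 : 0;  # ends with "red"
my $whole_no  = ("bred"     =~ /^red$/)    ? 1 : 0;  # must be exactly "red"
my $word_ok   = ('$red'     =~ /\bred\b/)  ? 1 : 0;  # "$" is not a word character
my $word_no   = ("bredd"    =~ /\bred\b/)  ? 1 : 0;  # "red" sits inside a word
my $inside_ok = ("bredd"    =~ /\Bred\B/)  ? 1 : 0;  # no boundary on either side
my $inside_no = ("red rose" =~ /\Bred\B/)  ? 1 : 0;  # "red" is a whole word here

print "$start_ok$start_no $end_ok$whole_no $word_ok$word_no $inside_ok$inside_no\n";
```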
Escape Sequences
■ \d  ► any digit (i.e. [0-9])
■ \D  ► any non-digit (i.e. [^0-9])
■ \w  ► any word character (i.e. [_0-9a-zA-Z])
■ \W  ► any non-word character (i.e. [^_0-9a-zA-Z])
■ \s  ► any white space (i.e. [ \r\t\n\f])
■ \S  ► any non-white space (i.e. [^ \r\t\n\f])
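The escape sequences can be seen at work on one sample string. This is a sketch with an assumed input; the names are illustrative.

```perl
use strict;
use warnings;

my $text = "Room 42,\tWing_B\n";

my $digits   = join "",  $text =~ /\d/g;    # every digit, in order
my $words    = join "|", $text =~ /\w+/g;   # runs of word characters (underscore included)
my @non_word = $text =~ /\W/g;              # blank, comma, tab, newline
my $non_word_count = scalar @non_word;

(my $no_space = $text) =~ s/\s+//g;         # remove all white space
my $non_digits = join "", $text =~ /\D/g;   # everything except the digits

print "digits=$digits words=$words non_word_count=$non_word_count\n";
print "no_space=$no_space\n";
```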
Pattern Matching Options
■ /$pattern/i
  ► Ignore case
    • e.g. /ab/i matches 'ab', 'AB', 'Ab', 'aB'
■ /$pattern/x
  ► Ignore white space in the pattern
  ► Use "\ " to match a blank
    • e.g. /a b/x is the same as /ab/
           /a\ b/x is the same as /a b/
■ /$pattern/g
  ► Matches all occurrences of the pattern
  ► Returns a list of matches (in list context)
    • e.g. @matches = 'abcdcb' =~ /.b/g;
           @matches will be ('ab', 'cb')
Example: HTML Form, CGI
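The three options can be checked in a few lines. A minimal sketch; the variable names are illustrative.

```perl
use strict;
use warnings;

# /i: case is ignored.
my $ci = ("aB" =~ /ab/i) ? 1 : 0;

# /x: white space in the pattern is ignored, so /a b/x really means /ab/.
my $x_ignored = ("ab"  =~ /a b/x)  ? 1 : 0;
my $x_blank   = ("a b" =~ /a b/x)  ? 1 : 0;   # fails: the pattern contains no blank
my $x_escaped = ("a b" =~ /a\ b/x) ? 1 : 0;   # an escaped blank is matched literally

# /g in list context returns every match, scanning left to right.
my @matches = "abcdcb" =~ /.b/g;
my $all     = join ",", @matches;

print "ci=$ci x_ignored=$x_ignored x_blank=$x_blank x_escaped=$x_escaped all=$all\n";
```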
Reusing Patterns
■ ($pattern)
  ► The text matched by a $pattern enclosed in parentheses is stored in memory
    • within the same regular expression (i.e. / /), the stored text can be recalled by \N
    • following the regular expression, the stored text can be recalled by $N
      → N refers to the Nth pattern stored in memory
Example 1: date validation
    $date = "12/31/2002";
    $date =~ /\d{2}(\W)\d{2}\1\d{2}/;
Example 2: homework name validation
    $filename = "hw5-3.pl";
    $filename =~ /^hw(\d{1,2})-(\d[a-z]?)\.pl$/;
    $hwnum = $1;
    $pbnum = $2;
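Both examples can be run as follows. This sketch makes one labeled change: the date pattern is anchored with ^ and $ and expects a four-digit year, which is a stricter variant assumed here for illustration and not the pattern from the slide.

```perl
use strict;
use warnings;

# Assumed stricter variant of the date pattern: anchored, four-digit year.
# \1 requires the second separator to be the same character as the first.
my $date_re = qr/^\d{2}(\W)\d{2}\1\d{4}$/;

my $same_sep  = ("12/31/2002" =~ $date_re) ? 1 : 0;   # both separators are "/"
my $mixed_sep = ("12/31-2002" =~ $date_re) ? 1 : 0;   # "/" then "-" is rejected

# After a successful match, $1 and $2 hold the captured text.
my $filename = "hw5-3.pl";
my ($hwnum, $pbnum);
if ($filename =~ /^hw(\d{1,2})-(\d[a-z]?)\.pl$/) {
    ($hwnum, $pbnum) = ($1, $2);
}

print "same_sep=$same_sep mixed_sep=$mixed_sep hwnum=$hwnum pbnum=$pbnum\n";
```

Reading $1 and $2 only inside the `if` avoids picking up stale values from an earlier match when the current one fails.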
Reusing Patterns: Special variables
■ $&  ► Contains the entire matched pattern
■ $`  ► Contains the string before the pattern
■ $'  ► Contains the string after the pattern
Example
    $string = "yabado";
    $string =~ /ba/;
    $& eq "ba"
    $` eq "ya"
    $' eq "do"
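A runnable version of the example is sketched below. It also shows the /p modifier with ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} (available since Perl 5.10), which is an addition not covered in the slides; the variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "yabado";

# Copy the special variables right after the match, before another match overwrites them.
my ($before, $matched, $after);
if ($string =~ /ba/) {
    ($before, $matched, $after) = ($`, $&, $');
}

# Equivalent values via the /p modifier (Perl 5.10 and later).
my ($p_before, $p_matched, $p_after);
if ($string =~ /ba/p) {
    ($p_before, $p_matched, $p_after) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print "before=$before matched=$matched after=$after\n";
```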
Substitution
■ $string =~ s/$pattern/$replacement/;
  ► Replace $pattern with $replacement in $string
      $string = "Before substitution";
      $string =~ s/Before/After/;
      → "After substitution"
■ Substitution Options
  ► $string =~ s/$pattern/$replacement/i;
    • Ignore case of $pattern
      $string = "One plus one is done.";
      $string =~ s/one/ONE/i;
      → "ONE plus one is done."
  ► $string =~ s/$pattern/$replacement/g;
    • Change all occurrences of $pattern in $string
      $string = "  this   is  line   1.";
      $string =~ s/ +/ /g;
      → " this is line 1."
  ► $string =~ s/$pattern/$replacement/s;
    • Treat $string as a single line (i.e., . will match \n)
      $string = "this <img src=\n img.gif>image";
      $string =~ s/<.+>//s;
      → "this image"
Example
Substitution option: g
■ $string =~ s/$pattern/$replacement/g;
  ► Change all occurrences of $pattern in $string
  ► Example
      $string = "  this   is  a   line.  ";
      $string =~ s/ +/ /;
      $string eq " this   is  a   line.  "    # only the first run of blanks is replaced

      $string = "  this   is  a   line.  ";
      $string =~ s/ +/ /g;
      $string eq " this is a line. "

      $string =~ s/^ +//;    # delete leading blanks
      $string =~ s/ +$//;    # delete trailing blanks
      $string eq "this is a line."
Example script
Sample script to change selected number of patterns
S517 Session 8, IU-SLIS
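The effect of g is easiest to see with the blanks written out explicitly. In this sketch the input string and the variable names are assumptions chosen for illustration; each result is kept in its own copy so the steps can be compared.

```perl
use strict;
use warnings;

my $string = "  this   is a    line.  ";

# Without g, only the first run of blanks (the leading one) is collapsed.
(my $first_only = $string) =~ s/ +/ /;

# With g, every run of blanks is collapsed to a single blank.
(my $all = $string) =~ s/ +/ /g;

# Anchored substitutions then trim the ends.
(my $trimmed = $all) =~ s/^ +//;   # delete leading blanks
$trimmed =~ s/ +$//;               # delete trailing blanks

print "[$first_only]\n[$all]\n[$trimmed]\n";
```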
Substitution option: i, x
■ $string =~ s/$pattern/$replacement/i;
  ► Ignore case of $pattern
  ► Example
      $string = "One plus one is done.";
      $string =~ s/one/ONE/g;
      $string eq "One plus ONE is dONE."

      $string = "One plus one is done.";
      $string =~ s/one/ONE/gi;
      $string eq "ONE plus ONE is dONE."
■ $string =~ s/$pattern/$replacement/x;
  ► Ignore white space in $pattern
  ► Example
      $string = "One plus one is done.";
      $string =~ s/\b one \b/ONE/gix;
      $string eq "ONE plus ONE is done."
Example script
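The three variants can be compared on copies of the same sentence. A minimal sketch; the variable names are illustrative.

```perl
use strict;
use warnings;

my $source = "One plus one is done.";

# g only: the capitalized "One" is skipped, but the "one" inside "done" is changed.
(my $g_only = $source) =~ s/one/ONE/g;

# g and i: case is ignored, so "One" is changed as well.
(my $gi = $source) =~ s/one/ONE/gi;

# x lets the pattern be spaced out; \b limits the match to the whole word "one".
(my $whole_words = $source) =~ s/\b one \b/ONE/gix;

print "$g_only\n$gi\n$whole_words\n";
```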
Substitution option: e
■ $string =~ s/$pattern/$replacement/e;
  ► Evaluate $replacement as a Perl expression, then replace
  ► Example
      $string = "One plus one is done.";
      $string =~ s/(plus one)/$1 x 2/;
      $string eq "One plus one x 2 is done."       # replacement used as a plain string

      $string = "One plus one is done.";
      $string =~ s/(plus one )/$1 x 2/e;
      $string eq "One plus one plus one is done."  # $1 x 2 repeats "plus one " twice
Example script
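The difference between a plain and an evaluated replacement can be sketched as follows. The arithmetic example at the end is an added illustration, not taken from the slides, and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $source = "One plus one is done.";

# Without e, the replacement is an interpolated string: $1 followed by " x 2".
(my $plain = $source) =~ s/(plus one)/$1 x 2/;

# With e, the replacement is Perl code: the x operator repeats $1 twice.
(my $evaluated = $source) =~ s/(plus one )/$1 x 2/e;

# e combined with g: compute a new value for every number in the string.
(my $doubled = "3 apples and 5 pears") =~ s/(\d+)/$1 * 2/ge;

print "$plain\n$evaluated\n$doubled\n";
```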
Substitution option: s, m
■ $string =~ s/$pattern/$replacement/s;
  ► Treat $string as a single line
    • . will match \n
  ► Example
      $string = "this <img src=\n img.gif>image";
      $string =~ s/<.+>//s;
      $string eq "this image"
■ $string =~ s/$pattern/$replacement/m;
  ► Treat $string as multiple lines
    • ^ and $ will match at the start and end of each line
  ► Example
      $string = " this\n image ";
      $string =~ s/^ +(.+)$/$1/mg;
      $string =~ s/^(.+) +$/$1/mg;
      $string eq "this\nimage"
Example script
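To see what s and m change, the sketch below runs each substitution with and without the option. The variable names are illustrative.

```perl
use strict;
use warnings;

my $html = "this <img src=\n img.gif>image";

# Without s, . stops at the newline, so the tag is never closed and nothing changes.
(my $without_s = $html) =~ s/<.+>//;

# With s, . also matches \n and the whole tag is removed.
(my $with_s = $html) =~ s/<.+>//s;

my $lines = " this\n image ";

# Without m, ^ matches only at the very start of the string.
(my $without_m = $lines) =~ s/^ +//g;

# With m, ^ and $ match at every line, so each line is trimmed.
(my $with_m = $lines) =~ s/^ +(.+)$/$1/mg;   # strip leading blanks per line
$with_m =~ s/^(.+) +$/$1/mg;                 # strip trailing blanks per line

print "[$with_s]\n[$with_m]\n";
```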
Translation
■ $string =~ tr/$pattern/$replacement/;
  ► Replace each character of $pattern with the corresponding character of $replacement throughout $string
  ► Pattern special characters (e.g. ^, $) are not supported.
  ► Example
      $string = "abba";
      $string =~ tr/ab/AB/;
      $string eq "ABBA"
■ Case Conversion
  ► uppercase to lowercase
    • $line =~ tr/A-Z/a-z/;
  ► lowercase to uppercase
    • $line =~ tr/a-z/A-Z/;
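A small sketch of tr in action. The last line uses the fact that tr returns the number of characters it matched, a use not mentioned in the slides; the variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "abba";
(my $upper_ab = $string) =~ tr/ab/AB/;    # a -> A, b -> B, character by character

my $line = "Hello World";
(my $lower = $line) =~ tr/A-Z/a-z/;       # uppercase to lowercase
(my $upper = $line) =~ tr/a-z/A-Z/;       # lowercase to uppercase

# With an empty replacement list and no options, tr changes nothing
# and simply returns how many characters matched.
my $vowel_count = ($line =~ tr/aeiou//);

print "$upper_ab $lower $upper vowels=$vowel_count\n";
```

For plain case conversion, Perl's built-in lc and uc functions do the same job.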
Translation options
■ $string =~ tr/$pattern/$replacement/c;
  ► Translate all characters NOT specified in $pattern
      $string =~ tr/a-zA-Z/ /c;
      → changes all non-letter characters to blanks
■ $string =~ tr/$pattern/$replacement/s;
  ► Compress runs of identical translated characters to a single character
      $string =~ tr/a-zA-Z/ /cs;
      → $string will have only alphabetical words, each separated by a single blank
■ $string =~ tr/$pattern/$replacement/d;
  ► Delete all specified characters
      $string =~ tr/a-zA-Z//d;
      → deletes all letter characters
Example script
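The three options are sketched below on one short input. The input string and the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my $text = "a1b2--c";

# c: complement the list, so every NON-letter becomes a blank.
(my $blanked = $text) =~ tr/a-zA-Z/ /c;

# c and s: as above, but each run of translated characters is squeezed to one blank.
(my $squeezed = $text) =~ tr/a-zA-Z/ /cs;

# d: delete the listed characters (here, all letters).
(my $no_letters = $text) =~ tr/a-zA-Z//d;

print "[$blanked]\n[$squeezed]\n[$no_letters]\n";
```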
Non-greedy Multipliers
■ * and + are "greedy" multipliers by default
  ► they match as many characters as possible
      $line = "You ordered <b>tea</b>.";
      $line =~ s/<.+>//g;
      $line eq "You ordered ."      # <.+> matched from the first < to the last >
■ Put ? after a multiplier to make it non-greedy
  ► it matches as few characters as possible
      $line = "You ordered <b>tea</b>.";
      $line =~ s/<.+?>//g;
      $line eq "You ordered tea."
Example script
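The greedy and non-greedy results can be compared on copies of the same line. The third variant, a negated character class, is a common alternative added here for comparison and is not from the slides.

```perl
use strict;
use warnings;

my $line = "You ordered <b>tea</b>.";

# Greedy: .+ runs from the first "<" to the last ">", swallowing "tea" too.
(my $greedy = $line) =~ s/<.+>//g;

# Non-greedy: .+? stops at the first ">", so each tag is removed separately.
(my $lazy = $line) =~ s/<.+?>//g;

# Alternative: [^>]+ cannot run past a ">", so greediness does no harm.
(my $negated = $line) =~ s/<[^>]+>//g;

print "[$greedy]\n[$lazy]\n[$negated]\n";
```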
Review: Reusing Patterns
■ /($pattern)/
  ► The character sequence matched by $pattern is stored in memory
  ► Within the same regular expression (i.e. / /), the stored text can be recalled by \N
  ► Following the regular expression, the stored text can be recalled by $N
    • $N is overwritten when another pattern match succeeds
  ► Example
      $string = "abcabc";
      $string =~ s/(abc)\1/$1/;
      $string eq "abc"
Example script
■ Special Variables
  ► $&   • contains the entire matched pattern
  ► $`   • contains the string before the pattern
  ► $'   • contains the string after the pattern