REGULAR EXPRESSIONS IN PERL – PART II
By Michael Wolfe
Grouping Things and Hierarchical Matching
In a regexp, ab|ac works, but it is repetitive because the common prefix “a” is written out in every alternative.
Perl lets you group part of a pattern with parentheses, so an alternation applies only to the part that differs.
Example: a(b|c) matches either ab or ac without typing out each full case.
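A minimal sketch showing that the grouped pattern accepts the same strings as the spelled-out alternation. The helper names are hypothetical, and both patterns are anchored with ^ and $ for this demo so each must cover the whole string. The (?:...) form is a non-capturing group, used only so the anchors apply to both alternatives.

```perl
use strict;
use warnings;

# Hypothetical helpers: 1 if the whole string matches, 0 otherwise.
sub matches_long    { return $_[0] =~ /^(?:ab|ac)$/ ? 1 : 0 }   # spelled-out alternatives
sub matches_grouped { return $_[0] =~ /^a(b|c)$/    ? 1 : 0 }   # shared "a" factored out

for my $s (qw(ab ac ad abc)) {
    printf "%-4s ab|ac:%d  a(b|c):%d\n", $s, matches_long($s), matches_grouped($s);
}

# The group also captures which alternative matched.
my ($which) = "ac" =~ /^a(b|c)$/;
print "captured: $which\n";    # c
```

Both columns agree for every input, and the parentheses give you a capture ($1) as a side effect.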
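Backtracking
Backtracking is the process of trying one alternative, checking whether the rest of the pattern can still match, and moving on to the next alternative if it can't.
The name comes from walking through the woods: at each fork you pick a path, and if it dead-ends you go back to the last fork and try another one.
Perl keeps backtracking through the options until it either finds a match or has tried every combination, at which point it declares that the match fails.
Example: $string =~ /(abd|abc)(df|d|de)/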
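Running the example pattern (with the two groups adjacent) against the string "abcde", an assumed input, shows backtracking at work: abd fails at the third character, so Perl backs up and tries abc; in the second group, df fails at the last character and d succeeds, so the match completes without ever trying de.

```perl
use strict;
use warnings;

my $string = "abcde";

# List-context match returns the two captured groups.
my ($first, $second) = $string =~ /(abd|abc)(df|d|de)/;

print "first group:  $first\n";    # abc
print "second group: $second\n";   # d
```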
Extracting Matches
Long form:
    # extract hours, minutes, seconds
    if ($time =~ /(\d\d):(\d\d):(\d\d)/) {   # match hh:mm:ss format
        $hours   = $1;
        $minutes = $2;
        $seconds = $3;
    }
Compact form (a match in list context returns the captured groups):
    ($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
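A small runnable version of the extraction idea, using the three-group hh:mm:ss pattern inside a hypothetical parse_time helper. Returning the list of captures lets the caller test for success by checking whether the list is empty. That is safer than reading $1, $2, $3 directly, because those variables keep their values from the last successful match when a later match fails.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (hours, minutes, seconds) from the first
# hh:mm:ss substring, or an empty list if there is none.
sub parse_time {
    my ($time) = @_;
    my @fields = $time =~ /(\d\d):(\d\d):(\d\d)/;   # list context: the captures
    return @fields;
}

my ($hours, $minutes, $seconds) = parse_time("Started at 09:41:07 today");
print "hours=$hours minutes=$minutes seconds=$seconds\n";

my @none = parse_time("no time here");
print "fields found: ", scalar(@none), "\n";        # 0
```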
Matching Repetitions
Quantifiers let a pattern match repeated elements without tedious alternations such as \w\w\w|\w\w|\w.
The quantifier metacharacters ?, *, +, and {} apply to the element just before them:
    a?     = match 'a' 1 or 0 times
    a*     = match 'a' 0 or more times, i.e., any number of times
    a+     = match 'a' 1 or more times, i.e., at least once
    a{n,m} = match at least n times, but not more than m times
    a{n,}  = match at least n times
    a{n}   = match exactly n times
Quantifiers that grab as much of the string as possible are called maximal match or greedy quantifiers.
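To see each quantifier from the list in action, the sketch below applies a few patterns to the assumed string "caaat" and records the text each one matches. Every pattern starts with a literal c, so every match begins at the same place; the x patterns show that * accepts zero occurrences while + does not.

```perl
use strict;
use warnings;

my $text  = "caaat";
my @cases = (
    [ 'ca?'     => qr/ca?/     ],   # optional 'a': one is taken
    [ 'ca*'     => qr/ca*/     ],   # zero or more: all three
    [ 'ca+'     => qr/ca+/     ],   # one or more: all three
    [ 'ca{1,2}' => qr/ca{1,2}/ ],   # at most two
    [ 'ca{2,}'  => qr/ca{2,}/  ],   # two or more: all three
    [ 'ca{2}'   => qr/ca{2}/   ],   # exactly two
    [ 'cx*'     => qr/cx*/     ],   # zero x's is still a match
    [ 'cx+'     => qr/cx+/     ],   # needs at least one x: no match
);

my %matched;
for my $case (@cases) {
    my ($label, $re) = @$case;
    # Wrap the compiled pattern in a capture group to get the matched text.
    $matched{$label} = $text =~ /($re)/ ? $1 : undef;
    printf "%-8s => %s\n", $label,
        defined $matched{$label} ? "'$matched{$label}'" : 'no match';
}
```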
Repetition Examples
    /y(es)?/i;               # matches 'y', 'Y', or a case-insensitive 'yes'
    $year =~ /^\d{2,4}$/;    # make sure year is at least 2 but not more than 4 digits
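A sketch of the two examples as hypothetical helper subs. An unanchored /\d{2,4}/ is included for contrast: it accepts 12345 because it only needs two to four digits somewhere in the string. The yes/no check is also anchored here (an assumption beyond the slide), so an answer like yep is rejected.

```perl
use strict;
use warnings;

# Hypothetical helpers: 1 on match, 0 otherwise.
sub valid_year { return $_[0] =~ /^\d{2,4}$/ ? 1 : 0 }   # whole string must be 2-4 digits
sub loose_year { return $_[0] =~ /\d{2,4}/   ? 1 : 0 }   # any 2-4 digit run anywhere
sub is_yes     { return $_[0] =~ /^y(es)?$/i ? 1 : 0 }   # 'y' or 'yes', any case

for my $y (qw(7 99 2010 12345)) {
    printf "%-5s anchored:%d  unanchored:%d\n", $y, valid_year($y), loose_year($y);
}
for my $answer (qw(y Y yes YES yep no)) {
    printf "%-3s => %s\n", $answer, is_yes($answer) ? 'yes' : 'not yes';
}
```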
The Regexp Principles
Principle 0: Taken as a whole, any regexp will be matched at the earliest possible position in the string.
Principle 1: In an alternation a|b|c..., the leftmost alternative that allows a match for the whole regexp will be the one used.
Principle 2: The maximal matching quantifiers ?, *, + and {n,m} will in general match as much of the string as possible while still allowing the whole regexp to match.
Principle 3: If there are two or more elements in a regexp, the leftmost greedy quantifier, if any, will match as much of the string as possible while still allowing the whole regexp to match. The next leftmost greedy quantifier, if any, will try to match as much of the string remaining available to it as possible, while still allowing the whole regexp to match. And so on, until all the regexp elements are satisfied.
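One small check per principle, using assumed sample strings. The @- array is Perl's built-in list of match offsets; $-[0] holds the position where the whole match starts.

```perl
use strict;
use warnings;

# Principle 0: the match starts at the earliest possible position.
"that hat" =~ /hat/ or die "no match";
my $start = $-[0];                              # offset where the match begins
print "Principle 0: 'hat' found at offset $start\n";   # inside "that"

# Principle 1: the leftmost alternative that lets the whole regexp match wins.
my ($alt)  = "cats" =~ /(c|ca|cat)/;            # 'c' is tried first and succeeds
my ($alt2) = "cats" =~ /(c|ca|cat)s/;           # 'c' and 'ca' are not followed by 's'
print "Principle 1: '$alt' and '$alt2'\n";

# Principle 2: a greedy quantifier takes as much as it can, but gives back
# just enough for the rest of the regexp to match.
my ($greedy, $rest) = "aaaa" =~ /(a+)(a)/;
print "Principle 2: '$greedy' + '$rest'\n";

# Principle 3: the leftmost greedy quantifier is satisfied first; later
# elements get what remains.
my ($x, $y, $z) = "that hat is red" =~ /^(.*)(at)(.*)$/;
print "Principle 3: '$x' '$y' '$z'\n";
```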
Resources
Perl Tutorial on the course website: http://www.cs.drexel.edu/~knowak/cs265_fall_2010/perlretut_2007.pdf
Lots more examples involving regular expressions on pages 8-17