REGULAR EXPRESSION IN PERL PART 1 Thach Nguyen


OBJECTIVE
- What is a regular expression?
- How to use regular expressions in Perl
  - Basic tools: simple word matching, using character classes, matching this or that, …
  - Power tools

WHAT IS A REGULAR EXPRESSION (REGEX, REGEXP)?
- A big factor behind Perl's fame
- A string that describes a pattern
- Examples of patterns in everyday use:
  - A search engine finding web pages (Google)
  - Listing files in a directory (ls *.txt, dir *.*)
  - Searching, extracting parts of strings, search and replace (Microsoft Word)
- An efficient, flexible way to manipulate text
- Not as difficult to understand as their reputation suggests
  - Constructed from simple concepts (conditionals, loops)
  - Once you get used to the terse notation, you're good to go

HOW TO USE REGEX
- Part 1: the basics (solve about 98% of your needs)
  - Simple word matching
  - Using character classes
  - Matching this or that
- Part 2: power tools (for the rest)
  - Advanced regex operators
  - Latest innovations

PART 1: THE BASICS
Simple word matching
- The simplest regex is a word: a string of characters
- It matches any string that contains that word
- Eg: (example code not preserved in this transcript); result: It matches
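The slide's own example did not survive, so the following is only a minimal sketch of simple word matching. The string "Hello World" and the variable names are assumptions chosen for illustration, not the author's original code.

```perl
use strict;
use warnings;

# A plain word used as a regex matches any string that contains it.
my $text = "Hello World";   # illustrative string (assumption)

my $has_world = ($text =~ /World/) ? 1 : 0;   # "World" occurs in the text
my $has_lower = ($text =~ /world/) ? 1 : 0;   # matching is case-sensitive by default

print "World: ", ($has_world ? "It matches" : "It doesn't match"), "\n";
print "world: ", ($has_lower ? "It matches" : "It doesn't match"), "\n";
```

Only the first test succeeds, because the regex must appear in the string exactly as written, including letter case.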

PART 1: THE BASICS
Simple word matching
- Operators:
  - =~ : returns true if the regex matches
  - !~ : returns true if the regex doesn't match
  - / … / : delimiters enclosing the string (or a variable holding the string) to search for
- Eg: $greeting = "World"; if ("Hello World" =~ /$greeting/) { … }
- Other arbitrary delimiters can be used in place of the slashes
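A short sketch of the two binding operators and of alternative delimiters. The delimiter examples on the slide were lost, so m!...! and m{...} below are assumptions picked from standard Perl syntax, and the test strings are illustrative.

```perl
use strict;
use warnings;

my $greeting = "World";          # variable holding the string to search for
my $sentence = "Hello World";

# =~ is true when the regex matches; !~ is true when it does not.
my $found   = ($sentence =~ /$greeting/) ? 1 : 0;
my $missing = ($sentence !~ /Perl/)      ? 1 : 0;

# With an explicit leading m, other delimiters can replace the slashes.
my $bang  = ($sentence =~ m!World!) ? 1 : 0;
my $brace = ($sentence =~ m{World}) ? 1 : 0;

# A different delimiter avoids having to escape slashes inside the pattern.
my $path_ok = ("/usr/bin/perl" =~ m{/usr/bin}) ? 1 : 0;

print "found=$found missing=$missing bang=$bang brace=$brace path_ok=$path_ok\n";
```

Choosing a delimiter that does not occur in the pattern, as in the path example, tends to keep the regex readable.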

PART 1: THE BASICS
Simple word matching – Additional
- If you use the default variable $_, you can omit the "$_ =~" part
  - Eg: $_ = "Hello World"; if (/World/) { … }
- If the regex could match in more than one place, the earliest point is matched
  - Eg: "Hello World" =~ /o/; # matches the 'o' in 'Hello'
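Both points can be checked with a small sketch. It relies on Perl's built-in @- array, which holds the start offset of the last successful match; the variable names are illustrative assumptions.

```perl
use strict;
use warnings;

# With the default variable $_, the "$_ =~" part can be omitted.
$_ = "Hello World";
my $default_matched = /World/ ? 1 : 0;

# When the regex could match in several places, the earliest one is used.
my $text    = "Hello World";
my $first_o = ($text =~ /o/) ? $-[0] : -1;   # $-[0] is the offset where the match starts

print "default_matched=$default_matched first_o=$first_o\n";
```

The offset is 4, the position of the 'o' in 'Hello', not 7, the position of the 'o' in 'World'.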

PART 1: THE BASICS
Simple word matching – Special characters
- Metacharacters: {}[]()^$.|*+?\
  - Use a backslash to include one literally
- Escape sequences: ASCII characters (\n, \t, etc.) and arbitrary bytes (octal, hex)
- Variables: substituted before matching
  - Eg: $foo = 'house'; 'cathouse' =~ /cat$foo/; # matches
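The following sketch exercises each bullet in turn: escaped metacharacters, escape sequences, and variable substitution. Apart from the 'cathouse' example, the test strings are assumptions made for illustration.

```perl
use strict;
use warnings;

# Metacharacters need a backslash to be matched literally.
my $literal_plus = ("2+2=4"    =~ /2\+2/)      ? 1 : 0;   # \+ is a literal plus sign
my $literal_dot  = ("file.txt" =~ /file\.txt/) ? 1 : 0;   # \. is a literal dot
my $dot_is_wild  = ("fileXtxt" =~ /file.txt/)  ? 1 : 0;   # unescaped . matches any character
my $dot_escaped  = ("fileXtxt" =~ /file\.txt/) ? 1 : 0;   # escaped . does not match 'X'

# Escape sequences: \t is a tab, \x63 is the byte with hex code 63 ('c').
my $tab_found = ("1000\t2000" =~ /0\t2/)         ? 1 : 0;
my $hex_found = ("cat"        =~ /\x63\x61\x74/) ? 1 : 0;   # hex codes for c, a, t

# Variables are substituted into the regex before matching.
my $foo    = 'house';
my $interp = ('cathouse' =~ /cat$foo/) ? 1 : 0;

print "plus=$literal_plus dot=$literal_dot wild=$dot_is_wild escaped=$dot_escaped ",
      "tab=$tab_found hex=$hex_found interp=$interp\n";
```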

PART 1: THE BASICS
Simple word matching – Special characters
- Anchor metacharacters: ^ matches at the beginning of the string and $ at the end
- Overall: this is just the surface of regex technology
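A minimal sketch of the anchors, using "housekeeper" as an illustrative string (an assumption, not taken from the slide).

```perl
use strict;
use warnings;

my $text = "housekeeper";

my $anywhere = ($text =~ /keeper/)  ? 1 : 0;   # found somewhere in the string
my $at_start = ($text =~ /^keeper/) ? 1 : 0;   # fails: 'keeper' is not at the beginning
my $at_end   = ($text =~ /keeper$/) ? 1 : 0;   # succeeds: 'keeper' is at the end

# Using both anchors requires the regex to cover the whole string.
my $whole   = ("keeper"   =~ /^keeper$/) ? 1 : 0;
my $with_nl = ("keeper\n" =~ /^keeper$/) ? 1 : 0;   # $ also matches before a final newline

print "anywhere=$anywhere at_start=$at_start at_end=$at_end whole=$whole with_nl=$with_nl\n";
```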

PART 1: THE BASICS
Using character classes
- A character class: a set of possible characters, any one of which may match at a particular point in the regex
- Denoted by brackets [ … ]
- Eg:
  /item[0123456789]/; # matches 'item0' or ... or 'item9'
  "abc" =~ /[cab]/;   # matches 'a'
- To match 'yes' in a case-insensitive way (yes, YES, …):
  - /[yY][eE][sS]/
  - /yes/i (i: the case-insensitive modifier of the matching operation)
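To make the character-class examples concrete, here is a small sketch. The capturing parentheses and the list of answers are assumptions added only so the result can be inspected; they are not part of the slide.

```perl
use strict;
use warnings;

# A class matches exactly one character out of the listed set.
my $item_digit  = ("item7" =~ /item[0123456789]/) ? 1 : 0;
my $item_letter = ("itemX" =~ /item[0123456789]/) ? 1 : 0;

# The order inside the class is irrelevant; the earliest position in the string wins.
# Parentheses capture the matched character so it can be inspected.
my ($first_hit) = "abc" =~ /([cab])/;

# Two ways to match 'yes' regardless of case.
my @answers     = ("yes", "YES", "Yes", "yEs");
my $by_class    = grep { /[yY][eE][sS]/ } @answers;   # grep in scalar context counts matches
my $by_modifier = grep { /yes/i }         @answers;

print "digit=$item_digit letter=$item_letter first_hit=$first_hit ",
      "by_class=$by_class by_modifier=$by_modifier\n";
```

Both case-insensitive forms accept all four spellings; the /i modifier is simply shorter to write.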

PART 1: THE BASICS
Using character classes – Special characters: -]\^$
- Each needs a backslash to be represented literally inside a class
  - ] : the end of a character class
  - $ : scalar variable
  - \ : escape sequences
  - - : range operator within a character class
  - ^ : negated character class
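The roles of these characters inside a class can be illustrated with a short sketch. All test strings and variable names here are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# - is the range operator: [0-9] is shorthand for [0123456789].
my $in_range  = ("item5" =~ /item[0-9]/) ? 1 : 0;
my $hex_digit = ("e" =~ /[0-9a-fA-F]/)   ? 1 : 0;
my $not_hex   = ("g" =~ /[0-9a-fA-F]/)   ? 1 : 0;

# ^ as the first character negates the class.
my $neg_digit  = ("item5" =~ /item[^0-9]/) ? 1 : 0;   # fails: '5' is a digit
my $neg_letter = ("itemX" =~ /item[^0-9]/) ? 1 : 0;   # succeeds: 'X' is not a digit

# A backslash makes ] and $ literal members of the class.
my $bracket = (']def' =~ /[\]c]def/) ? 1 : 0;   # class contains ']' and 'c'
my $dollar  = ('$at'  =~ /[\$x]at/)  ? 1 : 0;   # class contains '$' and 'x'

# An unescaped $ starts a scalar variable, which is substituted first.
my $x      = 'bcr';
my $interp = ("rat" =~ /[$x]at/) ? 1 : 0;       # class becomes [bcr]

print "range=$in_range hex=$hex_digit not_hex=$not_hex neg_digit=$neg_digit ",
      "neg_letter=$neg_letter bracket=$bracket dollar=$dollar interp=$interp\n";
```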

PART 1: THE BASICS
Using character classes – Special characters
- Several abbreviations for common character classes:
  - \d : a digit, represents [0-9]
  - \s : a whitespace character, represents [\ \t\r\n\f]
  - \D : negated \d
  - \S : negated \s
  - \W : negated \w (\w is a word character: a letter, a digit, or _)
  - . : any character but "\n"
  - \b : matches a boundary between a word character and a non-word character (\w\W or \W\w)
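A sketch of the abbreviations in use. The helper match_start is hypothetical (introduced here only to report where a match begins, via Perl's built-in @- array), and the sentence is an illustrative assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the offset where the regex first matches, or -1.
sub match_start {
    my ($string, $regex) = @_;
    return ($string =~ $regex) ? $-[0] : -1;
}

my $is_time     = ("12:30:05"    =~ /^\d\d:\d\d:\d\d$/) ? 1 : 0;   # \d is a digit
my $has_space   = ("Hello World" =~ /o\sW/)             ? 1 : 0;   # \s is whitespace
my $has_nondig  = ("2024"        =~ /\D/)               ? 1 : 0;   # \D is a non-digit
my $dot_newline = ("a\nb"        =~ /a.b/)              ? 1 : 0;   # . does not match "\n"

# \b matches between a word character and a non-word character (or a string edge).
my $x = "Housecat catenates house and cat";
my $plain      = match_start($x, qr/cat/);       # 'cat' inside 'Housecat'
my $word_start = match_start($x, qr/\bcat/);     # 'cat' at the start of 'catenates'
my $word_end   = match_start($x, qr/cat\b/);     # 'cat' at the end of 'Housecat'
my $whole_word = match_start($x, qr/\bcat\b/);   # the stand-alone word 'cat'

print "time=$is_time space=$has_space nondigit=$has_nondig dot_newline=$dot_newline\n";
print "plain=$plain word_start=$word_start word_end=$word_end whole_word=$whole_word\n";
```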

PART 1: THE BASICS
Issues: why does '.' match everything but "\n"?
- We often want to ignore newline characters when counting and matching characters on a line
- To keep track of newlines, use the anchors ^ and $ with the modifiers /…/s (single line) and /…/m (multiple lines)
- No modifier (//):
  - '.' matches any character except "\n"
  - ^ and $ match only at the beginning and at the end of the string (or before a newline at the end)
- s modifier (//s): treat the string as a single long line
  - '.' matches any character, even "\n"
  - ^ and $ match only at the beginning and at the end of the string (or before a newline at the end)
- m modifier (//m): treat the string as a set of multiple lines
  - '.' matches any character except "\n"
  - ^ and $ match at the start or end of any line in the string
- Both (//sm): treat the string as a single long line, but detect multiple lines
  - '.' matches any character, even "\n"
  - ^ and $ match at the start and end of any line within the string
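The four combinations can be compared side by side in a small sketch. The two-line string and the hash names are assumptions made for illustration.

```perl
use strict;
use warnings;

# Illustrative two-line string (assumption).
my $x = "There once was a girl\nWho programmed in Perl\n";

# Does ^ match at the start of the second line?
my %caret = (
    'none' => ($x =~ /^Who/)   ? 1 : 0,   # no: 'Who' is not at the start of the string
    's'    => ($x =~ /^Who/s)  ? 1 : 0,   # no: /s does not change ^
    'm'    => ($x =~ /^Who/m)  ? 1 : 0,   # yes: /m lets ^ match after a newline
    'sm'   => ($x =~ /^Who/sm) ? 1 : 0,   # yes
);

# Does . match the newline between the two lines?
my %dot = (
    'none' => ($x =~ /girl.Who/)   ? 1 : 0,   # no: . does not match "\n"
    's'    => ($x =~ /girl.Who/s)  ? 1 : 0,   # yes: /s lets . match "\n"
    'm'    => ($x =~ /girl.Who/m)  ? 1 : 0,   # no: /m does not change .
    'sm'   => ($x =~ /girl.Who/sm) ? 1 : 0,   # yes
);

for my $mod ('none', 's', 'm', 'sm') {
    print "$mod: caret=$caret{$mod} dot=$dot{$mod}\n";
}
```

In short, /s only changes what '.' matches and /m only changes where ^ and $ match, so the two modifiers can be combined freely.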

PART 1: THE BASICS
Matching this or that
- Match different possible words or strings
- Use the alternation metacharacter |
- Eg:
  "cats and dogs" =~ /dog|cat|bird/; # matches "cat" (the earliest position in the string wins)
  "cats" =~ /cats|cat|ca|c/;         # matches "cats" (at one position, alternatives are tried left to right)
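To see which alternative actually matches, the sketch below extracts the matched text. The helper matched_text is hypothetical and uses Perl's built-in @- and @+ arrays (start and end offsets of the last match); the reversed-order example and the "fish" string are assumptions added for contrast.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text matched by the regex, or undef if there is no match.
sub matched_text {
    my ($string, $regex) = @_;
    return ($string =~ $regex)
        ? substr($string, $-[0], $+[0] - $-[0])
        : undef;
}

# The earliest position in the string wins, even though 'dog' is listed first.
my $earliest = matched_text("cats and dogs", qr/dog|cat|bird/);

# At the same position, alternatives are tried from left to right.
my $longest_first  = matched_text("cats", qr/cats|cat|ca|c/);
my $shortest_first = matched_text("cats", qr/c|ca|cat|cats/);

# No alternative matches at all.
my $none = matched_text("fish", qr/dog|cat|bird/);

print "earliest=$earliest longest_first=$longest_first shortest_first=$shortest_first ",
      "none=", (defined $none ? $none : "(no match)"), "\n";
```

Because alternatives are tried in order, listing the longer words first is one way to prefer the longest match.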

QUESTION