CIT 383: Administrative Scripting
Regular Expressions
Computer Security: Art and Science

Topics
1. Creating Regexp objects
2. Regular expression syntax
3. Pattern matching
4. Substitution

Regular Expressions
Used to match patterns against strings.
§ UNIX commands: egrep, awk, sed
§ Ruby provides an expanded regexp syntax.
Applications of regular expressions
§ Find every login failure in a log file.
§ Find every address you received email from.
§ Find every IP address in a file.
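
As a minimal sketch of the first and third applications, the Ruby below searches a few log lines. The sshd-style sample lines and variable names are assumptions made for illustration, and the IP pattern is deliberately loose (it would also accept 999.1.1.1).

```ruby
# Sample lines in an sshd-like format (hypothetical; real log formats vary).
log_lines = [
  "Jan 10 10:01:02 host sshd[811]: Failed password for root from 192.168.1.20 port 5022 ssh2",
  "Jan 10 10:01:09 host sshd[811]: Accepted password for alice from 10.0.0.5 port 6100 ssh2",
  "Jan 10 10:02:44 host sshd[902]: Failed password for bob from 172.16.4.9 port 5100 ssh2"
]

# Every login failure: Array#grep keeps the lines the regexp matches.
failures = log_lines.grep(/Failed password/)

# Every IP address: String#scan returns all non-overlapping matches.
# Loose dotted-quad pattern; it does not reject numbers above 255.
ips = log_lines.join("\n").scan(/\b\d{1,3}(?:\.\d{1,3}){3}\b/)

puts failures.size   # prints 2
p ips                # prints ["192.168.1.20", "10.0.0.5", "172.16.4.9"]
```

The `(?:...)` group is non-capturing, so `scan` returns whole matches rather than arrays of groups.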

Creating a Regexp object
Three methods
    re = Regexp.new('^\s*[a-z]')
    re = /^\s*[a-z]/
    re = %r|^\s*[a-z]|
Modifiers
§ i: ignore case when matching text
§ m: multiline match, allow . to match \n
§ x: extended syntax with comments and whitespace
§ o: perform #{} interpolations only once
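
A short sketch exercising each constructor and modifier listed above. The sample strings and the `word_matcher` helper are hypothetical, chosen only to make each modifier's effect visible.

```ruby
# Three equivalent ways to build the same pattern.
a = Regexp.new('^\s*[a-z]')   # single quotes keep \s intact for the regexp
b = /^\s*[a-z]/
c = %r|^\s*[a-z]|             # %r lets you choose a delimiter other than /

# i: ignore case
ci = (/ruby/i =~ "I like RUBY")       # => 7
# m: allow . to match a newline
without_m = ("a\nb" =~ /a.b/)         # => nil
with_m    = ("a\nb" =~ /a.b/m)        # => 0
# x: whitespace and # comments inside the pattern are ignored
date = /(\d{4})  # year
        -(\d{2}) # month
       /x
# o: interpolation happens only the first time this literal is evaluated
def word_matcher(word)   # hypothetical helper
  /#{word}/o
end
first  = word_matcher("ruby")
second = word_matcher("perl")         # still /ruby/, not /perl/
```

The `o` behavior is why `second` matches "ruby": the literal was built once and then reused.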

Pattern Syntax
Characters match themselves except . | ( ) [ ] { } + \ ^ $ * ?
Use \ to escape, i.e. \| will match a |
The . metacharacter matches any character except a newline.
Anchors require the match to occur at the start or end
§ ^ matches the beginning of a line
§ $ matches the end of a line
§ \A matches the beginning of a string
§ \Z matches the end of a string (or just before a final newline)
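
The line/string distinction matters when validating input in an administrative script. This sketch assumes a hypothetical input string with an embedded newline; it also uses \z, Ruby's strict end-of-string anchor, which is not on the slide.

```ruby
input = "harmless\nrm -rf /"   # hypothetical user input with an embedded newline

# ^ and $ match at line boundaries, so the first line alone satisfies the check.
line_check   = (input =~ /^[a-z]+$/)      # => 0
# \A and \z anchor to the whole string, so the newline makes the check fail.
string_check = (input =~ /\A[a-z]+\z/)    # => nil

# \Z tolerates one trailing newline; \z does not.
trailing = ("abc\n" =~ /\Aabc\Z/)         # => 0
strict   = ("abc\n" =~ /\Aabc\z/)         # => nil

# Escaping and the dot
pipe = ("a|b" =~ /a\|b/)                  # => 0, \| is a literal bar
dot  = ("a\nb" =~ /a.b/)                  # => nil, . skips newlines
```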

Regexp Escape Sequences
Similar to double quotes: \t is tab, \n is newline, etc.
Word boundaries
§ \b matches a word boundary
§ /red/ matches "red", "bred", "reddened"
§ /\bred\b/ matches "red" only as a whole word
§ \B matches nonword boundaries
§ /\brub\B/ matches "ruby" but not "rub"
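
A quick check of the word-boundary examples, using `scan` to collect every match; the extra word "rubric" is an assumption added to show that \B only requires another word character to follow.

```ruby
text = "red bred reddened"
plain   = text.scan(/red/)                      # => ["red", "red", "red"]
bounded = text.scan(/\bred\b/)                  # => ["red"]

prefix = %w[ruby rub rubric].grep(/\brub\B/)    # => ["ruby", "rubric"]

# Escape sequences work inside regexps as in double-quoted strings.
parts = "a\tb\nc".split(/[\t\n]/)               # => ["a", "b", "c"]
```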

Character Classes
Set of characters between brackets
§ [aeiou] will match any vowel
§ [0123456789] will match any digit
§ Most special characters aren't special inside []'s (., *, + and ? match themselves); ^ and - have the meanings below
Additional syntax
§ [A-Z] is a range including all capital letters
§ [A-Za-z0-9] combines ranges to match any alphanumeric
§ [^A-Z] matches anything but capital letters
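
A small sketch of bracket expressions; the sample strings are arbitrary.

```ruby
title = "Administrative Scripting"
vowels   = title.scan(/[aeiou]/).size    # lowercase vowels only => 7
capitals = title.scan(/[A-Z]/)           # => ["A", "S"]

course  = "CIT 383"
number  = course.scan(/[0-9]/).join      # => "383"
letters = course.gsub(/[^A-Z]/, "")      # negated class: drop non-capitals => "CIT"

# Inside brackets, . and * are ordinary characters.
literal = "3.14*2".scan(/[.*]/)          # => [".", "*"]
```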

Special Character Classes
Abbreviations
§ \d is [0-9]
§ \D is [^0-9]
§ \s is [ \t\r\n\f]
§ \S is [^ \t\r\n\f]
§ \w is [A-Za-z0-9_]
§ \W is [^A-Za-z0-9_]

POSIX Classes (used inside brackets, e.g. [[:digit:]]; ASCII equivalents shown)
§ [:alnum:] is [A-Za-z0-9]
§ [:alpha:] is [A-Za-z]
§ [:digit:] is [0-9]
§ [:xdigit:] is [0-9A-Fa-f]
§ [:lower:] is [a-z]
§ [:upper:] is [A-Z]
§ [:space:] is [ \t\r\n\f]
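
A sketch of the abbreviations and POSIX classes on a made-up passwd-style record; the record format is an assumption for illustration only.

```ruby
record = "uid=1000\tshell=/bin/bash"   # hypothetical record

digits = record.scan(/\d/).join        # => "1000"
fields = record.split(/\s/)            # => ["uid=1000", "shell=/bin/bash"]
words  = record.scan(/\w+/)            # => ["uid", "1000", "shell", "bin", "bash"]

# POSIX classes go inside a bracket expression: [[:digit:]], not [:digit:].
posix   = record.scan(/[[:digit:]]/).join      # => "1000"
hex_ok  = ("deadBEEF" =~ /\A[[:xdigit:]]+\z/)  # => 0
hex_bad = ("deadbeeg" =~ /\A[[:xdigit:]]+\z/)  # => nil
```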

Alternation
Vertical bar matches the pattern before or after it: pattern1|pattern2
Precedence
§ red|blue matches either "red" or "blue"
§ red ball|blue sky matches "red ball" or "blue sky"; | has the lowest precedence, so this is not red (ball|blue) sky and will not match the whole of "red blue sky" or "red ball sky"
Use parentheses to group within an expression
§ red (ball|blue) sky matches "red ball sky" or "red blue sky"
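
To see the precedence rule clearly, the sketch below anchors both patterns with \A and \z so that only whole strings count as matches; without anchors, =~ happily finds "red ball" inside "red ball sky".

```ruby
whole_alt = /\A(red ball|blue sky)\z/   # alternatives are "red ball" and "blue sky"
grouped   = /\Ared (ball|blue) sky\z/   # parentheses limit | to one word

samples = ["red ball", "blue sky", "red ball sky", "red blue sky"]
alt_hits     = samples.grep(whole_alt)  # => ["red ball", "blue sky"]
grouped_hits = samples.grep(grouped)    # => ["red ball sky", "red blue sky"]

# Unanchored, the first alternative is found inside the longer string.
substring = ("red ball sky" =~ /red ball|blue sky/)   # => 0
```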

Repetition operators are greedy, matching as many occurrences as possible.
§ re* matches zero or more occurrences of re
§ re+ matches one or more occurrences of re
§ re? matches zero or one occurrences of re
§ re{n} matches exactly n occurrences of re
§ re{n,} matches n or more occurrences of re
§ re{n,m} matches at least n and at most m occurrences of re
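
Each operator applied to the same string, using String#[] to return the text matched. The loose IPv4 shape check at the end is an illustrative assumption, not a complete validator.

```ruby
s = "aaaa"
star     = s[/a*/]       # => "aaaa" (greedy: takes every a)
plus     = s[/a+/]       # => "aaaa"
opt      = s[/a?/]       # => "a"
exact    = s[/a{3}/]     # => "aaa"
at_least = s[/a{2,}/]    # => "aaaa"
between  = s[/a{2,3}/]   # => "aaa" (greedy: takes the maximum)
empty    = "bbb"[/a*/]   # => "" (zero occurrences is still a match)

# Loose IPv4 shape: four groups of one to three digits.
ipv4 = /\A\d{1,3}(\.\d{1,3}){3}\z/
```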

Additional features
Backreferences
§ Regular expressions remember what each () group matched; \1 refers to the first group
§ /([Rr])uby&\1ails/ will match
    § Ruby&Rails
    § ruby&rails
§ /(\w+) \1/ will match a repeated word
Greedy and non-greedy matching, against the string "<ruby>perl>"
§ <.*> is greedy, will match "<ruby>perl>"
§ <.*?> is non-greedy, will match "<ruby>"
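
A sketch of backreferences and lazy matching. The \b anchors on the repeated-word pattern are an added refinement, not from the slide: without them, /(\w+) \1/ reports a false repeat in "this is" because "is" ends "this".

```ruby
amp = %w[Ruby&Rails ruby&rails Ruby&rails].grep(/([Rr])uby&\1ails/)
# => ["Ruby&Rails", "ruby&rails"]; \1 must repeat the same case

naive  = "this is a test"[/(\w+) \1/]          # => "is is" (false positive)
strict = "this is a test"[/\b(\w+) \1\b/]      # => nil
real   = "check the the log"[/\b(\w+) \1\b/]   # => "the the"

greedy = "<ruby>perl>"[/<.*>/]    # => "<ruby>perl>"
lazy   = "<ruby>perl>"[/<.*?>/]   # => "<ruby>"
```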

Pattern Matching
Pattern-matching uses the =~ operator, which returns the index where the match starts, or nil if there is no match
    re = /[Rr]uby|[Pp]ython/
    re =~ "Ruby is better than PHP."
After a successful match, you can retrieve details:
    data = Regexp.last_match
§ data.string: the string that was compared
§ data.to_s: the part of the string that matched
§ data.pre_match: portion of the string before the match
§ data.post_match: portion of the string after the match
§ data[1]: what the first set of () matched
§ data[2]: what the second set of () matched
§ data.captures: what all sets of parentheses matched
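
A sketch of the MatchData accessors listed above. The pattern adds a second group, (\w+), which is an assumption so that data[2] has something to show.

```ruby
re = /([Rr]uby|[Pp]ython) is (\w+)/
pos  = re =~ "I think Ruby is better than PHP."   # index where the match starts
data = Regexp.last_match                          # the MatchData, also available as $~

p pos               # prints 8
p data.string       # prints "I think Ruby is better than PHP."
p data.to_s         # prints "Ruby is better"
p data.pre_match    # prints "I think "
p data.post_match   # prints " than PHP."
p data[1]           # prints "Ruby"
p data[2]           # prints "better"
p data.captures     # prints ["Ruby", "better"]

miss = (re =~ "PHP only")   # nil when nothing matches
```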

Pattern Matching Methods
Slicing
    "ruby123"[/\d+/]                  # "123"
    "ruby123"[/([a-z]+)(\d+)/, 1]     # "ruby"
    "ruby123"[/([a-z]+)(\d+)/, 2]     # "123"
    r = "ruby123"
    r.slice(/\d+/)                    # "123"
    r.slice!(/\d+/)                   # "123", and r is now "ruby"
Splitting
    s = "one, two, three"
    s.split(', ')                     # ["one", "two", "three"]
    s.split(/\s*,\s*/)                # ["one", "two", "three"]
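
The two split calls above agree on neatly formatted input; the sketch below uses a hypothetical messier string to show where the regexp version earns its keep.

```ruby
r = "ruby123"
first_num = r[/\d+/]                 # => "123"
letters   = r[/([a-z]+)(\d+)/, 1]    # => "ruby"
number    = r[/([a-z]+)(\d+)/, 2]    # => "123"
removed   = r.slice!(/\d+/)          # => "123"; r is now "ruby"

messy = "one ,two,  three"           # hypothetical irregular spacing
by_literal = messy.split(",")        # => ["one ", "two", "  three"]
by_regexp  = messy.split(/\s*,\s*/)  # => ["one", "two", "three"]
```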

Substitutions
The String class provides regexp substitutions
§ sub(re, str): return a copy of the string in which the first substring matching re is replaced by str
§ sub!(re, str): replace the first substring matching re with str, in place
§ gsub(re, str): return a copy of the string in which all substrings matching re are replaced by str
§ gsub!(re, str): replace all substrings matching re with str, in place
§ The ! versions return nil if no substitution was made
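
A sketch contrasting the copying and in-place forms; the strings are arbitrary.

```ruby
s = "ruby ruby ruby"
one = s.sub(/ruby/, "perl")        # => "perl ruby ruby"
all = s.gsub(/ruby/, "perl")       # => "perl perl perl"
# s itself is unchanged by sub and gsub.

t = s.dup
t.gsub!(/ruby/, "rails")           # t is now "rails rails rails"

# The ! forms return nil when nothing matched, so chaining them is unsafe.
no_change = t.sub!(/python/, "x")  # => nil
```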

Substitution Examples
Remove Ruby-style comments
    line.sub!(/#.*$/, "")
Remove all non-digits
    line.gsub!(/\D/, "")
Capitalize specified words
    line.gsub!(/\brails\b/, 'Rails')
Change "John Smith" to "Smith, John"
    name.sub!(/(\w+)\s+(\w+)/, '\2, \1')
Flip UNIX slashes to Windows slashes
    path.gsub!(%r|/|, '\\')
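
The examples above, run on sample inputs (the input strings are hypothetical). For the slash flip, this sketch uses the block form of gsub!, whose return value is inserted literally, as an alternative to backslash escaping in the replacement string.

```ruby
line = "x = 1  # set x"
line.sub!(/#.*$/, "")                 # line is now "x = 1  "

phone = "(502) 555-0100"
phone.gsub!(/\D/, "")                 # "5025550100"

text = "rails and trails"
text.gsub!(/\brails\b/, 'Rails')      # "Rails and trails"; \b protects "trails"

name = "John Smith"
name.sub!(/(\w+)\s+(\w+)/, '\2, \1')  # "Smith, John"; \1 and \2 are the groups

path = "usr/local/bin"
path.gsub!(%r|/|) { "\\" }            # usr\local\bin
```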
