CIT 383: Administrative Scripting: Regular Expressions
Topics
  1. Creating Regexp objects
  2. Regular expression syntax
  3. Pattern matching
  4. Substitution
Regular Expressions
  Used to match patterns against strings.
    - UNIX commands: egrep, awk, sed
    - Ruby provides an expanded regexp syntax.
  Applications of regular expressions
    - Find every login failure in a log file.
    - Find every address you received email from.
    - Find every IP address in a file.
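As a rough illustration of two of the applications listed above, the sketch below filters login failures and collects IP addresses. The log lines and their format are made up for the example, and the IP pattern is deliberately loose (it does not check that each number is at most 255).

```ruby
# Hypothetical log lines; the format is an assumption for illustration.
log = <<~LOG
  Jan 10 03:12:01 host sshd[411]: Failed password for root from 192.168.1.20 port 4022
  Jan 10 03:12:09 host sshd[411]: Accepted password for alice from 10.0.0.5 port 5100
  Jan 10 03:13:44 host sshd[412]: Failed password for bob from 192.168.1.20 port 4031
LOG

# Keep only the lines that report a login failure.
failures = log.lines.grep(/Failed password/)

# Collect every dotted quad that looks like an IPv4 address, without duplicates.
ips = log.scan(/\b\d{1,3}(?:\.\d{1,3}){3}\b/).uniq

puts "#{failures.size} failures"
puts ips.join(", ")
```

The group in the IP pattern is written as (?:...) so that scan returns whole matches rather than the captured pieces.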
Creating a Regexp object
  Three methods
    re = Regexp.new('^\s*[a-z]')
    re = /^\s*[a-z]/
    re = %r|^\s*[a-z]|
  Modifiers
    i: ignore case when matching text
    m: multiline match, allow . to match \n
    x: extended syntax with comments + whitespace
    o: perform #{} interpolations only once
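The three construction forms and the i, m, and x modifiers can be tried out as below. This is a small sketch, not an exhaustive tour; the sample strings and variable names are invented for the example, and match? assumes Ruby 2.4 or later.

```ruby
# Three ways to build the same pattern.
# Single quotes keep the backslash in \s intact for Regexp.new.
re1 = Regexp.new('^\s*[a-z]')
re2 = /^\s*[a-z]/
re3 = %r|^\s*[a-z]|
same_source = [re1, re2, re3].map(&:source).uniq.size == 1

# i: ignore case.
ci = /ruby/i.match?("RUBY")

# m: let . match a newline as well.
dot_default = /a.b/.match?("a\nb")
dot_multi   = /a.b/m.match?("a\nb")

# x: whitespace and comments inside the pattern are ignored.
time = /
  (\d\d) :   # hours
  (\d\d)     # minutes
/x
hhmm = time.match("at 09:45")

puts same_source, ci, dot_default, dot_multi, hhmm[1], hhmm[2]
```

The %r form is handy when the pattern itself contains slashes, since they then need no escaping.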
Pattern Syntax
  Characters match themselves except
    ., |, (, ), [, ], {, }, +, \, ^, $, *, ?
  Use \ to escape, i.e. \| will match a |
  The . metacharacter matches any character (except newline, unless the m modifier is used).
  Anchors require the pattern to match at a start or end position
    ^ matches the beginning of a line
    $ matches the end of a line
    \A matches the beginning of a string
    \Z matches the end of a string (or just before a final newline; \z matches only the very end)
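The difference between the line anchors and the string anchors, and the effect of escaping, shows up clearly on a two-line string. The sample text below is an assumption made for this sketch.

```ruby
text = "first line\nsecond line\n"

# ^ and $ anchor at line boundaries, so they can match in mid-string.
line_start = text =~ /^second/
line_end   = text =~ /line$/

# \A and \Z anchor at the string boundaries only.
str_start = text =~ /\Asecond/
str_end   = text =~ /line\Z/

# An escaped metacharacter matches itself.
pipe_at = "a|b" =~ /\|/

# Regexp.escape escapes every metacharacter in a plain string.
literal   = /#{Regexp.escape("1+1")}/
unescaped = /1+1/

p line_start, line_end, str_start, str_end, pipe_at
p literal.match?("1+1=2"), unescaped.match?("1+1=2")
```

Here =~ returns the index of the match or nil, which is why the string-anchored search for "second" comes back nil.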
Regexp Escape Sequences
  Similar to double quotes
    \t is tab
    \n is newline
    etc.
  Word boundaries
    /red/ matches "red", "bred", "reddened"
    /\bred\b/ matches only "red"
    \B matches nonword boundaries
    /\brub\B/ matches "ruby" but not "rub"
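The word-boundary examples above can be checked directly with Array#grep. The word lists are taken from the slide, with "rubbish" added as an extra test word.

```ruby
words = %w[red bred reddened]

# Without boundaries, the pattern matches anywhere inside a word.
loose = words.grep(/red/)

# \b on both sides requires "red" to stand alone as a word.
strict = words.grep(/\bred\b/)

# \B after "rub" requires another word character to follow.
rub_prefix = %w[rub ruby rubbish].grep(/\brub\B/)

# \t inside a regexp matches a tab, as in a double-quoted string.
tab_at = "name\tvalue" =~ /\t/

p loose, strict, rub_prefix, tab_at
```

Note that \b means backspace inside a double-quoted string, so patterns built with Regexp.new are safer in single quotes.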
Character Classes
  Set of characters between brackets
    [aeiou] will match any vowel
    [0123456789] will match any digit
  Most special characters aren't special inside []
    (the exceptions are ], \, a leading ^, and - between two characters)
  Additional syntax
    [A-Z] is a range including all capital letters
    [A-Za-z0-9] combines ranges to match any alphanumeric
    [^A-Z] matches any character except a capital letter
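A short sketch of these bracket forms, using String#scan to list every match. The sample strings are invented for the example.

```ruby
# A set: any one of the listed characters.
vowels = "regular expressions".scan(/[aeiou]/)

# A range, and a negated class.
capitals  = "Room 42B".scan(/[A-Z]/)
non_alnum = "Room 42B".scan(/[^A-Za-z0-9]/)

# . and * are ordinary characters inside brackets.
punct = "1.5*2".scan(/[.*]/)

p vowels.size, capitals, non_alnum, punct
```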
Special Character Classes
  Abbreviations
    \d is [0-9]
    \D is [^0-9]
    \s is [ \t\r\n\f]
    \S is [^ \t\r\n\f]
    \w is [A-Za-z0-9_]
    \W is [^A-Za-z0-9_]
  POSIX Classes
    [:alnum:] is [A-Za-z0-9]
    [:alpha:] is [A-Za-z]
    [:digit:] is [0-9]
    [:xdigit:] is [0-9A-Fa-f]
    [:lower:] is [a-z]
    [:upper:] is [A-Z]
    [:space:] is [ \t\r\n\f]
  POSIX classes are written inside a bracket expression, e.g. /[[:digit:]]+/
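The abbreviations are mostly used with a repetition operator to pick fields out of a line. The entry string below is a made-up example, not a real file format.

```ruby
# Hypothetical record with space- and tab-separated fields.
entry = "uid=1001 gid=50\tshell=/bin/bash"

digits = entry.scan(/\d+/)    # runs of digits
fields = entry.split(/\s+/)   # split on any run of whitespace
idents = entry.scan(/\w+/)    # runs of word characters

# A POSIX class goes inside its own pair of brackets.
hex = "0x1F".scan(/[[:xdigit:]]/)

p digits, fields, idents, hex
```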
Alternation
  Vertical bar matches pattern before or after it
    pattern1|pattern2
  Precedence
    red|blue matches either "red" or "blue"
    red ball|blue sky matches "red ball" or "blue sky"
      as whole strings, but not "red blue sky" or "red ball sky"
      (an unanchored search would still find "blue sky" inside "red blue sky")
  Use parentheses to group in an expression
    red (ball|blue) sky matches "red ball sky" or "red blue sky"
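Because | has the lowest precedence, the whole-string claims above are easiest to verify with \A and \z anchors around the pattern. The sketch below adds those anchors, which are not in the slide's patterns.

```ruby
phrases = ["red ball", "blue sky", "red blue sky", "red ball sky"]

# | splits the entire pattern: "red ball" OR "blue sky".
# The (?:...) group only makes the anchors apply to both alternatives.
no_group = /\A(?:red ball|blue sky)\z/

# Parentheses limit the alternation to the middle word.
grouped = /\Ared (ball|blue) sky\z/

whole_either = phrases.select { |ph| no_group.match?(ph) }
whole_middle = phrases.select { |ph| grouped.match?(ph) }

p whole_either, whole_middle
```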
Repetition
  Repetition operators are greedy, matching as many occurrences as possible.
    re* matches zero or more occurrences of re
    re+ matches one or more occurrences of re
    re? matches zero or one occurrences of re
    re{n} matches exactly n occurrences of re
    re{n,} matches n or more occurrences of re
    re{n,m} matches at least n and at most m occurrences of re
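Each operator can be tried with String#[], which returns the matched text or nil. The strings here are arbitrary samples chosen for the sketch.

```ruby
star_hit  = "aaa"[/a*/]      # takes all three
star_zero = "bbb"[/a*/]      # zero occurrences still match, as ""
plus_miss = "bbb"[/a+/]      # needs at least one, so no match

optional = "color colour".scan(/colou?r/)

exactly  = "2024-01-15"[/\d{4}/]
at_least = "12345"[/\d{3,}/]
bounded  = "1234567"[/\d{2,4}/]   # greedy: takes the maximum of four

p star_hit, star_zero, plus_miss, optional, exactly, at_least, bounded
```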
Additional features
  Backreferences
    Regular expressions remember matches in ()
    /([Rr])uby&\1ails/ will match
      - Ruby&Rails
      - ruby&rails
    /(\w+) \1/ will match a repeated word
  Greedy and non-greedy matching
    <.*> is greedy, will match "<ruby>perl>"
    <.*?> is non-greedy, will match "<ruby>"
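The following sketch exercises both features. The repeated-word pattern adds \b on each side, which the slide's version does not have, so that the "is" inside "this" is not mistaken for the first copy of the word.

```ruby
# \1 must repeat whatever the first group captured, in the same case.
brand = /([Rr])uby&\1ails/
upper = brand.match?("Ruby&Rails")
lower = brand.match?("ruby&rails")
mixed = brand.match?("Ruby&rails")

# Repeated word, with word boundaries added for this example.
doubled  = /\b(\w+) \1\b/
repeated = "this is is a test"[doubled]

# Greedy versus non-greedy on the slide's sample string.
html   = "<ruby>perl>"
greedy = html[/<.*>/]
lazy   = html[/<.*?>/]

p upper, lower, mixed, repeated, greedy, lazy
```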
Pattern Matching
  Pattern-matching uses the =~ operator
    re = /[Rr]uby|[Pp]ython/
    re =~ "Ruby is better than PHP."
  After a successful match, details can be retrieved:
    data = Regexp.last_match
    data.string: the string that was compared
    data.to_s: the part of the string that matched
    data.pre_match: portion of string before match
    data.post_match: portion of string after match
    data[1]: what first set of () matched
    data[2]: what second set of () matched
    data.captures: what all sets of parentheses matched
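The slide's pattern has no parentheses, so data[1] and data[2] would be nil for it. The sketch below uses a different, made-up pattern with two groups so that every accessor returns something.

```ruby
# Illustrative pattern with two capture groups (not the slide's pattern).
re  = /(\w+) is better than (\w+)/
pos = re =~ "I think Ruby is better than PHP."

# =~ returns the index of the match and sets the last-match data.
data = Regexp.last_match

puts pos
puts data.to_s
puts data.pre_match.inspect
puts data.post_match.inspect
puts data[1], data[2]
p data.captures
```

Calling re.match(string) returns the same MatchData object directly, which avoids relying on the last-match state.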
Pattern Matching Methods
  Slicing
    "ruby123"[/\d+/]                  # 123
    "ruby123"[/([a-z]+)(\d+)/, 1]     # ruby
    "ruby123"[/([a-z]+)(\d+)/, 2]     # 123
    r = "ruby123"
    r.slice(/\d+/)                    # 123
    r.slice!(/\d+/)                   # 123, r = "ruby"
  Splitting
    s = "one, two, three"
    s.split                           # ["one,", "two,", "three"]
    s.split(', ')                     # ["one", "two", "three"]
    s.split(/\s*,\s*/)                # ["one", "two", "three"]
Substitutions
  The String class provides RE substitutions
    sub(re, str): return a new string in which the first substring matching re is replaced by str
    sub!(re, str): replace the first substring matching re with str, in place
    gsub(re, str): return a new string in which all substrings matching re are replaced by str
    gsub!(re, str): replace all substrings matching re with str, in place
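A minimal sketch of the difference between the four methods, using an invented sample line. One detail worth knowing is that the bang versions return nil when nothing was replaced.

```ruby
line = "rails and rails"

# sub and gsub return new strings and leave the receiver alone.
first_only = line.sub(/rails/, "Rails")
all        = line.gsub(/rails/, "Rails")
untouched  = (line == "rails and rails")

# sub! and gsub! modify the receiver in place.
copy = line.dup
copy.gsub!(/rails/, "Rails")

# A bang method returns nil when the pattern did not match.
no_match = copy.sub!(/perl/, "Perl")

p first_only, all, untouched, copy, no_match
```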
Substitution Examples
  Remove Ruby-style comments
    line.sub!(/#.*$/, "")
  Remove all non-digits
    line.gsub!(/\D/, "")
  Capitalize specified words
    line.gsub!(/\brails\b/, 'Rails')
  Change "John Smith" to "Smith, John"
    name.sub!(/(\w+)\s+(\w+)/, '\2, \1')
  Flip UNIX slashes to Windows slashes
    path.gsub!(%r|/|, '\\')