Input Validation with Regular Expressions COEN 351

Input Validation
• Security strategies
  - Black list: list all things that are NOT allowed
    - List is difficult to create
    - Adding insecure constructs on a continuous basis means that the previous version was unsafe
    - Testing is based on known attacks.
    - List from others might not be trustworthy.
  - White list: list of things that are allowed
    - List might be incomplete and disallow good content
    - Adding exceptions on a continuous basis does not imply security holes in previous versions.
    - Testing can be based on known attacks.
    - List from others can be trusted if the source can be trusted.
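The two strategies can be contrasted in a minimal sketch. The sub names `blacklist_ok` and `whitelist_ok`, the user-name field, and the specific character sets are assumptions made for illustration; they do not come from the slides.

```perl
use strict;
use warnings;

# Black list: reject input containing characters known to be dangerous.
# The set below is an illustrative assumption and is certainly incomplete.
sub blacklist_ok {
    my ($input) = @_;
    return $input !~ /[;|&`<>]/ ? 1 : 0;
}

# White list: accept only 1 to 32 letters, digits, or underscores.
sub whitelist_ok {
    my ($input) = @_;
    return $input =~ /\A[A-Za-z0-9_]{1,32}\z/ ? 1 : 0;
}

for my $candidate ('alice_01', 'bob; rm -rf /', '$(reboot)') {
    printf "%-16s blacklist=%s whitelist=%s\n", $candidate,
        (blacklist_ok($candidate) ? 'accept' : 'reject'),
        (whitelist_ok($candidate) ? 'accept' : 'reject');
}
```

The third candidate slips past the black list because `$`, `(` and `)` were never added to it, while the white list rejects it without anyone having to anticipate that construct.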

Perl Regular Expressions
• Regular expression = pattern
  - A template that either matches or does not match a string

Excursus: Getting Input in Perl
• Use <STDIN> to read from standard input
• Use the 'defined' construct to tell if the read was successful

    while (defined($line = <STDIN>)) {
        print "I saw $line";
    }

Excursus: Getting Input in Perl
• Non-sensical shortcut
• Uses standard loop variable $_

    while (<STDIN>) {      # gets a line, then executes the body of the loop
        print "I saw $_";
    }

    foreach (<STDIN>) {    # gets all the lines, then executes the body of the loop
        print "I saw $_";
    }

• $_ is the default loop variable.

Excursus: Getting Input in Perl
• STDIN is the default: <> reads from standard input when no file names are given on the command line
• chomp acts on the default variable $_

    while (<>) {
        chomp;
        print "I saw $_\n";
    }

Perl Regular Expressions
• Matching and substitution are fundamental tasks in Perl
• Implemented using one-letter operators:
  - m/PATTERN/ (m//): pattern matching
  - s/PATTERN/REPLACEMENT/ (s///): substitution

Perl Regular Expressions
• Meta-characters in a pattern need escaping with a backslash to be matched literally:
    \ | ( ) [ ] { } ^ $ * + ? .
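When the text to be matched literally comes from a variable, Perl's built-in `quotemeta` function and the `\Q...\E` escape add the backslashes automatically. The sketch below uses a made-up sample value, `$literal`, to show the difference between escaped and unescaped use.

```perl
use strict;
use warnings;

# Hypothetical user-supplied text that happens to contain meta-characters.
my $literal = 'a.b*c';

# Report whether $text matches $literal, with or without escaping.
sub matches_unescaped { my ($text) = @_; return $text =~ /^$literal$/     ? 1 : 0 }
sub matches_escaped   { my ($text) = @_; return $text =~ /^\Q$literal\E$/ ? 1 : 0 }

# Unescaped, '.' and '*' act as meta-characters, so unrelated text matches.
print "unescaped vs 'aXbbbc': ", matches_unescaped('aXbbbc'), "\n";
# Escaped, only the literal text itself matches.
print "escaped   vs 'aXbbbc': ", matches_escaped('aXbbbc'), "\n";
print "escaped   vs 'a.b*c' : ", matches_escaped('a.b*c'), "\n";
print "quotemeta gives      : ", quotemeta($literal), "\n";
```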

Perl Regular Expressions
• Interpolation
  - Perl substitutes variables in patterns, as it does in strings:

        $foo = "bar";
        /$foo$/;

  - Equivalent to:

        /bar$/;

Perl Regular Expression: Binding Operator
• Pattern matching is so frequent in Perl that there is a special operator
• Normally, pattern matching is done on the default operand $_
• =~ binds a string expression to a pattern match (or substitution, or transliteration)

Perl Regular Expression: Binding Operator
• =~ has a string as its left operand
• =~ has a pattern as its right operand
  - Could be interpreted at run time.
• Returns true / false depending on the success of the match.
• The !~ operation is the same, but the result is negated.

Perl Regular Expression: Binding Operator

    $_ =~ $pat;

is equivalent to

    $_ =~ /$pat/;

but is less efficient than giving the pattern directly, since the regular expression will be recompiled at run time.

Perl Regular Expression: Binding Operator
Example

    if ( ($k, $v) = $string =~ m/(\w+)=(\w*)/ ) {
        print "Key $k Value $v\n";
    }

• Since =~ has precedence over =, it is evaluated first.
• The binding operator binds the variable $string to a pattern looking for expressions like key=word.
• The match is evaluated in list context, hence the resulting matches are returned as a list.
• The list is then assigned to ($k, $v).
• The result of the assignment is the number of things assigned, i.e. typically 2.
• Since 2 is not 0, this is equivalent to true and hence the if-block is entered.
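A runnable sketch of the same idiom follows. The helper name `parse_pair` and the sample strings are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Return (key, value) on success and an empty list on failure.
sub parse_pair {
    my ($string) = @_;
    # List context: the match returns the captured groups.
    my ($k, $v) = $string =~ m/(\w+)=(\w*)/;
    return defined $k ? ($k, $v) : ();
}

for my $string ('color=red', 'empty=', 'no pair here') {
    # A list assignment in scalar context yields the number of elements
    # on its right-hand side: 2 on success, 0 on failure.
    if (my ($k, $v) = parse_pair($string)) {
        print "Key $k Value $v\n";
    }
    else {
        print "No key=value pair in '$string'\n";
    }
}
```

Note that `empty=` still counts as a match: the value group `(\w*)` is allowed to capture the empty string.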

Perl Regular Expressions
• Quantifiers:
  - * matches the preceding character zero or more times.
  - Pattern "abc*d" is matched by
    - rabd
    - zabccccd
  - Use parentheses to group letters

    #!/perl/bin/perl
    # Match c zero or more times between "ab" and "d".
    while (<>) {
        chomp;
        last if $_ eq 'stop';
        if (/abc*d/) {
            print "Matched: |$`<$&>$'|\n";
        } else {
            print "No match.\n";
        }
    }

    # Match the group "bc" zero or more times between "a" and "d".
    while (<>) {
        chomp;
        last if $_ eq 'stop';
        if (/a(bc)*d/) {
            print "Matched: |$`<$&>$'|\n";
        } else {
            print "No match.\n";
        }
    }
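The effect of grouping can also be checked without reading from standard input. In this sketch the sub names and the extra sample strings `abcbcd` and `ad` are assumptions added for illustration.

```perl
use strict;
use warnings;

# * applies to the single character c.
sub matches_single_char { my ($s) = @_; return $s =~ /abc*d/   ? 1 : 0 }
# * applies to the whole group bc.
sub matches_group       { my ($s) = @_; return $s =~ /a(bc)*d/ ? 1 : 0 }

for my $sample ('rabd', 'zabccccd', 'abcbcd', 'ad') {
    printf "%-9s abc*d=%d  a(bc)*d=%d\n", $sample,
        matches_single_char($sample), matches_group($sample);
}
```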

Perl Regular Expressions
• Quantifiers:
  - '*' matches zero or more instances
  - '+' matches one or more instances
    - "ab(cde)+fg"
  - '?' matches none or one

Perl Regular Expressions
• Alternatives
  - '|' means "or"
    - Either the right or the left side matches

Perl Regular Expressions
• Character classes
  - List of possible characters inside square brackets
  - Examples:
    - [a-cw-z]+
    - [a-zA-Z0-9]
  - Negation is provided by a caret
    - [^n\-z] matches any character but 'n', '-', 'z'
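Inside a character class an unescaped hyphen between two characters denotes a range, so the backslash in a negated class changes its meaning considerably. The following sketch counts matching characters; the helper `count_in_class` and the sample text are assumptions for illustration, and `qr//` is used only to pass the patterns as arguments.

```perl
use strict;
use warnings;

# Count the characters of $text that match the character class $class.
sub count_in_class {
    my ($text, $class) = @_;
    my $count = () = $text =~ /$class/g;
    return $count;
}

my $text = 'n-z opq';

# Range: excludes every letter from n through z; '-' and ' ' remain.
print 'range   [^n-z]  : ', count_in_class($text, qr/[^n-z]/), "\n";
# Escaped hyphen: excludes only n, - and z; ' ', o, p and q remain.
print 'literal [^n\-z] : ', count_in_class($text, qr/[^n\-z]/), "\n";
```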

Perl Regular Expressions
• Character class shortcuts
  - \w (word) is a shortcut for [A-Za-z0-9_]
  - \s (space) is a shortcut for [\f\t\n\r ]
  - \d (digit) is a shortcut for [0-9]
  - [^\d] anything but a digit
  - [^\s] anything but a space character
  - [^\w] anything but a word character

Perl Regular Expressions
• Perl regex semantics are based on:
  - Greed
    - Perl tries to match as much of an expression as is possible
  - Eagerness
    - Perl gives the first possible match
    - The left-most match wins
  - Backtracking
    - The entire expression needs to match
    - Perl regex evaluation backtracks if a match is impossible
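Greed and backtracking can be observed by inspecting `$&`, the text matched by the last successful pattern. This is a small sketch; the helper name `matched_text` and the sample string are assumptions.

```perl
use strict;
use warnings;

# Return the text matched by $pattern in $string, or undef if none.
sub matched_text {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? $& : undef;
}

# Greed: .* takes as much as it can, so the match runs to the LAST 'o'.
print matched_text('boo hoo', qr/b.*o/), "\n";    # boo hoo

# Backtracking: .* first grabs the rest of the string, then gives
# characters back until the 'h' in the pattern can match.
print matched_text('boo hoo', qr/b.*h/), "\n";    # boo h
```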

Perl Regular Expressions
• Eagerness example:
  - What is the result of this snippet?

        $string = "boo hoo";
        $string =~ s/o*/e/;    # left side of =~ needs to be an l-value

  - Candidates:
    - boo hoo
    - bee hoo
    - boo hee
    - eboo hoo
  - The result is "eboo hoo": o* can match the empty string at the very start of the string, and the left-most match wins.
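The behaviour is easy to confirm by running the substitution. The sub names below are assumptions for illustration; the second sub swaps `*` for `+` to show what changes once an empty match is no longer possible.

```perl
use strict;
use warnings;

# Apply s/o*/e/ to a copy of the argument and return the result.
sub replace_star {
    my ($string) = @_;
    $string =~ s/o*/e/;
    return $string;
}

# Same substitution with + so that at least one 'o' must be matched.
sub replace_plus {
    my ($string) = @_;
    $string =~ s/o+/e/;
    return $string;
}

print replace_star('boo hoo'), "\n";    # eboo hoo
print replace_plus('boo hoo'), "\n";    # be hoo
```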

Perl Regular Expressions
• Quantifiers *, +, ? are not always enough
• Specify the number of occurrences by placing a comma-separated range in curly brackets
  - /a{2,12}/: 2 to 12 'a'
  - /a{5,}/: 5 or more 'a'
  - /a{5}/: exactly 5 'a'
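A short sketch of the three forms follows. The sub names are assumptions; each pattern is anchored with ^ and $ so that the whole string is counted, because an unanchored /a{5}/ would also match inside a longer run of 'a'.

```perl
use strict;
use warnings;

sub two_to_twelve { return $_[0] =~ /^a{2,12}$/ ? 1 : 0 }
sub five_or_more  { return $_[0] =~ /^a{5,}$/   ? 1 : 0 }
sub exactly_five  { return $_[0] =~ /^a{5}$/    ? 1 : 0 }

for my $n (1, 2, 5, 12, 13) {
    my $run = 'a' x $n;    # a string of $n letters 'a'
    printf "%2d: {2,12}=%d {5,}=%d {5}=%d\n", $n,
        two_to_twelve($run), five_or_more($run), exactly_five($run);
}
```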

Perl Regular Expressions
• Anchors
  - A pattern can match anywhere in the string unless you use anchors
  - ^ beginning of string
  - $ end of string
  - \b start or end of a group of \w characters
  - \B non-word boundary anchor
• Examples:
  - /^hello/ matches only at the beginning of the string
  - /world$/ matches only at the end of the string
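The anchors can be tried out on a few strings. The helper `matches` and the sample words are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Return 1 if $string matches $pattern, else 0.
sub matches {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? 1 : 0;
}

my @cases = (
    [ 'hello world', qr/^hello/  ],    # at the beginning: matches
    [ 'say hello',   qr/^hello/  ],    # not at the beginning: fails
    [ 'hello world', qr/world$/  ],    # at the end: matches
    [ 'the cat sat', qr/\bcat\b/ ],    # cat as a whole word: matches
    [ 'catalog',     qr/\bcat\b/ ],    # cat inside a longer word: fails
    [ 'catalog',     qr/cat\B/   ],    # cat followed by a word character
);

for my $case (@cases) {
    my ($string, $pattern) = @$case;
    printf "%-13s %-16s %d\n", "'$string'", $pattern, matches($string, $pattern);
}
```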

Perl Regular Expressions
• Parentheses and memory
  - ( ) group together part of a pattern
  - They also remember the corresponding matched part of the string.
  - These are put into a backreference
    - Made by a backslash followed by a number
    - Available as $1, … after matching
• Examples
  - /(.)\1/ matches any character followed by itself
  - /../ matches any two characters
  - /(['"]).*\1/ matches any string starting with a single or double quote, followed by zero or more arbitrary characters, followed by the same type of quote.
    - "does not match'
    - "does match"
    - 'does match'
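A brief sketch of backreferences in use. The helper names and sample strings are assumptions; because the quote pattern is not anchored, it can start matching at any quote character in the string, so the non-matching sample deliberately contains only one quote of each kind.

```perl
use strict;
use warnings;

# Return the quoted text (quotes included) found in $string, or undef.
sub quoted_part {
    my ($string) = @_;
    return $string =~ /(['"]).*\1/ ? $& : undef;
}

# Return the first character that is immediately repeated, or undef.
sub doubled_char {
    my ($string) = @_;
    return $string =~ /(.)\1/ ? $1 : undef;
}

for my $s (q{say "does match" now}, q{'does match'}, q{"does not match'}) {
    my $hit = quoted_part($s);
    print "$s => ", (defined $hit ? $hit : 'no match'), "\n";
}

print "doubled: ", doubled_char('balloon'), "\n";    # doubled: l
```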

Perl Regular Expressions: Validating e-mail
• Out-of-channel verification
  - Ask for email addresses twice to weed out typos.
  - Send email to the address given.
  - Still need to prevent command-line insertion
• Lookup of DNS records for MX records
  - Assumes site connectivity
• Regular expressions
  - Typically have subtle errors
    - tom&jerry@warnerbros.com is valid, but fails a simple regex
    - president@whitehouse.gov is valid, deliverable, but probably fake

Perl Regular Expressions: Validating email
• if ( $email =~ /@/ ) { … }
  - checks for an at sign
• if ( $email =~ /\S+@\S+/ )
  - checks for non-white space characters divided by an at sign
  - matches thomas@hotmail
• if ( $email =~ /\S+@\S+\.\S+/ )
• if ( $email =~ /[\w-]+@[\w-]+\.[\w-]+/ )
  - matches most valid emails, but allows multiple emails
• if ( $email =~ /^[\w-]+@[\w-]+\.[\w-]+$/ )
  - anchored at beginning and end of string
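The progression of patterns can be compared side by side. In this sketch the labels, the helper `passed_checks`, and the sample addresses other than those on the slides are assumptions; the at sign is written as `\@` so that Perl can never mistake it for the start of an array name.

```perl
use strict;
use warnings;

# The five patterns, from loosest to strictest.
my @checks = (
    [ 'at sign'  => qr/\@/                         ],
    [ 'nonspace' => qr/\S+\@\S+/                   ],
    [ 'dot'      => qr/\S+\@\S+\.\S+/              ],
    [ 'word'     => qr/[\w-]+\@[\w-]+\.[\w-]+/     ],
    [ 'anchored' => qr/^[\w-]+\@[\w-]+\.[\w-]+$/   ],
);

# Return the labels of all checks that $email passes.
sub passed_checks {
    my ($email) = @_;
    return map { $_->[0] } grep { $email =~ $_->[1] } @checks;
}

for my $email ('alice@example.com', 'thomas@hotmail',
               'tom&jerry@warnerbros.com', 'a@b.com c@d.com') {
    print "$email: ", join(', ', passed_checks($email)), "\n";
}
```

The anchored pattern rejects the valid address containing `&`, which illustrates why regular expressions alone tend to have subtle errors.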

Perl Regular Expressions
• Checking for strings that only contain alphabetic characters.
  - ASCII-based regex is insufficient:
    - if ($var =~ /^[a-zA-Z]+$/)
    - Does not work for characters with diacritic marks
  - Best solution is to use Unicode properties
    - if ($var =~ /^[^\W\d_]+$/)
    - Explanation:
      - \w matches alphabetic, numeric, underscore (alphanumunder)
      - \W is a non-alphanumunder
      - [^\W\d_] is a character that is neither non-alphanumunder, digit, nor underscore, hence an alphabetic character
  - Could also use POSIX character classes, but those depend on locale
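The difference between the two checks shows up as soon as a name contains diacritics. This is a sketch with assumed sub names; the sample word is written with `\x{...}` escapes so that the example does not depend on the encoding of the source file, and because it contains code points above 255 Perl applies Unicode rules to the whole string.

```perl
use strict;
use warnings;

# ASCII-only check.
sub is_ascii_alpha   { return $_[0] =~ /^[a-zA-Z]+$/ ? 1 : 0 }

# A word character that is neither a digit nor an underscore.
sub is_unicode_alpha { return $_[0] =~ /^[^\W\d_]+$/ ? 1 : 0 }

# The Polish city name "Lodz" spelled with its diacritics.
my $city = "\x{141}\x{f3}d\x{17a}";

print "ASCII check, with diacritics:   ", is_ascii_alpha($city), "\n";
print "Unicode check, with diacritics: ", is_unicode_alpha($city), "\n";
print "Unicode check, 'abc1':          ", is_unicode_alpha('abc1'), "\n";
print "Unicode check, 'a_b':           ", is_unicode_alpha('a_b'), "\n";
```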

Perl Regular Expressions
• Making regex readable
  - Place semantic units into a variable with an appropriate name

        $optional_sign    = '[-+]?';
        $mandatory_digits = '\d+';
        $decimal_point    = '\.?';
        $optional_digits  = '\d*';
        $number = $optional_sign . $mandatory_digits . $decimal_point . $optional_digits;
        if ( /($number)/ ) { … }
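Assembled into a runnable sketch, the named pieces can be used as follows. The helper `first_number` and the sample texts are assumptions for illustration.

```perl
use strict;
use warnings;

# Named building blocks; single quotes keep the backslashes intact.
my $optional_sign    = '[-+]?';
my $mandatory_digits = '\d+';
my $decimal_point    = '\.?';
my $optional_digits  = '\d*';
my $number = $optional_sign . $mandatory_digits . $decimal_point . $optional_digits;

# Return the first number found in $text, or undef if there is none.
sub first_number {
    my ($text) = @_;
    return $text =~ /($number)/ ? $1 : undef;
}

for my $text ('temperature: -3.14 C', 'abc 42 def', 'no digits') {
    my $found = first_number($text);
    print "$text => ", (defined $found ? $found : 'none'), "\n";
}
```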
