Introduction to Perl, Part II
By: Cédric Notredame (adapted from BT McInnes)

Passing Arguments To Your Program

Command Line Arguments
- Command line arguments in Perl are easy to work with.
- @ARGV is the array that holds all arguments passed in from the command line.
- Example:
    ./prog.pl arg1 arg2 arg3
  @ARGV would contain ('arg1', 'arg2', 'arg3').
- $#ARGV returns the index of the last command line argument, which is one less than the number of arguments passed.
- Remember: $#array is the last index of the array, not its size. Use scalar(@array) to get the number of elements.
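The difference between the element count and the last index can be sketched as follows. The helper name describe_args is hypothetical and is used here only for illustration; in a real script you would inspect @ARGV directly.

```perl
use strict;
use warnings;

# Hypothetical helper: summarize an argument list the way a script
# might inspect @ARGV.
sub describe_args {
    my @args  = @_;
    my $count = scalar @args;   # number of elements
    my $last  = $#args;         # index of the last element (-1 when empty)
    return "count=$count last_index=$last";
}

# Run as: ./prog.pl arg1 arg2 arg3
print describe_args(@ARGV), "\n";
```

With three arguments the count is 3 while the last index is 2.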

Reading/Writing Files

File Handles
- Opening a file:
    open(SRC, "my_file.txt");
- Reading from a file:
    $line = <SRC>;   # reads up to and including a newline character
- Closing a file:
    close(SRC);

File Handles cont...
- Opening a file for output:
    open(DST, ">my_file.txt");
- Opening a file for appending:
    open(DST, ">>my_file.txt");
- Writing to a file:
    print DST "Printing my first line.\n";
- Safeguarding against opening a nonexistent file:
    open(SRC, "file.txt") || die "Could not open file.\n";
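The write, append, and read steps can be tried end to end with the sketch below. It is not the slides' own code: it assumes the three-argument form of open with lexical file handles, which is a commonly recommended variant, and it uses the core File::Temp module so that nothing is left on disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a temporary directory that is removed when the script exits.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/my_file.txt";

# '>' creates or truncates the file.
open(my $dst, '>', $path) or die "Could not open $path for writing: $!\n";
print $dst "Printing my first line.\n";
close($dst);

# '>>' appends to the existing content.
open($dst, '>>', $path) or die "Could not open $path for appending: $!\n";
print $dst "Appending a second line.\n";
close($dst);

# '<' opens for reading; in list context <$src> returns all lines.
open(my $src, '<', $path) or die "Could not open $path for reading: $!\n";
my @lines = <$src>;
close($src);

print scalar(@lines), " lines read\n";
```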

File Test Operators
- Check to see if a file exists:
    if (-e "file.txt") {
        # The file exists!
    }
- Other file test operators:
    -r  readable
    -x  executable
    -d  is a directory
    -T  is a text file
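A small sketch of the file test operators in use. The temporary directory and the file names are assumptions made for the example (File::Temp is a core module); the operators themselves are the ones listed above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/file.txt";

# Create a small text file to test against.
open(my $fh, '>', $file) or die "Could not create $file: $!\n";
print $fh "some text\n";
close($fh);

# Each test yields true or false; normalize to 1/0 for printing.
my %result = (
    file_exists    => (-e $file ? 1 : 0),
    file_is_dir    => (-d $file ? 1 : 0),
    dir_is_dir     => (-d $dir  ? 1 : 0),
    missing_exists => (-e "$dir/no_such_file.txt" ? 1 : 0),
);

print "$_ = $result{$_}\n" for sort keys %result;
```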

Quick Program with File Handles
- Program to copy a file to a destination file:
    #!/usr/bin/perl -w
    open(SRC, "file.txt") || die "Could not open source file.\n";
    open(DST, ">newfile.txt") || die "Could not open destination file.\n";
    while ($line = <SRC>) {
        print DST $line;
    }
    close SRC;
    close DST;

Some Default File Handles
- STDIN: standard input
    $line = <STDIN>;   # takes input from stdin
- STDOUT: standard output
    print STDOUT "File handling in Perl is sweet!\n";
- STDERR: standard error
    print STDERR "Error!!\n";

The <> File Handle
- The "empty" file handle reads from the file(s) named on the command line, or from STDIN if none are given:
    $line = <>;
- If the program is run as ./prog.pl file.txt, this will automatically open file.txt and read the first line.
- If the program is run as ./prog.pl file1.txt file2.txt, this will first read file1.txt and then file2.txt; you will not know when one ends and the other begins.

The <> File Handle cont...
- If the program is run as ./prog.pl, with no file arguments, it will wait for you to enter text at the prompt and will continue until you enter the EOF character (CTRL-D in UNIX).

Example Program with STDIN
- Suppose you want to determine if you are one of the Three Stooges:
    #!/usr/local/bin/perl
    %stooges = (larry => 1, moe => 1, curly => 1);
    print "Enter your name: ? ";
    $name = <STDIN>;
    chomp $name;
    if ($stooges{ lc($name) }) {
        print "You are one of the Three Stooges!!\n";
    } else {
        print "Sorry, you are not a Stooge!!\n";
    }

Combining File Content
Given the following two files:
    File1.txt:
        1
        2
        3
    File2.txt:
        a
        b
        c
Write a program that takes the two files as arguments and outputs a third file that looks like:
    File3.txt:
        1
        a
        2
        b
        3
        c
Tip: ./mix_files File1.txt File2.txt File3.txt

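Combining File Content
    #!/usr/bin/perl
    open(F, $ARGV[0]) || die "Could not open $ARGV[0].\n";
    open(G, $ARGV[1]) || die "Could not open $ARGV[1].\n";
    open(H, ">$ARGV[2]") || die "Could not open $ARGV[2].\n";
    # stop as soon as either input file runs out of lines
    while (defined($l1 = <F>) && defined($l2 = <G>)) {
        print H "$l1$l2";
    }
    close(F);
    close(G);
    close(H);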

Chomp and Chop
- chomp: function that deletes a trailing newline from the end of a string.
    $line = "this is the first line of text\n";
    chomp $line;   # removes the newline character
    print $line;   # prints "this is the first line of text"
                   # without moving to a new line
- chop: function that chops off the last character of a string, whatever it is.
    $line = "this is the first line of text";
    chop $line;
    print $line;   # prints "this is the first line of tex"
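The practical difference shows up when the string has no trailing newline: chomp leaves it alone, while chop still removes a character. A minimal sketch, with variable names chosen for the example:

```perl
use strict;
use warnings;

my $with_newline = "this is the first line of text\n";
my $plain        = "this is the first line of text";

# Copy each string, then modify the copy.
chomp(my $chomped_nl    = $with_newline);  # newline removed
chomp(my $chomped_plain = $plain);         # nothing to remove, unchanged
chop(my $chopped_nl     = $with_newline);  # newline removed (last character)
chop(my $chopped_plain  = $plain);         # final 't' removed

print "[$chomped_nl]\n[$chomped_plain]\n[$chopped_nl]\n[$chopped_plain]\n";
```

This is why chomp is the safer choice for cleaning up lines read from a file or from STDIN.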

Matching Regular Expressions

Regular Expressions
- What are regular expressions? A few definitions:
    - A regular expression specifies a class of strings: the strings that belong to the formal (regular) language it defines.
    - In other words, it is a formula for matching strings that follow a specified pattern.
- Some things you can do with regular expressions:
    - Parse text
    - Add and/or replace subsections of text
    - Remove pieces of text

Regular Expressions cont...
- A regular expression characterizes a regular language.
- Examples of pattern matching in UNIX (shell wildcards, which are simpler than Perl regular expressions):
    - ls *.c
      Lists all the files in the current directory whose names end in '.c'
    - ls *.txt
      Lists all the files in the current directory whose names end in '.txt'

Simple Example for Clarity
- In its simplest form, a regular expression is a string of characters that you are looking for.
- We want to find all the words that contain the string 'ing' in our text.
- The regular expression we would use: /ing/

The Match Operator
- What would our program then look like?
    if ($word =~ m/ing/) {
        print "$word\n";
    }

Exercise:
- Download any text you wish from the internet and count all the words it contains that include 'ing'.
    wget "http://www.trinity.edu/~mkearl/family.html"

Exercise:
    #!/usr/local/bin/perl
    while (<>) {
        chomp;
        @words = split / /;
        foreach $word (@words) {
            if ($word =~ m/ing/) {
                print "$word\n";
                $ing++;
            }
        }
    }
    print "$ing words containing 'ing'\n";

Regular Expression Types
- Regular expressions are composed of two types of characters:
    - Literals
        - Normal text characters
        - Like what we saw in the previous program ( /ing/ )
    - Metacharacters
        - Special characters
        - Add a great deal of flexibility to your search

Metacharacters
- Match more than just characters
- Match line position:
    ^   start of a line   (caret)
    $   end of a line     (dollar sign)
- Match any one of the characters in a list: [...]
- Examples:
    /[Bb]ridget/   matches Bridget or bridget
    /Mc[Ii]nnes/   matches McInnes or Mcinnes

Our Simple Example Revisited
- Now suppose we only want to match words that end in 'ing' rather than just contain 'ing'.
- How would we change our regular expression to accomplish this?
    - Previous regular expression: $word =~ m/ing/
    - New regular expression:      $word =~ m/ing$/

Ranges in Regular Expressions
- Ranges can be specified in regular expressions
- Valid ranges:
    [A-Z]      upper case Roman alphabet
    [a-z]      lower case Roman alphabet
    [A-Za-z]   upper or lower case Roman alphabet
    [A-F]      upper case A through F
    [A-z]      valid, but be careful: it also matches the characters [ \ ] ^ _ ` that sit between Z and a in ASCII
- Invalid ranges (the start comes after the end):
    [a-Z]
    [F-A]

Ranges cont...
- Ranges of digits can also be specified:
    [0-9]   valid
    [9-0]   invalid
- Negating ranges:
    /[^0-9]/    match any character except a digit
    /[^a]/      match any character except an a
    /^[^A-Z]/   match any line that starts with a character other than an upper case letter
        - First ^ : start of line
        - Second ^ : negation
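The ranges and negations above can be checked against a few sample strings. This is a sketch only; the helper name classify and the sample strings are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: report which character classes match a string.
sub classify {
    my ($s) = @_;
    my %r = (
        has_upper        => ($s =~ /[A-Z]/   ? 1 : 0),  # any upper case letter
        has_non_digit    => ($s =~ /[^0-9]/  ? 1 : 0),  # any character that is not a digit
        starts_non_upper => ($s =~ /^[^A-Z]/ ? 1 : 0),  # first character is not upper case
        in_A_to_z        => ($s =~ /[A-z]/   ? 1 : 0),  # the "be careful" range
    );
    return \%r;
}

for my $s ('Perl', 'perl', '42', '_') {
    my $r = classify($s);
    print "$s -> ", join(', ', map { "$_=$r->{$_}" } sort keys %$r), "\n";
}
```

Note that '_' matches [A-z] even though it is not a letter, which is why [A-Za-z] is usually the better choice.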

Our Simple Example Again
- Now suppose we want to create a list of all the words in our text that do not end in 'ing'.
- How would we change our regular expression to accomplish this?
    - Previous regular expression: $word =~ m/ing$/
    - New regular expression:      !($word =~ m/(ing)$/)
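The three variants of the running example (contains 'ing', ends in 'ing', does not end in 'ing') can be compared side by side. The word list is made up for the illustration, and grep is used here in place of the slides' foreach loop.

```perl
use strict;
use warnings;

my @words = qw(singing ring finger ingot walked);

my @contains = grep { /ing/ }    @words;  # 'ing' anywhere in the word
my @ends     = grep { /ing$/ }   @words;  # 'ing' at the end of the word
my @not_ends = grep { !/ing$/ }  @words;  # everything else

print "contains: @contains\n";
print "ends:     @ends\n";
print "not ends: @not_ends\n";
```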

Matching Interrogations
    $string =~ /([^.?]+\?)/
    $string =~ /[.?]([A-Z0-9][^.?]+\?)/
    $string =~ /([\w\s]+\?)/

Removing HTML Tags
    $string =~ s/<[^>]+>/ /g;
- g: substitute EVERY instance
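Wrapped in a small function, the substitution can be tried on a sample string. The function name strip_tags and the sample markup are assumptions for the example; the regular expression is the one from the slide.

```perl
use strict;
use warnings;

# Replace every <...> tag with a single space.
sub strip_tags {
    my ($string) = @_;
    $string =~ s/<[^>]+>/ /g;   # without /g only the first tag would be replaced
    return $string;
}

print "[", strip_tags('<p>Hello <b>world</b></p>'), "]\n";
```

Because each tag becomes a space, the result can contain runs of spaces; a simple pattern like this is also not a full HTML parser.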

Literal Metacharacters
- Suppose that you actually want to look for all strings that contain '$' in your text
    - Use the \ symbol to escape the metacharacter
    - Regular expression to search for: /\$/
- What do the following regular expressions match?
    /[ABCDEFGHIJKLMNOP\$]\$/
    /[A-P\$]\$/
    - Both match any line that contains (A-P or $) followed by $

Patterns provided in Perl
- Some patterns:
    \d   [0-9]
    \w   [a-zA-Z0-9_]
    \s   [ \r\t\n\f]       (white space pattern)
    \D   [^0-9]
    \W   [^a-zA-Z0-9_]
    \S   [^ \r\t\n\f]
- Example: (19\d\d)
    - Looks for any year in the 1900's
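A short sketch of \d and \s in use. The sample sentence is made up for the example; the year pattern is the one shown above.

```perl
use strict;
use warnings;

my $text = "Born in 1985,\tmoved in 2003 and\nretired in 1999.";

# In list context with /g, a match returns every captured group.
my @years = $text =~ /(19\d\d)/g;

# \s+ treats any run of spaces, tabs or newlines as one separator.
my @words = split /\s+/, $text;

print "years: @years\n";
print "word count: ", scalar(@words), "\n";
```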

Using Patterns in our Example
- Commonly, words are separated not by just a single space but by tabs, returns, etc.
- Let's modify our split function to handle multiple white space characters:
    #!/usr/local/bin/perl
    while (<>) {
        chomp;
        @words = split /\s+/, $_;
        foreach $word (@words) {
            if ($word =~ m/ing$/) {
                print "$word\n";
            }
        }
    }

Word Boundary Metacharacter
- Regular expression to match the start or the end of a 'word': \b
- Examples:
    /Jeff\b/     Match Jeff but not Jefferson
    /Carol\b/    Match Carol but not Caroline
    /Rollin\b/   Match Rollin but not Rolling
    /\bform/     Match form or formation but not Information
    /\bform\b/   Match form but neither information nor formation
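The word boundary examples can be verified with a small table of string and pattern pairs. This is an illustrative sketch; the array layout is an assumption of the example.

```perl
use strict;
use warnings;

# Each entry pairs a test string with a precompiled pattern.
my @checks = (
    [ 'Jeff',        qr/Jeff\b/   ],  # boundary at end of string: match
    [ 'Jefferson',   qr/Jeff\b/   ],  # 'f' followed by 'e': no boundary
    [ 'formation',   qr/\bform/   ],  # 'form' starts the word: match
    [ 'information', qr/\bform/   ],  # 'form' is inside the word: no match
    [ 'formation',   qr/\bform\b/ ],  # 'form' is followed by 'a': no match
    [ 'a form here', qr/\bform\b/ ],  # 'form' stands alone: match
);

my @results = map { $_->[0] =~ $_->[1] ? 1 : 0 } @checks;

for my $i (0 .. $#checks) {
    print "$checks[$i][0] against $checks[$i][1]: $results[$i]\n";
}
```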

DOT Metacharacter
- The DOT metacharacter, '.', symbolizes any character except a newline
    /b.bble/
        - Would possibly return: bobble, babble, bubble
    /.oat/
        - Would possibly return: boat, coat, goat
- Note: remember that '.*' usually means a bunch of anything; this can be handy but can also have hidden ramifications.

PIPE Metacharacter
- The PIPE metacharacter is used for alternation
    /Bridget (Thomson|McInnes)/
        - Match Bridget Thomson or Bridget McInnes but NOT Bridget Thomson McInnes
    /B|bridget/
        - Match B or bridget
    /^(B|b)ridget/
        - Match Bridget or bridget at the beginning of a line
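The effect of grouping on alternation is easy to miss, so here is a sketch comparing the three patterns on a made-up list of names.

```perl
use strict;
use warnings;

my @names = ('Bridget Thomson', 'Bridget McInnes', 'Bob', 'bridget', 'A bridget');

# Without parentheses the alternatives are 'B' and 'bridget',
# so any string containing a capital B matches.
my @loose = grep { /B|bridget/ } @names;

# With parentheses only the first letter alternates, and ^ anchors the match.
my @anchored = grep { /^(B|b)ridget/ } @names;

# Alternation between two surnames after a fixed first name.
my @surname = grep { /Bridget (Thomson|McInnes)/ } @names;

print "loose:    ", scalar(@loose), "\n";
print "anchored: ", scalar(@anchored), "\n";
print "surname:  ", scalar(@surname), "\n";
```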

Our Simple Example
- Now with our example, suppose that we want to get not only all words that end in 'ing' but also those that end in 'ed'.
- How would we change our regular expression to accomplish this?
    - Previous regular expression: $word =~ m/ing$/
    - New regular expression:      $word =~ m/(ing|ed)$/

The ? Metacharacter
- The metacharacter ? indicates that the character immediately preceding it occurs zero or one time
- Examples:
    /worl?ds/
        - Match either 'worlds' or 'words'
    /m?ethane/
        - Match either 'methane' or 'ethane'

The * Metacharacter
- The metacharacter * indicates that the character immediately preceding it occurs zero or more times
- Example:
    /ab*c/   Match 'ac', 'abc', 'abbc', 'abbbc', etc.
        - Matches any string that contains an a, followed by zero or more b's, followed by a c.
- Sometimes called Kleene's star

Our Simple Example again
- Now suppose we want to create a list of all the words in our text that end in 'ing' or 'ings'.
- How would we change our regular expression to accomplish this?
    - Previous regular expression: $word =~ m/ing$/
    - New regular expression:      $word =~ m/ings?$/
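The ? and * quantifiers can be tried on a few made-up word lists. In this sketch the methane pattern is anchored with ^ and $, which the slide's version is not, so that 'mmethane' is rejected.

```perl
use strict;
use warnings;

# * : zero or more of the preceding character
my @star = grep { /ab*c/ } qw(ac abc abbbc adc);

# ? : zero or one of the preceding character, here an optional plural 's'
my @optional_s = grep { /ings?$/ } qw(ring rings ringed ringss);

# Anchored variant of /m?ethane/ (the anchors are an addition for this example)
my @methane = grep { /^m?ethane$/ } qw(methane ethane mmethane);

print "star:       @star\n";
print "optional s: @optional_s\n";
print "methane:    @methane\n";
```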

Exercise
- For each of the strings 1) to 5), say which of the patterns below it matches. Where there is a match, what would be the values of $MATCH, $1, $2, etc.?
- Strings:
    1) the quick brown fox jumped over the lazy dog
    2) The Sea!
    3) (.+)\s*\1
    4) 9780471975632
    5) C:\DOS\PATH\NAME
- Patterns:
    /[a-z]/
    /(\W+)/
    /\W*/
    /^\w+$/
    /[^\w+$]/
    /\d/
    /(.+)\s*\1/
    /((.+)\s*\1)/
    /(.+)\s*((\1))/
    /\DOS/
    /\\DOS/

Exercise: Answers
- Answer key for the exercise above, with the numbers in the order they appear on the slide:
    1, 2, 3, 5 4 1, 2, 3, 5 3, 4 2, 2 5 5 5 1, 2, 3, 5, 7, 9 1, 2, 3, 5, 6 3, 4, 6 2, 3, 5, 10, 11, 12

Modifying Text With Regular Expressions

Modifying Text
- Match
    - Up to this point, we have only attempted to match a given regular expression
    - Example: $variable =~ m/regex/
- Substitution
    - Takes matching one step further: if there is a match, replace it with the given string
    - Example: $variable =~ s/regex/replacement/
    $var =~ s/Cedric/Notredame/g;
    $var =~ s/ing/ed/;

Substitution Example
- Suppose that when we find all our words that end in 'ing' we want to replace the 'ing' with 'ed'.
    #!/usr/local/bin/perl -w
    while (<>) {
        chomp $_;
        @words = split /\s+/, $_;
        foreach $word (@words) {
            if ($word =~ s/ing$/ed/) {
                print "$word\n";
            }
        }
    }
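The difference between a single substitution, a global substitution, and an anchored per-word substitution can be seen on one short line. The sample text and variable names are assumptions for the example.

```perl
use strict;
use warnings;

my $line = "singing ringing";

# Replace only the first 'ing' found in the line.
(my $once = $line) =~ s/ing/ed/;

# Replace every 'ing' in the line.
(my $all = $line) =~ s/ing/ed/g;

# Replace only a trailing 'ing', word by word.
my @words = split /\s+/, $line;
s/ing$/ed/ for @words;
my $per_word = join ' ', @words;

print "once:     $once\n";
print "all:      $all\n";
print "per word: $per_word\n";
```

Only the anchored, per-word version gives the intended 'singed ringed'.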

Special Variables Modified by a Match
    $target = "I have 25 apples";
    $target =~ /(\d+)/;
- $&  => "25"
    - A copy of the text matched by the regex
- $`  => "I have "
    - A copy of the target text before the match
- $'  => " apples"
    - A copy of the target text after the match
- $1, $2, $3, etc.  ($1 => "25")
    - The text matched by the 1st, 2nd, etc., set of parentheses. Note: $0 is not included here (it holds the program name)
- $+
    - A copy of the highest numbered $1, $2, $3, etc., that took part in the match
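In Perl, $` holds the text before the match and $' the text after it. A minimal sketch that captures these variables right after a successful match (the names $pre, $match, $post and $first are chosen for the example):

```perl
use strict;
use warnings;

my $target = "I have 25 apples";
my ($pre, $match, $post, $first);

if ($target =~ /(\d+)/) {
    # Copy the special variables immediately: the next successful
    # match would overwrite them.
    ($pre, $match, $post, $first) = ($`, $&, $', $1);
}

print "before: [$pre]\n";
print "match:  [$match]\n";
print "after:  [$post]\n";
print "group1: [$first]\n";
```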

Our Simple Example once again
- Now let's revise our program to find all the words that end in 'ing' without splitting our line of text into an array of words. With the g modifier, a while loop visits every match on the line (an if would stop after the first one):
    #!/usr/local/bin/perl -w
    while (<>) {
        chomp $_;
        while ($_ =~ /([A-Za-z]*ing\b)/g) {
            print "$&\n";
        }
    }

Example
    #!/usr/local/bin
    $exp = <STDIN>;
    chomp $exp;
    if ($exp =~ /^([A-Za-z+\s]*)\bcrave\b([\s.A-Za-z]+)/) {
        print "$1\n";
        print "$2\n";
    }
- Run the program with the string: I crave to rule the world!
- Results:
    - "I "
    - " to rule the world" (the final '!' is not in the second character class, so it is not captured)

Example
    #!/usr/local/bin
    $exp = <STDIN>;
    chomp $exp;
    if ($exp =~ /\bcrave\b/) {
        print "$`\n";
        print "$&\n";
        print "$'\n";
    }
- Run the program with the string: I crave to rule the world!
- Results:
    - "I "
    - "crave"
    - " to rule the world!"

Thank you