I/O Redirection & Regular Expressions: CS 2204 Class Meeting 4

  • Slides: 18
Notes by Doug Bowman and other members of the CS faculty at Virginia Tech. Copyright 2001-2003.
(C) Doug Bowman, Virginia Tech, 2001

Redirecting stdout
  • Instead of writing to the terminal, you can tell a program to print its output to a file using the > operator
  • The >> operator is used to append to a file
  • Examples:
      • man ls > ls_help.txt
      • echo $PWD > current_directory
      • cat file1 >> file2
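
The difference between > and >> is easiest to see by writing to the same file more than once. The following is a minimal sketch; the temporary directory and the file name notes.txt are made up for illustration.

```shell
# Work in a throwaway directory so no real files are overwritten.
workdir=$(mktemp -d)

# '>' creates the file, or truncates it if it already exists.
echo "first line" > "$workdir/notes.txt"
echo "replaced" > "$workdir/notes.txt"    # the file now holds only "replaced"

# '>>' appends to the end of the file instead of truncating it.
echo "appended" >> "$workdir/notes.txt"

cat "$workdir/notes.txt"
```

After these commands the file contains two lines, "replaced" and "appended"; the text "first line" was lost when the second > truncated the file.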

Redirecting stdin
  • Instead of reading from the terminal, you can tell a program to read from a file using the < operator
  • Examples:
      • mail user@domain.com < message
      • interactive_program < command_list
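
Input redirection can be tried without a mail setup by feeding a small file to ordinary commands. This sketch assumes a made-up file fruit.txt in a temporary directory.

```shell
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/fruit.txt"   # hypothetical sample data

# wc reads the file through stdin, so it prints the count with no file name.
line_count=$(wc -l < "$workdir/fruit.txt")

# '<' and '>' can be combined: sort reads fruit.txt and writes sorted.txt.
sort < "$workdir/fruit.txt" > "$workdir/sorted.txt"

echo "lines: $line_count"
cat "$workdir/sorted.txt"
```

Because the program only ever sees stdin, it cannot tell whether the data came from the keyboard or from a file.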

Pipes and filters
  • Pipe: a way to send the output of one command to the input of another
  • Filter: a program that takes input and transforms it in some way
      • wc - gives a count of words/lines/chars
      • grep - searches for lines with a given string
      • more - displays its input one screen at a time
      • sort - sorts lines alphabetically or numerically

Examples of filtering
  • ls -la | more
  • cat file | wc
  • man ksh | grep "history"
  • ls -l | grep "gifford" | wc
  • who | sort > current_users
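
The pipelines above depend on the machine they run on, so here is a small reproducible sketch of the same idea. The user names are hypothetical sample data standing in for the output of a command such as who.

```shell
# Hypothetical sample data; a real pipeline would start from a command like who.
users=$(printf 'carol\nalice\nbob\nalice\n')

# grep keeps only the matching lines, then wc -l counts them.
alice_sessions=$(printf '%s\n' "$users" | grep "alice" | wc -l)

# sort is a filter too: it reads lines on stdin and writes them back in order.
sorted_users=$(printf '%s\n' "$users" | sort)

echo "alice sessions: $alice_sessions"
echo "$sorted_users"
```

Each stage reads from the previous stage's stdout, so filters can be chained in any order that makes sense for the data.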

What is a regular expression (RE)?
  • A pattern
  • Defines a set of strings of characters
  • Any string in the set is said to be “matched” by the RE (the RE matches the string)

Why REs?
  • Pattern matching is a useful tool in many real-world situations:
      • Search for a file on a filesystem
      • Find and replace strings in a file
      • Extract particular data elements from a database
  • REs are an important part of formal languages, one of the basic CS theory disciplines

UNIX programs that use REs
  • grep (search within files)
  • egrep (grep but with extended REs)
  • vi/emacs (text editors)
  • ex (line editor)
  • sed (stream editor)
  • awk (pattern scanning language)
  • perl (scripting language)

Characters vs. metacharacters
  • In patterns, characters can be any character except a newline
  • Metacharacters are characters that have a special meaning
  • To use metacharacters as regular characters in a pattern, quote them with the '\' character

Basic vs. Extended REs
  • In basic regular expressions (grep), the metacharacters ?, +, {, |, (, and ) have no special meaning
      • To make them special, use the backslashed versions \?, \+, \{, \|, \(, and \) instead
  • In extended regular expressions these characters are interpreted as special
      • grep -E
      • egrep
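
Running the same pattern through grep and grep -E shows the difference. This is a minimal sketch using two made-up input lines; grep -E is used in place of egrep because the two are equivalent.

```shell
# Two hypothetical input lines: one with repeated b's, one with a literal plus.
sample=$(printf 'abbc\nab+c\n')

# Basic RE: '+' is an ordinary character, so only the literal text "ab+c" matches.
basic=$(printf '%s\n' "$sample" | grep 'ab+c')

# Extended RE: '+' means "one or more of the preceding RE", so "abbc" matches.
extended=$(printf '%s\n' "$sample" | grep -E 'ab+c')

echo "basic:    $basic"
echo "extended: $extended"
```

The same pattern text selects different lines depending on which flavor of RE the program uses, which is why it matters whether a command is grep or egrep.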

Using egrep
  • egrep pattern filename(s)
  • To be safe, put quotes around your pattern
  • Examples:
      • egrep "abc" file.txt (print lines containing "abc")
      • egrep -i "abc" file.txt (same, but ignore case)
      • egrep -v "abc" file.txt (print lines not containing "abc")
      • egrep -n "abc" file.txt (include line numbers)
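
The four options can be compared side by side on a tiny file. In this sketch the three-line file.txt is made up for illustration, and grep -E stands in for egrep, since some newer systems print a warning for the egrep name.

```shell
workdir=$(mktemp -d)
printf 'abc\nABC\nxyz\n' > "$workdir/file.txt"   # hypothetical sample file

match=$(grep -E "abc" "$workdir/file.txt")        # lines containing abc
nocase=$(grep -E -i "abc" "$workdir/file.txt")    # same, ignoring case
inverted=$(grep -E -v "abc" "$workdir/file.txt")  # lines NOT containing abc
numbered=$(grep -E -n "abc" "$workdir/file.txt")  # prefix each match with its line number

printf '%s\n' "$match" "--" "$nocase" "--" "$inverted" "--" "$numbered"
```

Note that -v is still case-sensitive here, so the line ABC counts as "not containing abc".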

Metacharacters 1
  • Period (.): matches any single character
      • "a.c" matches abc, adc, a&c, a;c, …
      • "u..x" matches unix, uvax, u3(x, …
  • Asterisk (*): matches zero or more occurrences of the previous RE
      • Not the same as wildcards in the shell!
      • "ab*c" matches ac, abc, abbc, abbbc, …
      • ".*" matches any string
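
A short experiment makes the period, the asterisk, and the contrast with shell wildcards concrete. This is a sketch with a made-up word list, not an exhaustive test.

```shell
words=$(printf 'ac\nabc\nabbc\na&c\nadd\n')   # hypothetical sample words

# '.' must consume exactly one character, so "ac" does not match.
dot=$(printf '%s\n' "$words" | grep -E 'a.c')

# '*' allows zero or more b's, so "ac" matches but "a&c" does not.
star=$(printf '%s\n' "$words" | grep -E 'ab*c')

# In a shell pattern, by contrast, '*' stands for any run of characters.
case "abXYZc" in
  ab*c) glob="matches" ;;
  *)    glob="no match" ;;
esac

printf '%s\n' "$dot" "--" "$star" "--" "shell pattern ab*c on abXYZc: $glob"
```

As a regular expression, ab*c would not match abXYZc, because the asterisk only repeats the b that precedes it.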

Metacharacters 2
  • Plus (+): matches one or more occurrences of the preceding RE
      • "ab+c" matches abc, abbc, but not ac
  • Question mark (?): matches zero or one occurrences of the preceding RE
      • "ab?c" matches ac, abc, but not abbc
  • Logical or (|): matches the RE before or the RE after the bar
      • "abc|def" matches abc or def
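
The three operators can be checked against one small word list. This is a minimal sketch; the words are made up, and grep -E is used because these metacharacters are only special in extended REs.

```shell
words=$(printf 'ac\nabc\nabbc\ndef\n')   # hypothetical sample words

plus=$(printf '%s\n' "$words" | grep -E 'ab+c')        # at least one b
optional=$(printf '%s\n' "$words" | grep -E 'ab?c')    # at most one b
either=$(printf '%s\n' "$words" | grep -E 'abc|def')   # one alternative or the other

printf '%s\n' "$plus" "--" "$optional" "--" "$either"
```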

Metacharacters 3
  • Caret (^): means beginning of line
      • "^D" matches all strings starting with D
  • Dollar sign ($): means end of line
      • "d$" matches all strings ending with d
  • Backslash (\): used to quote other metacharacters
      • "file\.txt" matches file.txt but not fileatxt
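
Anchors and quoting are easy to verify on a few sample lines. In this sketch the input lines are made up, and the patterns are in single quotes so the shell passes $ and \ through to grep unchanged.

```shell
lines=$(printf 'Dog\nsad\nfile.txt\nfileatxt\n')   # hypothetical sample lines

starts=$(printf '%s\n' "$lines" | grep -E '^D')           # lines starting with D
ends=$(printf '%s\n' "$lines" | grep -E 'd$')             # lines ending with d
unquoted=$(printf '%s\n' "$lines" | grep -E 'file.txt')   # '.' matches any character
quoted=$(printf '%s\n' "$lines" | grep -E 'file\.txt')    # '\.' matches only a literal dot

printf '%s\n' "$starts" "--" "$ends" "--" "$unquoted" "--" "$quoted"
```

Without the backslash, the pattern also accepts fileatxt, because the unquoted period matches the letter a.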

Metacharacters 4
  • Square brackets ([ ]) indicate a set/range of characters
      • Any character in the set will match
      • ^ at the start of the set means “not”
      • - between characters indicates a range
  • Examples:
      • "[fF]un" matches fun, Fun
      • "b[aeiou]g" matches bag, beg, big, bog, bug
      • "[A-Z].*" matches any string beginning with a capital letter
      • "[^abc].*" matches any string not starting with a, b, or c
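
Since grep looks for a match anywhere in a line, the examples above need a ^ anchor before they mean "beginning with" on the command line. The sketch below adds the anchors, uses a made-up word list, and sets LC_ALL=C as an assumption so that ranges such as [A-Z] follow plain ASCII order.

```shell
export LC_ALL=C   # assumption: force ASCII ordering so [A-Z] means only capitals
words=$(printf 'fun\nFun\nbag\nbog\nbyg\nApple\ncat\n')   # hypothetical sample words

set_match=$(printf '%s\n' "$words" | grep -E '^[fF]un$')   # f or F, then "un"
vowels=$(printf '%s\n' "$words" | grep -E '^b[aeiou]g$')   # b, one vowel, g
capital=$(printf '%s\n' "$words" | grep -E '^[A-Z]')       # starts with a capital
negated=$(printf '%s\n' "$words" | grep -E '^[^abc]')      # does not start with a, b, or c

printf '%s\n' "$set_match" "--" "$vowels" "--" "$capital" "--" "$negated"
```

Note the two roles of the caret: outside the brackets it anchors to the start of the line, and as the first character inside them it negates the set.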

Metacharacters 5
  • Parentheses ( ): used to group REs when using other metacharacters
      • "a(bc)*" matches a, abc, abcbc, abcbcbc, …
      • "(foot|base)ball" matches football, baseball
  • Braces ({ }): specify the number of repetitions of an RE
      • "[a-z]{3}" matches three lowercase letters
      • "m.{2,4}" matches strings of three to five characters starting with m
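
Grouping and repetition counts are clearest when the pattern has to match a whole line, which grep's -x option enforces. The sketch below uses a made-up word list and sets LC_ALL=C as an assumption so that [a-z] follows plain ASCII order.

```shell
export LC_ALL=C   # assumption: force ASCII ordering so [a-z] means only lowercase
words=$(printf 'a\nabc\nabcbc\nab\nfootball\nbaseball\nhandball\nmap\nmouse\nmonkey\n')

# -x requires the pattern to match the entire line, not just part of it.
group=$(printf '%s\n' "$words" | grep -E -x 'a(bc)*')          # a, then "bc" repeated
alt=$(printf '%s\n' "$words" | grep -E -x '(foot|base)ball')   # either prefix, then "ball"
three=$(printf '%s\n' "$words" | grep -E -x '[a-z]{3}')        # exactly three lowercase letters
range=$(printf '%s\n' "$words" | grep -E -x 'm.{2,4}')         # m plus two to four characters

printf '%s\n' "$group" "--" "$alt" "--" "$three" "--" "$range"
```

Without -x or explicit anchors, "[a-z]{3}" would also select longer words, since any three consecutive lowercase letters inside them would count as a match.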

What do these mean?
  • egrep "^B.*s$" file
  • egrep " [0-9]{3} " file
  • egrep "num(ber)?[0-9]+" file
  • egrep "word" file | wc -l
  • egrep "[A-Z].*?" file
  • What about if they were grep?

Practice
  • Construct egrep commands that find in file:
      • Lines beginning with a word of at least 10 characters
      • Lines containing a student ID number in standard 3-part form
      • Number of lines with 2 consecutive capitalized words
      • Number of lines not ending in an alphabetic character
      • Lines containing a word beginning with a vowel at the end of a sentence