grep (Global REgular expression Print)
• Operation
  – Search a group of files
  – Find all lines that contain a particular regular expression pattern
  – Write the matching lines to standard output (redirect with > to save them in a file)
  – grep returns to the prompt with no extra output when it is done
• Syntax: grep [-cilLnrsvwx] pattern [list of files]
• Examples
  – Find information about the user harley: >grep harley /etc/passwd
  – Find all lines in the files containing the string xxx: >grep xxx
grep Flags
1. -c count the number of matching lines (not individual matches)
2. -i ignore case when searching for matches
3. -l list the names of files containing matches
4. -L list the names of files that do not contain a match
5. -n write the line number in front of each line
6. -r perform a recursive directory search
7. -s suppress error messages about nonexistent or unreadable files
8. -v select lines that do not match the pattern
9. -w match only complete words
10. -x match only lines that match the pattern exactly (the whole line)
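Here is a small sketch of several of these flags. It assumes GNU grep and two hypothetical files, a.txt and b.txt, created in a temporary directory. Because -c counts matching lines, a line with two matches still counts only once.

```shell
# Sketch assuming GNU grep; a.txt and b.txt are hypothetical sample files.
tmp=$(mktemp -d)
printf 'apple pie\nApple tart\npineapple\n' > "$tmp/a.txt"
printf 'banana\n' > "$tmp/b.txt"

count=$(grep -c apple "$tmp/a.txt")          # lines containing "apple" (case-sensitive)
count_i=$(grep -ci apple "$tmp/a.txt")       # same, ignoring case
words=$(grep -cw apple "$tmp/a.txt")         # whole word only: "pineapple" is excluded
exact=$(grep -cx pineapple "$tmp/a.txt")     # the whole line must equal the pattern
inverted=$(grep -cv apple "$tmp/a.txt")      # lines NOT containing "apple"
with_match=$(grep -l apple "$tmp"/*.txt)     # names of files with a match
without_match=$(grep -L apple "$tmp"/*.txt)  # names of files without a match
numbered=$(grep -n tart "$tmp/a.txt")        # matching line prefixed by its number
echo "$count $count_i $words $exact $inverted"
echo "$with_match"; echo "$without_match"; echo "$numbered"
```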
Regular Expressions
Note: Many UNIX programs use these (vi, sed, more, grep, awk)
• Industry standard way to specify patterns
  – In Java: string.matches("pattern"); (true only if the whole string matches)
  – In Java: string.replaceAll("pattern", replacement);
• Meta-characters/operators (some need to be escaped)
  ^ beginning of a line
  $ end of a line
  * match 0 or more of the previous group
  + match 1 or more of the previous group
  ? match 0 or 1 of the previous group
  {n} match exactly n of the previous group
  {m,n} match m to n of the previous group
  {n,} match n or more of the previous group
  | match either the group before or the group after
  . match any character except for newline
  \ literally interpret the following meta-character or operator
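The next sketch exercises each operator. It assumes GNU grep -E (extended syntax), which accepts + ? { } | ( ) without backslashes. The helper name matches is hypothetical: it prints yes when the string matches the pattern and no otherwise.

```shell
# Hypothetical helper: prints yes if string $2 matches extended regex $1, else no.
matches() { printf '%s\n' "$2" | grep -Eq "$1" && echo yes || echo no; }
r1=$(matches '^ab+c$'      'abbbc')   # + : one or more b
r2=$(matches '^ab+c$'      'ac')      # + needs at least one b
r3=$(matches '^ab?c$'      'ac')      # ? : zero or one b
r4=$(matches '^a{2,3}$'    'aaa')     # {m,n} : two to three a's
r5=$(matches '^a{2,3}$'    'aaaa')    # four a's is too many
r6=$(matches '^(cat|dog)$' 'dog')     # | : either alternative
r7=$(matches '^a.c$'       'abc')     # . : any single character
r8=$(matches '^a\.c$'      'abc')     # \. : only a literal dot
echo "$r1 $r2 $r3 $r4 $r5 $r6 $r7 $r8"
```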
Regular Expression Examples
  Regular Expression      String        Match
  [a-z](12){3}[c-e]{3}    a121212cde    Yes
  a.*e+                   abc12cde      Yes
  a.*f                    abc12cde      No
  ^a.*e$                  abc12cde      Yes
  ^b*e$                   abc12cde      No
  ^a*e$                   abc12cde      No
  ^.*$                    abc12cde      Yes
  ^*$                     abc12cde      No
  ^ab12cd$                abc12cde      No
Note: To use ( ) { } or + with grep, use the -E (extended) switch or precede them with a backslash (\)
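You can test patterns like these by piping a string into grep -E. This sketch assumes GNU grep. It pairs the first pattern with the string a121212cde and the rest with abc12cde, which is an assumption about how the original table lined up. The pattern ^*$ is left out because a leading * is treated differently in basic and extended syntax. The helper name check is hypothetical.

```shell
# Hypothetical helper: prints Yes if string $2 matches extended regex $1, else No.
check() { printf '%s\n' "$2" | grep -Eq "$1" && echo Yes || echo No; }
results=$(
  check '[a-z](12){3}[c-e]{3}' 'a121212cde'
  check 'a.*e+'      'abc12cde'
  check 'a.*f'       'abc12cde'
  check '^a.*e$'     'abc12cde'
  check '^b*e$'      'abc12cde'
  check '^a*e$'      'abc12cde'
  check '^.*$'       'abc12cde'
  check '^ab12cd$'   'abc12cde'
)
echo $results   # unquoted on purpose: prints the answers on one line
```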
More grep Examples
• Contents of a file called homework
  Math: problems 12-10 to 12-33, due Monday
  BasketWeaving: make a 6-inch basket, DONE
  Psychology: essay on Animal Existentialism, due end of term
  Surfing: catch at least 10
• grep commands
  >grep -v DONE homework            displays all but line 2
  >grep -c DONE homework            displays 1
  >grep -wi ".*a.*" homework        displays all lines
  >grep -w "m.*e" homework          displays line 2
  >grep -i "d.*e" homework          displays lines 1, 2 and 3
  >grep '\(Ma\|DO\).*' homework     displays lines 1 and 2
Note: the last example escapes the parentheses and the vertical bar
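The following sketch re-creates the homework file from this slide (its last line is truncated in the source and is kept that way) and runs each command. It assumes GNU grep, since \| in basic syntax is a GNU extension. The hypothetical helper lines prints only the numbers of the matching lines.

```shell
# Re-create the homework file from the slide in a temporary directory.
tmp=$(mktemp -d); hw="$tmp/homework"
cat > "$hw" <<'EOF'
Math: problems 12-10 to 12-33, due Monday
BasketWeaving: make a 6-inch basket, DONE
Psychology: essay on Animal Existentialism, due end of term
Surfing: catch at least 10
EOF
# Hypothetical helper: grep -n adds "N:" prefixes; cut keeps N; tr joins them with spaces.
lines() { grep -n "$@" "$hw" | cut -d: -f1 | tr '\n' ' '; }
not_done=$(lines -v DONE)
done_count=$(grep -c DONE "$hw")
word_a=$(lines -wi '.*a.*')
word_me=$(lines -w 'm.*e')
d_e=$(lines -i 'd.*e')
ma_do=$(lines '\(Ma\|DO\).*')
```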
Sorting Data
• Background
  – Each line in a file is a record
  – Each line is a series of fields separated by spaces and/or tabs
• Commands
  >sort fileName                       sorts fileName by whole lines, i.e., starting with the 1st field
  >sort -k 6 fileName                  sorts on a key that starts at the 6th field (and runs to the end of the line)
  >sort -n -k 5 fileName               sorts numerically on the 5th field
  >sort -t ':' -k 4,4r -k 3,3 fileName sorts descending on the 4th field, then ascending on the 3rd, with ':' as the delimiter
  >sort -t ':' fileName                sorts using ':' as the separator character
  >sort -u -k 2r fileName              sorts in reverse on the 2nd field and removes duplicates (lines with equal keys are output only once)
  >sort -k 3,4                         (in a pipe) sorts by the key from field 3 through field 4
  >sort -k 5n -k 8                     sorts numerically by the 5th field and alphabetically by the 8th
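The sketch below shows key-based sorting on a hypothetical colon-separated file with fields name:dept:age:score. Writing each key as N,N limits it to a single field. A trailing n or r on a key applies only to that key. It also shows that numeric and text ordering differ for the same numbers.

```shell
tmp=$(mktemp -d)
# Hypothetical colon-separated records: name:dept:age:score
cat > "$tmp/people" <<'EOF'
ann:sales:31:88
bob:dev:25:92
cy:dev:40:88
dee:ops:25:75
EOF
# Descending numeric score (field 4), ties broken by ascending numeric age (field 3).
by_score=$(sort -t ':' -k 4,4nr -k 3,3n "$tmp/people" | cut -d: -f1 | tr '\n' ' ')
# Numeric vs. text ordering of the same values.
numeric=$(printf '10\n9\n100\n' | sort -n | tr '\n' ' ')
text=$(printf '10\n9\n100\n' | sort | tr '\n' ' ')
# -u with a key: records whose age (field 3) is equal are output only once.
uniq_age=$(sort -t ':' -u -k 3,3n "$tmp/people" | wc -l)
```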
SED (Stream Editor)
• SED is a filter
  – Input from stdin or a file
  – Output to stdout or a file
  – Modifies the input to produce the output
  – Non-interactive
• Processing
  – Read from an input stream
  – Perform line-oriented commands
  – Write to an output stream
• Syntax: >sed [-i] command | [-e command] … [file]
Search and Replace
Note: The same /pattern/ search syntax works in vi, more, and awk; vi also accepts the s/old/new/ command (as :s)
• Search, change, and redirect to newFile
  >sed 's/cat/dog/g' file > newFile
• Search, change, and edit the file in place
  >sed -i 's/cat/dog/g' file
• Apply to a specific range of lines
  >sed '5,10 s/cat/dog/g' file
• Apply only to lines containing OK
  >sed '/OK/s/cat/dog/g' names
• Apply only to lines containing 2 consecutive digits
  >sed '/[0-9]\{2\}/s/cat/dog/g' names
• Delete a range of lines
  >sed '5,10d' file
Note: single quotes suppress the shell's interpretation of special characters
Note: In sed's default (basic) syntax you must escape the characters +, { and } for them to act as operators
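Here is a sketch of these forms on a hypothetical three-line names file. It assumes GNU sed, because the -i option without a backup suffix is GNU behavior. The address (a line range, /OK/, or /[0-9]\{2\}/) selects the lines, and s///g then changes every match on those lines.

```shell
tmp=$(mktemp -d)
# Hypothetical input file "names"; GNU sed assumed for -i.
printf 'cat 1 OK\ncat cat 22\ncat\n' > "$tmp/names"
first=$(sed 's/cat/dog/' "$tmp/names" | sed -n 2p)                       # no g: first match per line only
all=$(sed 's/cat/dog/g' "$tmp/names" | sed -n 2p)                        # g: every match on the line
only_ok=$(sed '/OK/s/cat/dog/g' "$tmp/names" | tr '\n' '|')              # only lines containing OK
two_digits=$(sed '/[0-9]\{2\}/s/cat/dog/g' "$tmp/names" | tr '\n' '|')   # only lines with 2 digits in a row
deleted=$(sed '2,3d' "$tmp/names")                                       # delete lines 2 through 3
sed -i 's/cat/dog/g' "$tmp/names"                                        # edit the file in place
in_place=$(tr '\n' '|' < "$tmp/names")
```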
Complex Commands
>sed -i -e 's/mon/Monday/g' \
        -e 's/tue/Tuesday/g' \
        -e 's/wed/Wednesday/g' \
        -e 's/thu/Thursday/g' \
        -e 's/fri/Friday/g' \
        -e 's/sat/Saturday/g' \
        -e 's/sun/Sunday/g' calendar
• The backslash at the end of a line is a continuation character
• Each -e specifies another command (expression) to apply, in order
• The g flag means change every occurrence on each line, not just the first
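The sketch below runs the multi-expression command on a hypothetical two-line calendar file, assuming GNU sed for -i. sed applies every -e expression to each line before moving to the next line. The patterns are plain substrings, so they also fire inside longer words, as the last line shows.

```shell
tmp=$(mktemp -d)
# Hypothetical calendar file; GNU sed assumed for -i.
printf 'mon tue wed\nthu fri sat sun\n' > "$tmp/calendar"
sed -i -e 's/mon/Monday/g' \
       -e 's/tue/Tuesday/g' \
       -e 's/wed/Wednesday/g' \
       -e 's/thu/Thursday/g' \
       -e 's/fri/Friday/g' \
       -e 's/sat/Saturday/g' \
       -e 's/sun/Sunday/g' "$tmp/calendar"
result=$(tr '\n' '|' < "$tmp/calendar")
# Substring patterns also match inside words such as "month".
pitfall=$(echo 'month' | sed -e 's/mon/Monday/g')
```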
AWK
• AWK (Aho, Weinberger, Kernighan)
• Special-purpose programming language
  – Interpreted
  – Useful for UNIX scripts
• Purposes
  – Filter text files based on supplied patterns
  – Produce reports
  – Callable from vi
  – Create simple databases
  – Simple mathematical operations
  – Creating scripts
• Not good for large, complicated tasks
• Other interpreted languages: perl, php
General Syntax
• The single quotes cause the shell to ignore special characters
• The various clauses are optional
• Much of the syntax for <action> clauses is C- and Java-compatible
• The patterns utilize regular expressions
  BEGIN {<initialization>}
  <pattern> {<action>}
  …
  <pattern> {<action>}
  END {<final actions>}
>awk '<awk program>'
AWK General Operation
• Each file consists of a series of records
• Each record is a series of fields
• Defaults
  – Record separator: newline character
  – Field separator: white space characters
• Flow of operation
  – Read the input file line by line
  – If the line matches a pattern, perform that pattern's action
  – Otherwise skip the line
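This minimal sketch shows the BEGIN / pattern / END flow on three hypothetical input lines. With the default field separator, runs of spaces and tabs split fields, and leading or trailing blanks are ignored. The pattern $2 > 1 selects which records reach the middle action.

```shell
# Hypothetical three-line input; default FS splits on runs of blanks.
out=$(printf 'a 1\n  b\t2  \nc 3\n' | awk '
BEGIN { total = 0 }                          # runs once, before any input is read
$2 > 1 { total += $2; names = names $1 }     # runs only for records whose 2nd field > 1
END { print NR, total, names }               # runs once, after the last record
')
echo "$out"
```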
Some AWK Simple Examples
1. Print fields of records in a file
   >awk '{print $5, $6, $7, $8}' fileName
2. Print lines with a search string
   >awk '/gold/ {print}' fileName
3. Print the number of records
   >awk 'END {print NR, "records"}' fileName
4. Print records using a condition
   >awk '{if ($3 < 1980) print $3}' fileName
   or, with a variable set on the command line
   >awk -v max=<limit> '$2 > max {print $2}' fileName
5. Compare a field to a regular expression
   >awk '$2 ~ /[0-9]+/ {print $2}' fileName
6. Use variables
   >awk '/gold/ {sum += $2} END {print "value = " sum}' fileName
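The sketch below runs several of these one-liners on a hypothetical coins file whose fields are item, weight and year. The limit 3 passed with -v max=3 is an arbitrary illustration value.

```shell
tmp=$(mktemp -d)
# Hypothetical data: item weight year
printf 'gold 3.5 1975\nsilver 2.25 1990\ngold 23.22 1960\n' > "$tmp/coins"
count=$(awk 'END {print NR, "records"}' "$tmp/coins")
old=$(awk '{if ($3 < 1980) print $1}' "$tmp/coins" | tr '\n' ' ')
heavy=$(awk -v max=3 '$2 > max {print $2}' "$tmp/coins" | tr '\n' ' ')
total=$(awk '/gold/ {sum += $2} END {print "value = " sum}' "$tmp/coins")
```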
A Longer AWK command
Note: The backslashes are continuation lines
>awk -F ';' 'BEGIN {num_gold=0; wt_gold=0} \
/[Gg]old/ {num_gold++; wt_gold += $2} \
END {printf("\nGold Pieces: %2d %5.2f\n", num_gold, wt_gold)}' gold.File
Semicolons delimit the fields in the file gold.File:
  Gold;3.5
  Silver;2.25
  Bronze;5.31
  Gold;23.22
  gold;0.22
Output
  Gold Pieces:  3 26.94
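As a sketch, here is the same program run against a file built from the gold.File data on this slide. Inside single quotes the awk program can simply span several lines. The leading \n in the printf template is dropped here so the captured output is a single line.

```shell
tmp=$(mktemp -d)
printf 'Gold;3.5\nSilver;2.25\nBronze;5.31\nGold;23.22\ngold;0.22\n' > "$tmp/gold.File"
out=$(awk -F ';' '
BEGIN { num_gold = 0; wt_gold = 0 }
/[Gg]old/ { num_gold++; wt_gold += $2 }
END { printf("Gold Pieces: %2d %5.2f\n", num_gold, wt_gold) }
' "$tmp/gold.File")
echo "$out"
```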
Execute Program in a file
>awk -F ';' -f <program> <fileName>
  # awk program summarizing a coin collection
  BEGIN {num_gold=0; wt_gold=0}
  /[Gg]old/ {num_gold++; wt_gold += $2}
  END {
    val_gold = 485 * wt_gold;
    printf("\nGold Pieces: %2d", num_gold);
    printf("\nGold Weight: %5.2f", wt_gold);
    printf("\nGold Value: %7.2f\n", val_gold);
  }
Output
  Gold Pieces:  3
  Gold Weight: 26.94
  Gold Value: 13065.90
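This sketch shows the -f form. It assumes the program is saved as coins.awk (a hypothetical name) and runs against the same semicolon-separated coin data. The pattern /[Gg]old/ counts both Gold and gold records, which is what yields 3 pieces and 26.94 in weight. The value uses the slide's factor of 485.

```shell
tmp=$(mktemp -d)
printf 'Gold;3.5\nSilver;2.25\nBronze;5.31\nGold;23.22\ngold;0.22\n' > "$tmp/gold.File"
cat > "$tmp/coins.awk" <<'EOF'
# awk program summarizing a coin collection (485 is the per-unit value from the slide)
BEGIN { num_gold = 0; wt_gold = 0 }
/[Gg]old/ { num_gold++; wt_gold += $2 }
END {
    val_gold = 485 * wt_gold
    printf("Gold Pieces: %2d\n", num_gold)
    printf("Gold Weight: %5.2f\n", wt_gold)
    printf("Gold Value: %7.2f\n", val_gold)
}
EOF
out=$(awk -F ';' -f "$tmp/coins.awk" "$tmp/gold.File")
echo "$out"
```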
Invoking AWK
>awk [-F<ch>] [<program>] [-f <programFile>] [<vars>] [- | <dataFile>]
• <ch> is a field separator (default: space, tab)
• <program> an AWK program
• <programFile> a file containing an AWK program
• <vars> a series of variables to initialize
  >awk -f program f1=file2 f2=file1 > output
• - means accept AWK input from STDIN
• <dataFile> a file containing data to process
Note: AWK is often invoked repeatedly in shell scripts
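Here is a sketch of command-line variable assignments mixed with input operands. awk performs an assignment such as tag=first just before it reads the next file operand, so one variable can take different values for different inputs. The "-" operand reads standard input. The names tag and data are hypothetical.

```shell
tmp=$(mktemp -d)
printf 'x\ny\n' > "$tmp/data"
# tag=first applies while "$tmp/data" is read; tag=second applies while "-" (stdin) is read.
out=$(printf 'z\n' | awk '{ print tag ":" $0 }' tag=first "$tmp/data" tag=second -)
echo "$out"
```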
Search Patterns
• An exact string: /The/
• A string starting a line: /^The/
• A string ending a line: /The$/
• A string ignoring the case of the first letter: /[Tt]he/
• Decimal digits: /[0-9]+/
• Alphanumeric: /[a-zA-Z0-9]+/
• Choice between two strings: /(da|De).*/
• Numeric: /[+-]?[0-9]+/
• Any Boolean expression: $4>90 or $4>$5
Note: Some utilities (e.g., grep and sed in their basic syntax) require \(, \) and \| to use the ( ) | regular expression characters
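The sketch below combines regular-expression patterns with a Boolean-expression pattern on hypothetical input lines. awk tests every pattern against each record in order, so a single line can trigger more than one action. Adding 0 to a field ($2 + 0) forces a numeric comparison.

```shell
out=$(printf 'The end\nthe start\nno digits\nscore -42\n' | awk '
/^[Tt]he/ { print NR ": starts with The/the" }
/[+-]?[0-9]+/ { print NR ": has a number" }
$2 + 0 < 0 { print NR ": negative" }
')
echo "$out"
```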
Built-in Variables
• NR: Number of records read so far (the total number of records inside END)
• NF: Number of fields in the current record
• FILENAME: The current input file
• FS: Field separator character
• RS: Record separator character
• OFS: Output field separator character
• ORS: Output record separator character
• OFMT: The output format for numbers printed with print (default "%.6g")
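This sketch shows several built-in variables on a hypothetical colon-separated file. Assigning $1 = $1 forces awk to rebuild $0 using OFS. print then joins its arguments with OFS. OFMT controls how a non-integer number appears when printed with print.

```shell
tmp=$(mktemp -d)
printf 'a:b:c\nd:e\n' > "$tmp/f"
out=$(awk 'BEGIN { FS = ":"; OFS = "-" } { $1 = $1; print NR, NF, $0 } END { print NR }' "$tmp/f")
name=$(awk 'NR == 1 { print FILENAME }' "$tmp/f")
fmt=$(awk 'BEGIN { OFMT = "%.2f"; x = 3.14159; print x }')
echo "$out"; echo "$name"; echo "$fmt"
```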
Arrays and control structures
• Indexed and associative arrays
  – By index: months[3] = "March";
  – Associative: debts["Kim"] = 1000;
  – Note: all awk arrays are associative (numeric indexes are stored as strings); arrays filled by split() index from one, not zero
• Counter controlled: for (i=1; i<100; i++) data[i] = i;
• Iterator: for (i in myArray) print i, myArray[i];   (the visiting order is unspecified)
• Pre-test: i=0; while (i<20) {data[i] = i; i++;}
• Condition: if (i==1) print debts["Kim"]; else print debts["Joe"];
  or: print ((i==1) ? debts["Kim"] : debts["Joe"]);
• Unconditional control statements
  – break: jump out of a loop
  – continue: go to the next iteration
  – next: get the next line of input
  – exit: exit the AWK program
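The sketch below exercises these array and loop forms in a single BEGIN block. All values are hypothetical. The for (k in debts) loop is used only to compute a sum, because its visiting order is unspecified and a sum does not depend on order.

```shell
out=$(awk 'BEGIN {
    months[3] = "March"; debts["Kim"] = 1000; debts["Joe"] = 250
    for (i = 1; i <= 5; i++) squares[i] = i * i        # counter-controlled loop
    total = 0
    for (k in debts) total += debts[k]                 # iterator: order does not matter for a sum
    i = 0
    while (i < 3) { data[i] = i * 10; i++ }            # pre-test loop; i ends at 3
    if (i == 3) who = "Kim"; else who = "Joe"
    print months[3], squares[5], total, data[2], debts[who], ((i == 1) ? "one" : "not one")
}')
echo "$out"
```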
Built-in functions
• Square root: print sqrt(3.6)
• Integer portion: print int(3.2)
• Substring: print substr("abcde", 3, 2);
• Split: n = split("a;b;c;d;e", letters, ";");   (returns the number of pieces; letters[1] is "a")
• Position: print index("gorbachev", "bach");
Note: if a substring doesn't exist, 0 is returned
Note: Strings index from one, not zero
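Here is a quick sketch of the built-in functions with the slide's arguments. split() fills the array starting at index 1 and returns the element count. index() returns 0 when the substring is absent. sqrt() is printed with printf so the number of digits is fixed.

```shell
out=$(awk 'BEGIN {
    n = split("a;b;c;d;e", letters, ";")   # n is 5; letters[1] is "a"
    print int(3.2), substr("abcde", 3, 2), n, letters[1], index("gorbachev", "bach"), index("gorbachev", "xyz")
    printf("%.4f\n", sqrt(3.6))
}')
echo "$out"
```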
printf
• printf(<template>, <arguments>);
  – printf applies the template to the arguments
  – Formats are specified in the templates
    %d for integer output
    %o for octal
    %x for hexadecimal
    %s for string
    %e for exponential format
    %f for floating point format
  – Greater control
    %5.2f means at least 5 characters wide, with two digits after the decimal point
    %-8.4s means left justify, 8 wide, print at most 4 characters
    %08d means pad with leading zeroes to 8 characters
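The sketch below prints each conversion inside brackets so you can see the padding. The argument values are arbitrary illustrations.

```shell
out=$(awk 'BEGIN {
    printf("[%d] [%o] [%x] [%s]\n", 255, 8, 255, "hi")
    printf("[%5.2f] [%-8.4s] [%08d] [%e]\n", 3.14159, "abcdefgh", 42, 1234.5)
}')
echo "$out"
```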
Escape Characters
• New line: \n
• Carriage return: \r
• Backspace: \b
• Horizontal tab: \t
• Form feed: \f
• A quote: \"
• A backslash: \\
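A one-line sketch of escapes inside an awk string: \t becomes a tab, \" a double quote, \\ a single backslash, and \n a newline.

```shell
out=$(awk 'BEGIN { printf("a\tb|\"q\"|\\\n") }')
echo "$out"
```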
AWK redirection and pipes
• Create a file with the first field
  >awk '{print $1 >> "file"}'
• Pipe output to another utility
  >ls -l | awk '{print $9}' | tr '[a-z]' '[A-Z]'
  Pipes to a utility that translates lowercase to uppercase
• Sort the grades file numerically on the 5th field and print the first field (older systems wrote the key as +4n)
  >sort -k 5n grades | awk '{print $1}'
• List .txt files smaller than 2000 bytes, sorted descending by size
  >ls -l | grep '\.txt$' | awk '$5 < 2000 {print $9, $5}' | sort -k 2nr
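The sketch below uses a hypothetical sizes file instead of ls -l output, so the result does not depend on the current directory. The output file name is passed in with -v. awk appends to it with >>. The filtered records are then piped to sort and tr.

```shell
tmp=$(mktemp -d)
# Hypothetical data: name size
printf 'ann 10\nbob 1500\ncy 3000\n' > "$tmp/sizes"
# Append the first field of every record to the file named by the variable out.
awk -v out="$tmp/names" '{ print $1 >> out }' "$tmp/sizes"
names=$(tr '\n' ' ' < "$tmp/names")
# Filter with awk, then sort numerically descending on the 2nd field.
small=$(awk '$2 < 2000 {print $1, $2}' "$tmp/sizes" | sort -k 2nr | tr '\n' '|')
# Pipe to tr to translate lowercase to uppercase.
upper=$(awk '{print $1}' "$tmp/sizes" | tr '[a-z]' '[A-Z]')
```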
More Examples
• Print Bush's grades
  >awk '/Bush/{print $3, $4}' grades
• Print first name, last name, and quiz 3 grade for everyone who got more than a 90 on quizzes 1 and 2
  >awk '{if ($4>90 && $5>90) print $3, $2, $6}' grades
  >awk '$4>90 && $5>90 {print $3, $2, $6}' grades
• Print the username for the user with userid 502
  >awk -F: '{if ($3==502) print $1}' /etc/passwd
  >awk -F: '$3==502 {print $1}' /etc/passwd
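The following sketch runs the pattern forms on hypothetical data. The grades layout ($2 last name, $3 first name, $4 to $6 quizzes 1 to 3) is inferred from the example and is an assumption, as is the leading record-number column. A small passwd-style file stands in for /etc/passwd so the result is predictable.

```shell
tmp=$(mktemp -d)
# Hypothetical grades layout: id last first quiz1 quiz2 quiz3
cat > "$tmp/grades" <<'EOF'
1 Smith Ann 95 92 88
2 Jones Bob 85 99 70
3 Lee Cy 91 93 100
EOF
honor=$(awk '$4>90 && $5>90 {print $3, $2, $6}' "$tmp/grades" | tr '\n' '|')
# Hypothetical passwd-style lines (name:password:uid:gid)
printf 'root:x:0:0\nharley:x:502:502\n' > "$tmp/passwd"
user=$(awk -F: '$3==502 {print $1}' "$tmp/passwd")
```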