Programmable Text Processing with awk
Lecturer: Prof. Andrzej (AJ) Bieszczad
Email: andrzej@csun.edu
Phone: 818-677-4954
“UNIX for Programmers and Users”, Third Edition, Prentice-Hall, Graham Glass and King Ables
Slides partially adapted from Kumoh National University of Technology (Korea) and NYU

Programmable Text Processing with awk
• The awk utility scans one or more files and performs an action on all of the lines that match a particular condition.
• The actions and conditions are described by an awk program and range from the very simple to the complex.
• awk got its name from the combined first letters of its authors’ surnames: Aho, Weinberger, and Kernighan.
• It borrows its control structures and expression syntax from the C language.

Programmable Text Processing with awk
• awk's purpose: a general-purpose programmable filter that handles text (strings) as easily as numbers
  – this makes awk one of the most powerful of the Unix utilities
• A programming language for handling common data manipulation tasks with only a few lines of code
• awk is a pattern-action language
• awk processes fields
• The language looks a little like C but automatically handles input, field splitting, initialization, and memory management
  – Built-in string and number data types
  – No variable type declarations
• awk is a great prototyping language
  – start with a few lines and keep adding until it does what you want
• awk gets its input from
  – files
  – redirection and pipes
  – directly from standard input
• nawk (new awk) is the new standard for awk
  – Designed to facilitate large awk programs
  – gawk is a free nawk clone from GNU

Programmable Text Processing with awk: Program
• An awk program is a list of one or more commands of the form:
    [ pattern ] [ { action } ]
• For example:
    BEGIN { print "List of html files: " }
    /\.html$/ { print }    ---> “/” then “\.” (a literal dot) then “html” then “$”
    END { print "There you go!" }
• action is performed on every line that matches pattern (in other words, every line that satisfies the condition).
• If pattern is omitted, action is performed on every line.
• If action is omitted, all matching lines are simply sent to standard output.
• Since both patterns and actions are optional, actions must be enclosed in braces to distinguish them from patterns.
• The statements in an awk program may be indented and formatted using spaces, tabs, and newlines.

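The two defaults described above (missing action, missing pattern) can be tried with a short sketch. The function names `html_only` and `number_lines` and the sample file names are made up for illustration; both helpers read standard input.

```shell
# Pattern only: matching lines are printed unchanged (the default action).
html_only() { awk '/\.html$/'; }

# Action only: the action runs on every input line (the default pattern).
number_lines() { awk '{ print NR ": " $0 }'; }

printf 'index.html\nnotes.txt\nas1.html\n' | html_only
printf 'index.html\nnotes.txt\n' | number_lines
```

Leaving out the braces is what tells awk that `/\.html$/` is a pattern and not an action.
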
Programmable Text Processing with awk: Patterns and Actions
• Search a set of files for patterns.
• Perform specified actions upon lines or fields that contain instances of patterns.
• Does not alter input files.
• Processes one input line at a time.
• Every program statement has to have a pattern or an action or both.
• Default pattern is to match all lines.
• Default action is to print the current record.
• Patterns are simply listed; actions are enclosed in { }.
• awk scans a sequence of input lines, or records, one by one, searching for lines that match the pattern
  – the meaning of “match” depends on the pattern

Programmable Text Processing with awk: Patterns
• A pattern is a selector that determines whether the action is to be executed.
• A pattern can be:
  – the special token BEGIN or END
  – an extended regular expression (enclosed in //)
  – an arithmetic relational expression
  – a string-valued expression
  – an arbitrary combination of the above
• Examples:
    /CSUN/                              matches if the string “CSUN” is in the record
    x > 0                               matches if the condition is true
    /CSUN/ && (name == "UNIX Tools")    matches if both conditions hold

Programmable Text Processing with awk: Special awk Patterns: BEGIN, END
• BEGIN and END provide a way to gain control before and after processing, for initialization and wrap-up.
• BEGIN: actions are performed before the first input line is read.
• END: actions are performed after the last input line has been processed.
    BEGIN { print "List of html files: " }
    /\.html$/ { print }
    END { print "There you go!" }

Programmable Text Processing with awk: Actions
• action is a list of one or more of the following kinds of C-like statements, separated by semicolons or newlines:
    if ( conditional ) statement [ else statement ]
    while ( conditional ) statement
    for ( expression; conditional; expression ) statement
    break
    continue
    variable = expression
    print [ list of expressions ] [ > expression ]
    printf format [, list of expressions ] [ > expression ]
    next    (skips the remaining patterns on the current line of input)
    exit    (stops reading input; the END action, if any, is then performed)
    { list of statements }
• action may include arithmetic and string expressions, assignments, and multiple output streams.

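The difference between next and exit is easy to confuse, so here is a minimal sketch. The helper names `skip_comments` and `until_stop`, and the convention that a line reading STOP ends the data, are assumptions made for this example.

```shell
# next: abandon the current line, so the rules that follow never see it.
skip_comments() { awk '/^#/ { next } { print }'; }

# exit: stop reading input altogether; the END action still runs.
until_stop() { awk '/^STOP$/ { exit } { print } END { print "done" }'; }

printf '# header\nalpha\nbeta\n' | skip_comments
printf 'alpha\nSTOP\nbeta\n' | until_stop
```
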
Programmable Text Processing with awk: An Example
$ ls | awk '
BEGIN { print "List of html files: " }
/\.html$/ { print }
END { print "There you go!" }
'
List of html files:
index.html
as1.html
as2.html
There you go!
$ _

Programmable Text Processing with awk: Variables
• awk scripts can define and use variables:
    BEGIN { sum = 0 }
    { sum++ }
    END { print sum }
• Some variables are predefined:
  – NR - number of records processed so far
  – NF - number of fields in the current record
  – FILENAME - name of the current input file
  – FS - field separator, space or TAB by default
  – OFS - output field separator, space by default
  – ARGC/ARGV - argument count and argument value array, used to get arguments from the command line

Programmable Text Processing with awk: Records
• The default record separator is newline
  – by default, awk processes its input a line at a time.
• Special variable RS: the record separator
  – can be changed in the BEGIN action
  – can be set to another single character; some implementations (gawk, for example) also accept a regular expression.
• Special variable NR holds the number of the current record.

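As a rough illustration of changing RS, the sketch below treats a semicolon as the record separator. The helper name `split_records` and the sample input are hypothetical; a single-character RS is used so the example does not depend on any one awk implementation.

```shell
# Set RS in BEGIN so it is in effect before the first record is read.
split_records() { awk 'BEGIN { RS = ";" } { print NR ": " $0 }'; }

# One physical line of input, three records.
printf 'a b;c d;e f' | split_records
```

Each semicolon-delimited chunk becomes one record, so NR counts chunks, not lines.
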
Programmable Text Processing with awk: Fields
• Each input line is split into fields.
• Special variable FS: the field separator; the default is whitespace (one or more spaces or tabs)
  – awk -Fc sets FS to the character c
  – FS can also be changed in BEGIN
• $0 is the entire line.
• $1 is the first field, $2 is the second field, ..., $NF is the last field.
• Only fields begin with $; variables are unadorned.

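The two ways of setting the field separator can be sketched side by side. The helper names and the passwd-style sample line are made up for the example.

```shell
# Set the separator on the command line with -F ...
first_last() { awk -F: '{ print $1, $NF }'; }

# ... or assign FS in BEGIN, before any input is split.
first_last_fs() { awk 'BEGIN { FS = ":" } { print $1, $NF }'; }

printf 'root:x:0:0:root:/root:/bin/sh\n' | first_last
printf 'root:x:0:0:root:/root:/bin/sh\n' | first_last_fs
```

Both print the first and last field of each line, separated by a single space (the default OFS).
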
Programmable Text Processing with awk: Simple Output From AWK
• Printing Every Line
  – If an action has no pattern, the action is performed on all input lines
    { print }       will print all input lines to standard output
    { print $0 }    will do the same thing
• Printing Certain Fields
  – multiple items can be printed on the same output line with a single print statement
    { print $1, $3 }
  – expressions separated by a comma are, by default, separated by a single space when output

Programmable Text Processing with awk: Output (continued)
• Special variable NF: number of fields
  – Any valid expression can be used after a $ to indicate the contents of a particular field
  – One built-in expression is NF: number of fields
    { print NF, $1, $NF }
  – will print the number of fields, the first field, and the last field in the current record
    { print $(NF-2) }
  – prints the third-to-last field
• Computing and Printing
  – You can also do computations on the field values and include the results in your output
    { print $1, $2 * $3 }

Programmable Text Processing with awk: Output (continued)
• Printing Line Numbers
  – The built-in variable NR can be used to print line numbers
    { print NR, $0 }
  – will print each line prefixed with its line number
• Putting Text in the Output
  – you can also add other text to the output besides what is in the current record
    { print "total pay for", $1, "is", $2 * $3 }
  – Note that the inserted text needs to be surrounded by double quotes

Programmable Text Processing with awk: Fancier Output
• Lining Up Fields
  – like C, awk has a printf function for producing formatted output
  – printf has the form:
    printf( format, value1, value2, value3, ... )
    { printf("total pay for %s is $%.2f\n", $1, $2 * $3) }
  – when using printf, formatting is under your control, so awk provides no automatic spaces or newlines. You have to insert them yourself.
    { printf("%-8s %6.2f\n", $1, $2 * $3) }

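To see the column alignment in practice, here is a small sketch using the second format string above. The helper name `pay_report` and the sample records (name, hourly rate, hours) are invented for illustration.

```shell
# %-8s left-justifies the name in 8 columns; %6.2f right-justifies the
# product in 6 columns with 2 decimals; \n must be supplied explicitly.
pay_report() { awk '{ printf("%-8s %6.2f\n", $1, $2 * $3) }'; }

printf 'Beth 4.00 10\nKathy 4.50 12\n' | pay_report
```

Because both fields have fixed widths, the amounts line up in a column regardless of the length of the name (up to 8 characters).
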
Programmable Text Processing with awk: Selection
• awk patterns are good for selecting specific lines from the input for further processing
• Selection by Comparison
    $2 >= 5 { print }
• Selection by Computation
    $2 * $3 > 50 { printf("%6.2f for %s\n", $2 * $3, $1) }
• Selection by Text Content
    $1 == "CSUN"
    /CSUN/
• Combinations of Patterns
    $2 >= 4 || $3 >= 20
• Selection by Line Number
    NR >= 10 && NR <= 20

Programmable Text Processing with awk: Arithmetic and Variables
• awk variables take on numeric (floating point) or string values according to context.
• User-defined variables are unadorned (they need not be declared).
• By default, user-defined variables are initialized to the null string, which has the numerical value 0.
• awk operators:
    =                   assignment; sets a variable to a number or string
    ==                  equality; true if both sides are equal
    !=                  inequality
    &&                  logical AND
    ||                  logical OR
    !                   logical NOT
    <, >, <=, >=        relational operators
    +, -, /, *, %, ^    arithmetic
    (juxtaposition)     string concatenation; it has no explicit operator

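A short sketch of these rules: an unset variable acts as 0 in arithmetic and as the empty string in concatenation. The helper name `demo_vars` and the variable names n and s are arbitrary choices for this example.

```shell
# n and s are never initialized: n + 1 treats n as 0, and s "ab" treats s
# as the empty string. % is remainder, ^ is exponentiation.
demo_vars() { awk 'BEGIN { n = n + 1; s = s "ab"; print n, s, 7 % 3, 2 ^ 3 }'; }

demo_vars
```

The program consists only of a BEGIN action, so awk runs it without reading any input.
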
Programmable Text Processing with awk: Arithmetic and Variables Examples
• Counting is easy to do with awk
    $3 > 15 { emp = emp + 1 }    # work hours are in the third field
    END { print emp, "employees worked more than 15 hrs" }
• Computing sums and averages is also simple
    # $2 - pay per hour, $3 - hours
    { pay = pay + $2 * $3 }
    END { print NR, "employees"
          print "total pay is", pay
          print "average pay is", pay/NR
        }

Programmable Text Processing with awk: Handling Text
• One major advantage of awk is its ability to handle strings as easily as many languages handle numbers
• awk variables can hold strings of characters as well as numbers, and awk conveniently translates back and forth as needed
• This program finds the employee who is paid the most per hour:
    # Fields: employee, payrate
    $2 > maxrate { maxrate = $2; maxemp = $1 }
    END { print "highest hourly rate:", maxrate, "for", maxemp }

Programmable Text Processing with awk: String Manipulation
• String Concatenation
  – new strings can be created by combining old ones
    { names = names $1 " " }
    END { print names }
• Printing the Last Input Line
  – NR retains its value after the last input line has been read, but $0 is not guaranteed to (some implementations keep it, others do not), so save it explicitly
    { last = $0 }
    END { print last }

Programmable Text Processing with awk: Built-In Functions
• awk contains a number of built-in functions.
• Arithmetic
  – sin, cos, atan2, exp, int, log, rand, sqrt
• String
  – length, substitution, find substrings, split strings
• Output
  – print, printf, print and printf to file
• Special
  – system - executes a Unix command
    • e.g., system("clear") to clear the screen
    • Note the double quotes around the Unix command
  – exit - stop reading input and go immediately to the END pattern-action pair if it exists, otherwise exit the script

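The string functions named above correspond to length, index, split, and sub/gsub in awk. The sketch below exercises them on one comma-separated line; the helper name `string_demo` and the sample input are made up.

```shell
string_demo() {
  awk '{
    n = split($0, part, ",")    # split the line on commas into part[1..n]
    s = $0
    gsub(/,/, " ", s)           # replace every comma in the copy s
    # length of the line, position of the first comma, number of pieces,
    # the last piece, and the line with commas replaced by spaces
    print length($0), index($0, ","), n, part[n], s
  }'
}

printf 'a,bc,def\n' | string_demo
```

gsub modifies its third argument in place and returns the number of replacements, which is why the line is copied into s first.
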
Programmable Text Processing with awk: Built-in Functions
• Example: counting lines, words, and characters using length (a poor man’s wc):
    { nc = nc + length($0) + 1
      nw = nw + NF
    }
    END { print NR, "lines,", nw, "words,", nc, "characters" }
• substr(s, m, n) produces the substring of s that begins at position m and is at most n characters long.

Programmable Text Processing with awk: Control Flow Statements
• awk provides several control flow statements for making decisions and writing loops
• if-else
    $2 > 6 { n = n + 1; pay = pay + $2 * $3 }
    END { if (n > 0)
              print n, "employees, total pay is", pay, "average pay is", pay/n
          else
              print "no employees are paid more than $6/hour"
        }

Programmable Text Processing with awk: Loops
• while
    # interest1 - compute compound interest
    #   input: amount, rate, years
    #   output: compound value at end of each year
    { i = 1
      while (i <= $3) {
          printf("\t%.2f\n", $1 * (1 + $2) ^ i)
          i = i + 1
      }
    }
• do-while
    do {
        statement1
    } while (expression)
• for
    # interest2 - compute compound interest
    #   input: amount, rate, years
    #   output: compound value at end of each year
    { for (i = 1; i <= $3; i = i + 1)
          printf("\t%.2f\n", $1 * (1 + $2) ^ i)
    }

Programmable Text Processing with awk: Arrays
• Array elements are not declared
• Array subscripts can have any value:
  – numbers
  – strings! (associative arrays)
    arr[3] = "value"
    grade["Korn"] = 40.3
• Example
    # reverse - print input in reverse order by line
    { line[NR] = $0 }    # remember each line
    END { for (i = NR; i > 0; i = i - 1) {
              print line[i]
          }
        }

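String subscripts make counting by key straightforward. The following sketch counts how often each word occurs; the helper name `word_freq` and the sample input are hypothetical, and the output is piped through sort because the order of a `for (w in count)` loop is unspecified.

```shell
word_freq() {
  # count[$i]++ creates the element on first use (starting from 0).
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) print w, count[w] }' | sort
}

printf 'a b a\nc a b\n' | word_freq
```
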
Programmable Text Processing with awk: Examples
• In the following example, we run a simple awk program on the text file “float” to insert the number of fields into each line:
$ cat float    --> look at the original file.
Wish I was floating in blue across the sky,
My imagination is strong,
And I often visit the days
When everything seemed so clear.
Now I wonder what I’m doing here at all…
$ awk '{ print NF, $0 }' float    --> execute the command.
9 Wish I was floating in blue across the sky,
4 My imagination is strong,
6 And I often visit the days
5 When everything seemed so clear.
9 Now I wonder what I’m doing here at all…
$ _

Programmable Text Processing with awk: Examples
• We run a program that displays the first, third, and last fields of every line:
$ cat awk2    --> look at the awk script.
BEGIN { print "Start of file:", FILENAME }
{ print $1 $3 $NF }    --> print first, third and last fields.
END { print "End of file" }
$ awk -f awk2 float    --> execute the script.
Start of file: float
Wishwassky,
Myisstrong,
Andoftendays
Whenseemedclear.
Nowwonderall…
End of file
$ _

Programmable Text Processing with awk: Examples
• In the next example, we run a program that displays the first, third, and last fields of lines 2 and 3 of “float”:
$ cat awk3    --> look at the awk script.
NR > 1 && NR < 4 { print NR, $1, $3, $NF }
$ awk -f awk3 float    --> execute the script.
2 My is strong,
3 And often days
$ _

Programmable Text Processing with awk: Examples
• A variable’s initial value is a null string or zero, depending on how you use it.
• In the next example, the program counts the number of lines and words in a file as it echoes the lines to standard output:
$ cat awk4    --> look at the awk script.
BEGIN { print "Scanning file" }
{
  printf "line %d: %s\n", NR, $0;
  lineCount++;
  wordCount += NF;
}
END { printf "lines = %d, words = %d\n", lineCount, wordCount }

Programmable Text Processing with awk: Examples
$ awk -f awk4 float    --> execute the script.
Scanning file
line 1: Wish I was floating in blue across the sky,
line 2: My imagination is strong,
line 3: And I often visit the days
line 4: When everything seemed so clear.
line 5: Now I wonder what I’m doing here at all…
lines = 5, words = 33
$ _

Programmable Text Processing with awk: Examples
• In the following example, we print the fields in each line in reverse order:
$ cat awk5    --> look at the awk script.
{
  for ( i = NF; i >= 1; i-- )
    printf "%s ", $i;
  printf "\n";
}
$ awk -f awk5 float    --> execute the script.
sky, the across blue in floating was I Wish
strong, is imagination My
days the visit often I And
clear. so seemed everything When
all… at here doing I’m what wonder I Now
$ _

Programmable Text Processing with awk: Examples
• In the next example, we display all of the lines that contain a t followed by an e, with any number of characters in between.
$ cat awk6    --> look at the script.
/t.*e/ { print $0 }
$ awk -f awk6 float    --> execute the script.
Wish I was floating in blue across the sky,
And I often visit the days
When everything seemed so clear.
Now I wonder what I’m doing here at all…
$ _

Programmable Text Processing with awk: Examples
• A condition may be two expressions separated by a comma. In this case, awk performs the action on every line from the first line that matches the first condition to the next line that satisfies the second condition:
$ cat awk7    --> look at the awk script.
/strong/, /clear/ { print $0 }
$ awk -f awk7 float    --> execute the script.
My imagination is strong,    --> first line of the range
And I often visit the days
When everything seemed so clear.    --> last line of the range
$ _

Programmable Text Processing with awk: Examples
• In the next example, we process a file whose fields are separated by colons:
$ cat awk3    --> look at the awk script.
NR > 1 && NR < 4 { print $1, $3, $NF }
$ cat float2    --> look at the input file.
Wish:I:was:floating:in:blue:across:the:sky,
My:imagination:is:strong,
And:I:often:visit:the:days
When:I:wonder:what:I’m:doing:here:at:all…
Now:I:wonder:what:I’m:doing:here:at:all…
$ awk -F: -f awk3 float2    --> execute the script.
My is strong,
And often days
$ _

Programmable Text Processing with awk: Examples
• Here’s an example of the use of some built-in functions:
$ cat test    --> look at the input file.
1.1 a
2.2 at
3.3 eat
4.4 beat
$ cat awk8    --> look at the awk script.
{
  printf "$1 = %g ", $1;
  printf "exp = %.2g ", exp($1);
  printf "log = %.2g ", log($1);
  printf "sqrt = %.2g ", sqrt($1);
  printf "int = %d ", int($1);
  printf "substr(%s, 1, 2) = %s\n", $2, substr($2, 1, 2);
}
$ awk -f awk8 test    --> execute the script.
$1 = 1.1 exp = 3 log = 0.095 sqrt = 1 int = 1 substr(a, 1, 2) = a
$1 = 2.2 exp = 9 log = 0.79 sqrt = 1.5 int = 2 substr(at, 1, 2) = at
$1 = 3.3 exp = 27 log = 1.2 sqrt = 1.8 int = 3 substr(eat, 1, 2) = ea
$1 = 4.4 exp = 81 log = 1.5 sqrt = 2.1 int = 4 substr(beat, 1, 2) = be
$ _

Programmable Text Processing with awk: Challenge