AWK

awk: a text processing language

awk
• Created for Unix by Aho, Weinberger and Kernighan
• Basically an:
  ▫ interpreted
  ▫ text processing
  ▫ programming language
• Updated versions
  ▫ NAWK: new awk
  ▫ GAWK: the Free Software Foundation's version

awk Basics
• Basic form:
  ▫ awk options 'selection criteria {action}' file(s)
• Can use regular expressions
• Files are read one line at a time, with the contents split into fields
• Fields are numbered ($1, $2, etc.)
  ▫ The entire line is $0
• Can run standalone
• Can run as a program
• Uses a blank as the default separator
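A minimal sketch of field splitting, assuming a POSIX awk (gawk, mawk or similar) and a POSIX shell; the helper name `show_fields` and the sample line are made up for illustration. With the default blank separator, a run of blanks counts as one separator, `$3` and `$1` pick out single fields, and `$0` is the whole line, unchanged.

```shell
# Hypothetical helper: print field 3, then field 1, then the entire line ($0).
show_fields() {
    printf '%s\n' "$1" | awk '{ print $3; print $1; print $0 }'
}

# Several blanks between "Tony" and "Kombol" still count as one separator.
show_fields "Tony   Kombol Lecturer"
```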

-f Option (stored awk programs)
• awk programs can be stored in a file
• awk -f awkfile datafile
  ▫ -f awkfile: the file containing the awk program
  ▫ datafile contains the data
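A sketch of the stored-program form, assuming a POSIX awk and the common `mktemp` utility; the temporary files stand in for the slide's `awkfile` and `datafile`, and their contents are invented for illustration.

```shell
# Store the awk program in a file (a temporary file stands in for "awkfile").
awkfile=$(mktemp)
cat > "$awkfile" <<'EOF'
# Print the first field of every input line
{ print $1 }
EOF

# A small made-up data file (the slide's "datafile").
datafile=$(mktemp)
printf 'Jinyue Xia TA\nHadi Hashemi TA\n' > "$datafile"

# -f names the program file; the remaining argument is the data.
awk -f "$awkfile" "$datafile"
```

Remove the two temporary files with `rm` when finished.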

Example
• Find the TAs in the personnel file
  ▫ The file is blank separated
    - The -F option defines the delimiter
    - Use \ to escape the blank (a blank after the \)
  ▫ Note: the blank is the default separator anyway
  ▫ Title is in the 3rd field

    # cat personnel.data
    Tony Kombol Lecturer 800111222 704-687-1111
    Jinyue Xia TA 800111333 704-687-2222
    Hadi Hashemi TA 800111444 704-687-3333
    #
    # awk -F\  '$3 == "TA" { print }' personnel.data
    Jinyue Xia TA 800111333 704-687-2222
    Hadi Hashemi TA 800111444 704-687-3333
    #

example
• To run an awk program
  ▫ personnel.data has the data
  ▫ findta.awk is the code
    - Looks for TA (3rd field)
    - Prints first name and telephone number (1st and 5th fields)
  ▫ Note: what small formatting problem is here?

    # awk -F\  -f findta.awk personnel.data
    TAs
    Jinyue704-687-2222
    Hadi704-687-3333
    Done
    # cat personnel.data
    Tony Kombol Lecturer 800111222 704-687-1111
    Jinyue Xia TA 800111333 704-687-2222
    Hadi Hashemi TA 800111444 704-687-3333
    # cat findta.awk
    BEGIN { print "TAs"; }
    $3 == "TA" { print $1 $5 }
    END { print "Done" }
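One way to see (and fix) the formatting problem, assuming a POSIX awk: `$1 $5` concatenates the two fields with nothing in between, while `$1, $5` joins them with the output field separator (a blank by default). A sketch using one record from personnel.data; the shell variable names are illustrative only.

```shell
record='Jinyue Xia TA 800111333 704-687-2222'

# Concatenation: no separator is inserted between the fields.
joined=$(printf '%s\n' "$record" | awk '$3 == "TA" { print $1 $5 }')

# Comma: the fields are separated by OFS (a blank unless changed).
spaced=$(printf '%s\n' "$record" | awk '$3 == "TA" { print $1, $5 }')

printf '%s\n%s\n' "$joined" "$spaced"
```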

print and printf
• Output goes to std out
  ▫ can be redirected with > or |
    - the redirected name must be in quotes:
      print $2, $1 | "sort"
    - the output of the print goes to the sort routine
• print is unformatted
• printf allows formatting
  ▫ %s – string
    - %-20s: 20 char spaces, left-justified (-)
  ▫ %d – integer
    - %8d: set aside 8 spaces for the number
  ▫ %f – floating point
    - %4.8f: a minimum field width of 4 characters in total (the width counts the whole number, including the point), with 8 digits to the right of the decimal point
  ▫ printf needs \n to start a new line
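A sketch of the width and precision modifiers and of piping `print` output to a quoted command, assuming a POSIX awk and the standard `sort` utility; the sample records are made up. The `|` bars in the format only make the column edges visible.

```shell
# %-10s: left-justified string in 10 columns
# %4d:   integer right-justified in 4 columns
# %8.2f: float in a field 8 characters wide, 2 digits after the point
report=$(printf 'Tony 6 3.14159\nFred 10 2.5\n' |
    awk '{ printf "%-10s|%4d|%8.2f|\n", $1, $2, $3 }')

# Piping print output to an external command: the command name is quoted.
sorted=$(printf 'Tony Kombol\nHadi Hashemi\n' | awk '{ print $2, $1 | "sort" }')

printf '%s\n' "$report" "$sorted"
```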

Number processing
• AWK supports basic computation
  ▫ + addition
  ▫ - subtraction
  ▫ * multiplication
  ▫ / division
  ▫ % modulus
  ▫ ^ exponentiation
• Also supports:
  ▫ ++ add one to itself (postfix and prefix)
  ▫ += add and assign to self
  ▫ -- subtract one from self (postfix and prefix)
  ▫ -= subtract from self
  ▫ *= multiply self
  ▫ /= divide self
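A small sketch of the operators above inside a `BEGIN` block (no input file needed), assuming a POSIX awk; the values 17 and 5 are arbitrary. Non-integer results such as `17 / 5` are printed with the default number format.

```shell
calc=$(awk 'BEGIN {
    a = 17; b = 5
    print a + b, a - b, a * b, a / b, a % b, 2 ^ 10
    a++        # a is now 18
    b -= 2     # b is now 3
    a *= b     # a is now 54
    print a, b
}')
printf '%s\n' "$calc"
```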

Variables and Expressions
• awk is loosely typed
• do not need to declare variables
  ▫ x = 5
• do not need $ to use variables, unlike bash ($ refers to fields in awk)
  ▫ print x
• strings are double quoted
  ▫ x = "This is a string"
• no string concatenation operator; concatenation is done by context
  ▫ x = "string 1"; y = "string 2"
    print x y
    - Space is required between x and y (the output is "string 1string 2")
• some conversions are done automatically
  ▫ x = "56"; y = 43; z = "abc"
    print x y      # gives 5643: y converted to string
    print x + y    # gives 99: + converts x to a number
    print y + z    # gives 43: + converts z to the number 0
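The conversion rules above can be checked directly, assuming a POSIX awk; this sketch just runs the slide's statements in a `BEGIN` block and adds one concatenation with an explicit separator string.

```shell
out=$(awk 'BEGIN {
    x = "56"; y = 43; z = "abc"
    print x y      # concatenation: y is converted to the string "43"
    print x + y    # + converts the string "56" to the number 56
    print y + z    # "abc" does not look like a number, so it converts to 0
    s = "string 1"; t = "string 2"
    print s " and " t    # concatenation, with " and " placed in between
}')
printf '%s\n' "$out"
```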

Comparison and Logical Operators
• awk supports string and numeric comparisons
  ▫ == is the equality operator
    - = is for assignment
  ▫ < and > can be used on strings
    - Beware of conversions when dealing with strings that consist of numbers
  ▫ ~ is used for regular expressions
    - $2 ~ /[dh]og/
    - true when field 2 contains dog or hog (the match is not anchored, so hotdog also matches)
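A sketch of the conversion pitfall and of unanchored matching, assuming a POSIX awk: string constants compare as strings (so "10" sorts before "9"), input fields that look like numbers compare numerically, and `~` matches anywhere unless the regex uses `^` and `$`. The sample words are made up.

```shell
# String constants compare as strings, so "10" sorts before "9".
strcmp=$(awk 'BEGIN { a = ("10" < "9"); b = (10 < 9); print a, b }')

# Input fields that look like numbers compare numerically.
fieldcmp=$(printf '10 9\n' | awk '{ r = ($1 < $2); print r }')

# ~ is unanchored: /[dh]og/ matches anywhere in the field.
loose=$(printf 'dog\nhotdog\ncat\n' | awk '$1 ~ /[dh]og/')
strict=$(printf 'dog\nhotdog\ncat\n' | awk '$1 ~ /^[dh]og$/')

printf '%s\n' "$strcmp" "$fieldcmp" "$loose" "$strict"
```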

Comparison and Logical Operators
• awk supports boolean operations
  ▫ && - and
  ▫ || - or
  ▫ ! - not
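A sketch combining tests with `&&`, `||` and `!`, assuming a POSIX awk; the records follow the personnelyears.data layout used in these slides (field 3 is the title, field 6 the years), and the selection rules themselves are invented for illustration.

```shell
data='Tony Kombol Lecturer 800111222 704-687-1111 6
Jinyue Xia TA 800111333 704-687-2222 3
Hadi Hashemi TA 800111444 704-687-3333 1
Fred Flintstone RA 800123321 704-687-1212 10'

# && and || combine tests: TAs under 3 years, or anyone with 10+ years.
picked=$(printf '%s\n' "$data" | awk '($3 == "TA" && $6 < 3) || $6 >= 10 { print $1 }')

# ! negates a test: everyone who is not a TA.
others=$(printf '%s\n' "$data" | awk '!($3 == "TA") { print $1 }')

printf '%s\n' "$picked" "$others"
```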

simple comparison
• Field 6 is the number of years with the organization
  ▫ Find those with more than 5 years

    # awk '$6 > 5 { print $2 ", " $1 ": " $6 }' personnelyears.data
    Kombol, Tony: 6
    Flintstone, Fred: 10
    #
    # cat personnelyears.data
    Tony Kombol Lecturer 800111222 704-687-1111 6
    Jinyue Xia TA 800111333 704-687-2222 3
    Hadi Hashemi TA 800111444 704-687-3333 1
    Fred Flintstone RA 800123321 704-687-1212 10
    Barney Rubble URA 800112233 704-687-3344 4
    #

Regular Expression comparison example
• Find the TAs and RAs, including the URAs

    # awk '$3 ~ /[RT]A/ {print $1 " " $2 " " $5}' personnel.data
    Jinyue Xia 704-687-2222
    Hadi Hashemi 704-687-3333
    Fred Flintstone 704-687-1212
    Barney Rubble 704-687-3344
    #
    # cat personnel.data
    Tony Kombol Lecturer 800111222 704-687-1111
    Jinyue Xia TA 800111333 704-687-2222
    Hadi Hashemi TA 800111444 704-687-3333
    Fred Flintstone RA 800123321 704-687-1212
    Barney Rubble URA 800112233 704-687-3344

BEGIN and END Sections
• BEGIN and END
  ▫ Allow for some pre- and post-processing
  ▫ Both are optional
• General format:
  ▫ BEGIN { action }
    END { action }
  ▫ BEGIN's actions are done before the processing of the datafile begins
    - Good for headers, setup, etc.
  ▫ END's actions are done after the processing of the datafile ends
    - Good for post-processing, notes, etc.
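A sketch of the three kinds of rules working together, assuming a POSIX awk: the `BEGIN` action prints a header once, the pattern-less middle rule runs on every line, and the `END` action reports a total once. The three-line input and the `count` variable are made up for illustration.

```shell
summary=$(printf 'a\nb\nc\n' | awk '
    BEGIN { print "Header"; count = 0 }           # once, before any input is read
          { count++ }                             # once for every input line
    END   { print count " lines"; print "Done" }  # once, after the last line
')
printf '%s\n' "$summary"
```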

another regular expression
• This is a more complex check using a file for the awk program
  ▫ Check to see that the ID is 800......
    - That is, 800 followed by 6 characters
    - !~ means "does not match"

    # cat findbadid.awk
    BEGIN { print "List of bad IDs follows"; }
    $4 !~ /^800....../ { print $1 " " $2 " has a bad id: " $4 }
    END { print "End of list"; }
    #
    # awk -f findbadid.awk personnelbad.data
    List of bad IDs follows
    Bad Id has a bad id: 809123456
    End of list
    # cat personnelbad.data
    Tony Kombol Lecturer 800111222 704-687-1111 6
    Jinyue Xia TA 800111333 704-687-2222 3
    Hadi Hashemi TA 800111444 704-687-3333 1
    Fred Flintstone RA 800123321 704-687-1212 10
    Barney Rubble URA 800112233 704-687-3344 4
    Bad Id LX 809123456 704-687-8890 0

awk file example

    # awk -F: -f ckgrades.awk grades.data
    Listing Bs

    Tara Boomdea:85:B
    Zorbax Bottlewit:88:B

    Done
    #
    # cat ckgrades.awk
    BEGIN { print "Listing Bs\n" }
    $3 == "B" { print $0 }
    END { print "\nDone" }
    #
    # cat grades.data
    Fred Ziffle:99:A
    Arnold Ziffle:55:F
    Tara Boomdea:85:B
    Neo:100:A
    Buffy Summers:72:C
    Sheldon Cooper:67:D
    Zorbon Prentwist:88 : B
    Zorbax Bottlewit:88:B
    Bad Grade:33:A

• Note: the Zorbon Prentwist line does not get matched; with -F: its 3rd field is " B" (with a leading blank), not "B"

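Positional Parameters
• Parameters are usually used as the fields of each line
• A parameter can also be passed to the awk program
  ▫ Used with a shell program
  ▫ Must be in quotes in the program, e.g.:
    - Instead of
      $4 > 12      the 4th parm in the line is > 12
    - use
      $4 > '$2'    the 4th parm in the line is > the 2nd parm passed to the program
  ▫ prog.awk 50 82
    - here the shell replaces $2 with 82: the single quotes close the awk program text just before $2 and reopen it just after, so the shell (not awk) expands $2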
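A sketch of the quoting trick, assuming a POSIX shell and awk. The shell functions `prog` and `prog_v` are hypothetical stand-ins for the slide's `prog.awk` script, and passing the data file as a third argument is an assumption (the slide does not show it). The second function uses awk's standard `-v` option, a commonly used alternative that passes the value as an awk variable instead of splicing it into the program text.

```shell
# Hypothetical stand-in for the slide's prog.awk script; as a shell function,
# $2 and $3 below are the shell's own arguments.
prog() {
    # The quotes close around "$2", so the shell splices its 2nd argument
    # into the program text; inside the quotes, $4 and $1 are awk fields.
    awk '$4 > '"$2"' { print $1 }' "$3"
}

# Alternative: hand the value to awk as the variable "limit" with -v.
prog_v() {
    awk -v limit="$2" '$4 > limit { print $1 }' "$3"
}

datafile=$(mktemp)
printf 'a x y 90\nb x y 70\nc x y 83\n' > "$datafile"

prog 50 82 "$datafile"      # awk sees the program: $4 > 82 { print $1 }
prog_v 50 82 "$datafile"
```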

Arrays
• awk supports arrays
  ▫ arrays do not need to be "declared"
    - they are "declared" the moment they are used
• Arrays are associative
  ▫ the index can be
    - numeric
    - alphabetic
  ▫ thisday["Tue"] = "Tuesday"; thisday[2] = "Tuesday";
    - above are two array elements for the array thisday
    - each references a separate string
  ▫ printf("thisday[\"Tue\"] is %s\n", thisday["Tue"]);
    printf("thisday[2] is %s\n", thisday[2]);
    - Both will print "Tuesday" for the array element referenced
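A sketch that runs the slide's `thisday` statements and adds a typical associative-array use, counting how often each value occurs, assuming a POSIX awk; the `count` array and the list of titles are made up for illustration.

```shell
arrays=$(printf 'TA\nRA\nTA\nURA\nTA\n' | awk '
    { count[$1]++ }    # the element is created the first time it is used
    END {
        thisday["Tue"] = "Tuesday"; thisday[2] = "Tuesday"
        printf("thisday[\"Tue\"] is %s\n", thisday["Tue"])
        printf("thisday[2] is %s\n", thisday[2])
        printf("TA=%d RA=%d URA=%d\n", count["TA"], count["RA"], count["URA"])
    }')
printf '%s\n' "$arrays"
```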

Arrays
• ENVIRON[ ]
  ▫ an associative array containing all the environment variables

    # awk 'BEGIN { for (env in ENVIRON) print env "=" ENVIRON[env] }'
    SSH_CLIENT=10.23.161.139 59365 22
    HOME=/home/tkombol
    TERM=xterm
    LESSOPEN=| /usr/bin/lesspipe %s
    SHELL=/bin/bash
    USER=tkombol
    _=/usr/bin/awk
    SHLVL=1
    PWD=/home/tkombol
    SSH_CONNECTION=10.23.161.139 59365 152.15.95.103 22
    LANG=en_US.UTF-8
    MAIL=/var/mail/tkombol
    LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01
    HISTCONTROL=ignoredups
    PATH=/usr/local/bin:/usr/bin:/usr/games
    LESSCLOSE=/usr/bin/lesspipe %s %s
    LOGNAME=tkombol
    SSH_TTY=/dev/pts/2
    #
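Instead of looping over every entry, a single variable can be looked up by name. A sketch assuming a POSIX awk; `GREETING` and `NO_SUCH_VAR_XYZ` are made-up variable names used only for this demonstration.

```shell
# GREETING is set only for this one awk command.
greeting=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')

# A variable that is not set reads as an empty string.
unset NO_SUCH_VAR_XYZ
missing=$(awk 'BEGIN { print "[" ENVIRON["NO_SUCH_VAR_XYZ"] "]" }')

printf '%s\n' "$greeting" "$missing"
```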

Built-in Variables
• awk has a set of built-in variables
  ▫ Some can be overridden

    Variable   Function                                   Default
    NR         Cumulative # of lines read                 -
    FS         Input Field Separator                      space
    OFS        Output Field Separator                     space
    OFMT       Default FP format                          %.6g
    RS         Record separator                           newline
    NF         Number of fields in current line           -
    FILENAME   Current input file                         -
    ARGC       Number of arguments in command line        -
    ARGV       Array containing list of arguments         -
    ENVIRON    Assoc. array of all environment variables  -
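A sketch of a few of these variables in use, assuming a POSIX awk and `mktemp`; the two records are taken from the grades.data example. `FS` and `OFS` are overridden in `BEGIN`, and assigning a field to itself (`$1 = $1`) is a common idiom that makes awk rebuild `$0` with the new `OFS`.

```shell
file=$(mktemp)
printf 'Fred Ziffle:99:A\nTara Boomdea:85:B\n' > "$file"

# FS and OFS are set in BEGIN, before the first record is split.
stats=$(awk 'BEGIN { FS = ":"; OFS = " | " }
    { print NR, NF, $1 }            # record number, field count, first field
    END { print NR " records read" }' "$file")

# Assigning to a field makes awk rebuild $0 using OFS.
rebuilt=$(printf 'a b c\n' | awk 'BEGIN { OFS = "-" } { $1 = $1; print }')

printf '%s\n' "$stats" "$rebuilt"
```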

Functions
• awk has several built-in functions
  ▫ () are optional if there are no parms
    - using them is encouraged
  ▫ Arithmetic functions
  ▫ String functions

Arithmetic Functions
• int(x)
  ▫ integer part of x (truncates toward zero)
• sqrt(x)
  ▫ square root of x
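A one-line check of both functions, assuming a POSIX awk; the arguments are arbitrary. Note that `int()` truncates toward zero rather than rounding down, and that `print` shows the non-integer square root with the default `OFMT` of six significant digits.

```shell
# int() drops the fractional part (toward zero); sqrt() is the square root.
nums=$(awk 'BEGIN { print int(3.9), int(-3.9), sqrt(16), sqrt(2) }')
printf '%s\n' "$nums"
```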

String Functions
• length()
  ▫ length of the complete line
• length(x)
  ▫ length of x
• tolower(s)
  ▫ returns s as lower case
• toupper(s)
  ▫ returns s as upper case
• substr(str, m)
  ▫ returns the string starting at position m to the end of the string
• substr(str, m, n)
  ▫ returns the string starting at position m for n characters
• index(s1, s2)
  ▫ finds the position of s2 inside s1 (0 if not found)
• split(str, arr, ch)
  ▫ splits str into an array; the delimiter is ch
• system("cmd")
  ▫ executes a system (Linux) command; returns the exit status
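A sketch exercising the string functions listed above, assuming a POSIX awk; the strings and the `parts` array name are made up. Positions count from 1, and `split` returns the number of elements it created.

```shell
strs=$(awk 'BEGIN {
    s = "Hello World"
    print length(s), toupper(s), tolower(s)
    print substr(s, 7), substr(s, 1, 5), index(s, "World"), index(s, "xyz")
    n = split("Tony:Kombol:Lecturer", parts, ":")   # returns the element count
    print n, parts[1], parts[3]
}')
printf '%s\n' "$strs"
```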

if
• Syntax:
  ▫ if (cond true) { statements } else { statements }
  ▫ Notes:
    - else is optional
    - {} not needed for single statements
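A sketch of `if`/`else` inside a main rule, assuming a POSIX awk; the "senior"/"junior" labels and the 5-year cut-off are invented for illustration. The first branch is a single statement without braces; the `else` branch uses braces.

```shell
labels=$(printf 'Tony 6\nHadi 1\n' | awk '{
    if ($2 > 5)
        print $1 " is senior"     # single statement: no braces needed
    else {
        print $1 " is junior"     # braces work too
    }
}')
printf '%s\n' "$labels"
```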

for
• Syntax form 1:
  ▫ for ( startval ; condition ; control ) statement
    - C-like in form
  ▫ Example:
    - for ( k=1 ; k<9 ; k++ ) print k
• Syntax form 2:
  ▫ for ( var in array ) statement
    - var takes every index in the array, in no particular order
    - Great for associative arrays
    - Non-numeric indices
    - Gaps in array
    - See the ENVIRON example on an earlier slide
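A sketch of both loop forms, assuming a POSIX awk: form 1 sums the fields of a line using `NF`, and form 2 visits every index of an array that has a non-numeric index and a gap. Because the visiting order of form 2 is unspecified, the sketch only counts the indices. The `seen` array is made up.

```shell
loops=$(printf '1 2 3 4\n' | awk '{
    total = 0
    for (i = 1; i <= NF; i++)    # form 1: C-like loop over the fields
        total += $i
    print "total:", total

    seen["Tue"] = 1; seen[2] = 1; seen[7] = 1   # non-numeric index and a gap
    n = 0
    for (key in seen)            # form 2: visits every index, in no set order
        n++
    print "indices:", n
}')
printf '%s\n' "$loops"
```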

While
• Syntax:
  ▫ while (cond is true) { statement(s) }
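A minimal `while` sketch, assuming a POSIX awk: the condition is tested before each pass, so the loop stops at the first power of 2 that is not below 100.

```shell
doubled=$(awk 'BEGIN {
    n = 1
    while (n < 100) {    # the condition is tested before every pass
        n *= 2
    }
    print n              # first power of 2 that is not below 100
}')
printf '%s\n' "$doubled"
```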

continue and break
• continue and break can be used in all loops
  ▫ for
  ▫ while
• break
  ▫ stops the loop
• continue
  ▫ stops processing the remaining statements in this pass of the loop
  ▫ continues to the next iteration
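A sketch showing the difference, assuming a POSIX awk: `continue` skips the rest of the current pass (odd numbers are never printed), while `break` ends the loop entirely once `i` passes 6.

```shell
evens=$(awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 1) continue    # skip the rest of this pass for odd i
        if (i > 6) break            # leave the loop entirely after 6
        print i
    }
}')
printf '%s\n' "$evens"
```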

Resources
• Awk - A Tutorial and Introduction - by Bruce Barnett
  ▫ http://www.grymoire.com/Unix/Awk.html
• Awk Tutorial - Main Page
  ▫ http://robert.wsi.edu.pl/awk/

Which is not a "scripting language"?
• A. Auk: 69.70%
• B. Awk: 6.06%
• C. Perl: 0.00%
• D. Pearl: 12.12%
• E. Bash: 0.00%
• F. Bam: 9.09%

Summary
• awk is a "primitive" scripting language
• good for processing text files
  ▫ filtering
• Perl is a more modern replacement
  ▫ "religious war" over which is better
• if you understand awk, it will be a good basis for understanding Perl