
Unix Programming: Working with Files
CSRU 3130, Spring 2008
Ellen Zhang
Spring 2008 CSRU 3130 Unix Programming

Last Class
• Programming with standard I/O
  – What's inside a file?
  – ASCII code
  – getchar(), putchar()
  – printf

vis program
    #include <stdio.h>
    #include <stdlib.h>
    #include <ctype.h>

    int main(void)
    {
        int c;

        while ((c = getchar()) != EOF)
            if (isascii(c) &&
                (isprint(c) || c == '\n' || c == '\t' || c == ' '))
                putchar(c);              /* printable: copy as is */
            else
                printf("\\%03o", c);     /* otherwise: backslash + 3 octal digits */
        exit(0);
    }
• Character test macros (isascii, isprint, ...) are in /usr/include/ctype.h

Escape sequences in C
    \'      Single quote
    \"      Double quote
    \\      Backslash
    \nnn    Octal number (nnn)
    \xnnn   Hexadecimal number (nnn)
    \0      Null character (really just the octal number zero)
    \a      Audible bell
    \b      Backspace
    \f      Formfeed
    \n      Newline (NL, LF)
    \r      Carriage return (CR)
    \t      Horizontal tab
• In Unix, lines are separated by NL (\n)
• In Windows, lines are separated by CR and NL (\r\n)

A program: genfile.c
    #include <stdio.h>
    #include <ctype.h>

    int main(void)
    {
        int c;

        c = 0;
        putchar(c);
        putchar('\07');
        c = 'a';
        putchar(c);
        putchar('a');
        putchar('b');
        putchar('c');
        putchar('\n');
        putchar('\11');    // tab, \t
        putchar('d');
        putchar('e');
        putchar('\12');    // newline, \n
        return 0;
    }

Now try vis
    [zhang@storm vis]$ ./genfile
    aabc
            de
    [zhang@storm vis]$ ./genfile | od -cb
    0000000  \0  \a   a   a   b   c  \n  \t   d   e  \n
            000 007 141 141 142 143 012 011 144 145 012
    0000013
    [zhang@storm vis]$ ./genfile | ./vis
    \000\007aabc
            de

Today
• Processing command line options/arguments
• More on printf, scanf
• File I/O
• A script a day: overwrite
• A command a day: find

Adding options to vis
• vis -s: strip away non-printable characters
• Command-line arguments are available to main():
    int main(int argc, char *argv[])
  – argc is the number of command-line parameters
  – parameters are stored in argv[0], ..., argv[argc-1]. Note: argv[0] is the command name itself.
  – Equivalently:
    int main(int argc, char **argv)
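A minimal sketch of scanning argv for an option, assuming a helper named has_flag (a hypothetical name, not part of vis). It skips argv[0], since that is the command name:

```c
#include <string.h>

/* Hypothetical helper: return 1 if flag (e.g. "-s") appears among the
   arguments that follow the command name, 0 otherwise. */
int has_flag(int argc, char *argv[], const char *flag)
{
    int i;

    for (i = 1; i < argc; i++)      /* argv[0] is the command name, so start at 1 */
        if (strcmp(argv[i], flag) == 0)
            return 1;
    return 0;
}
```

Called from main as has_flag(argc, argv, "-s"), it would report whether -s was given anywhere on the command line.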

Interpreting complex declaration
    char *argv[];      (argv is the identifier)
• Brackets and parentheses (that is, modifiers to the right of the identifier) take precedence over asterisks (that is, modifiers to the left of the identifier).
• Read from the inside out:
  – argv is an array of
  – pointers to
  – character (apply the type specifier, i.e., char, last)
• Can you draw a memory diagram for argv?

Interpreting complex declaration
    char (*tmp)[];     (tmp is the identifier)
• Brackets and parentheses (that is, modifiers to the right of the identifier) take precedence over asterisks (that is, modifiers to the left of the identifier).
• Parentheses can be used to override the default order
• Read from the inside out, going right first, then left:
  – tmp is (start with the identifier)
  – a pointer to (interpret what's in the parentheses first)
  – an array of (go to the right)
  – character (apply the type specifier, i.e., char, last)
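The difference between the two declarations can be sketched as follows. The names words, grid, and row and the array sizes are assumptions chosen for illustration; the slide's tmp has no size given.

```c
#include <stddef.h>

/* words is an array of 3 pointers to char: the same shape as argv. */
static const char *words[3] = { "vis", "-s", "file.txt" };

/* grid is an array of 2 arrays of 4 chars. */
static char grid[2][4] = { "abc", "xyz" };

/* row is a pointer to an array of 4 chars: the same shape as tmp, with a size. */
static char (*row)[4] = grid;

/* First character of the i-th string, reached through the array of pointers. */
char first_char_of_word(int i)
{
    return words[i][0];
}

/* Character j of line i, reached through the pointer to an array. */
char grid_char(int i, int j)
{
    return row[i][j];
}
```

Here sizeof(words) is three pointers, while sizeof(*row) is 4 bytes, the size of the array that row points to.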

Example of processing options
    #include <stdio.h>
    #include <ctype.h>
    #include <string.h>

    int main(int argc, char *argv[])
    {
        int c, strip = 0;

        if (argc > 1 && strcmp(argv[1], "-s") == 0)
            strip = 1;
        while ((c = getchar()) != EOF)
            if (isprint(c) || isspace(c))   /* isspace covers '\n', '\t' and ' ' */
                putchar(c);
            else if (!strip)
                printf("\\%03o", c);
        return 0;
    }
• strcmp is part of the standard C library (declared in <string.h>); it compares two strings and returns 0 if the two strings are identical.

Today
• Processing command line options/arguments
• More on printf, scanf
• File I/O
• A script a day: overwrite
• A command a day: find

Formatting output with printf
    int printf(const char *, ...);
• Writes strings, integers, doubles, etc. to standard output, performing format conversion
• "..." means "variable number of arguments"; the first argument (a string) is required.
• Given a simple string, printf just prints the string (to standard output):
    printf("I\thave\ttabs\n");
    char s[100];
    strcpy(s, "printf is fun!\a\n");
    printf(s);      /* fine here only because s contains no % */

Formatting output with printf
• You can tell printf to embed some values in the string; these values are determined at run time, using formatting tags embedded in the first string argument
• Examples:
    printf("here is an integer: %d\n", i);
    printf("%d + %d = %d\n", x, y, x+y);
    printf("reverse %s and we get %s\n", str, reverse(str));
    printf("sqrt of %f is %f\n", x, sqrt(x));

Printf formatting tag (1)
• printf formatting tag:
    %[flags][width][.precision][length]specifier
• Specifiers:
  – d: treat the corresponding parameter as a signed integer
  – u: unsigned integer
  – x: print as hexadecimal
  – s: treat it as a string
  – c: character (char)
  – f: floating-point number

printf is dumb
• printf treats the corresponding parameter as the specifier suggests, even if the parameter is not of the given type
  – e.g., %d is replaced by the value of the parameter when treated as an integer, even if the parameter is not an integer variable
  – such a mismatch is undefined behavior in C; the numbers below are what one particular system printed
  – printf("print an int %d\n", "Hi Dave");
    • print an int 134513980
  – printf("print an int %d\n", 12.3);
    • print an int -1717986918

Fun with printf
• char *s = "Hi Dave";
• printf("the string \"%s\" is %zu characters long\n", s, strlen(s));
  – the string "Hi Dave" is 7 characters long
• int x = 10;
• printf("x=%d is %o in octal and %x in hexadecimal\n", x, x, x);
  – x=10 is 12 in octal and a in hexadecimal
• Note: %% prints a single '%'
  – e.g., printf("%.1f%% of the population\n", 12.4);
  – 12.4% of the population

Printf formatting tags (2)
• printf formatting tag:
    %[flags][width][.precision][length]specifier
• Flags:
  – 0: left-pad the output with 0s
• Width:
  – a number specifying the minimum width of the output
• Precision:
  – the number of digits after the decimal point
• Examples:
    printf("square root of 10 is %20.15f\n", sqrt(10));
    square root of 10 is    3.162277660168380
    printf("\\%03o", '>');
    \076
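The flag, width, and precision fields can be tried without touching the terminal by formatting into a buffer. This is a small sketch using snprintf (C99); the helper names format_fixed and format_octal are made up for this example.

```c
#include <stdio.h>

/* Format value right-aligned in 8 columns with 3 digits after the decimal point. */
int format_fixed(char *buf, size_t size, double value)
{
    return snprintf(buf, size, "%8.3f", value);
}

/* Format c as a backslash followed by a 3-digit, zero-padded octal number,
   the same tag that vis uses for non-printable characters. */
int format_octal(char *buf, size_t size, int c)
{
    return snprintf(buf, size, "\\%03o", (unsigned int) c);
}
```

With these, format_fixed applied to 2.5 yields three spaces followed by 2.500, and format_octal applied to '>' yields \076.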

printf family
• Write formatted output to standard output
  – int printf(const char *, ...);
• Write formatted output to a file
  – int fprintf(FILE *, const char *, ...);
• Write formatted output to a string
  – int sprintf(char *, const char *, ...);
  – char birthDate[20];
  – sprintf(birthDate, "%02d/%02d/%04d", m, d, y);
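sprintf does not know how large its destination is. As a hedged alternative, the sketch below swaps in snprintf (C99), which takes the buffer size and never writes past it. The function name format_date is an assumption for illustration.

```c
#include <stdio.h>

/* Write the date as MM/DD/YYYY into buf.
   Returns 0 on success, -1 if buf is too small or formatting failed. */
int format_date(char *buf, size_t size, int m, int d, int y)
{
    int n = snprintf(buf, size, "%02d/%02d/%04d", m, d, y);

    /* snprintf returns the length the full result needs (excluding the
       terminating null); a value >= size means the output was truncated. */
    return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

A 20-byte buffer like birthDate has room to spare for the 10-character result plus the terminating null.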

scanf ("scan formatted")
    int scanf(const char *format, ...);
• Reads from standard input, with the format string specifying what kind of variable(s) to read and how the variables are separated/delimited
• "..." means "variable number of arguments"; each of them is the address of a variable where the input should be stored
• Returns the number of successfully matched and assigned input items (can be zero, or smaller than the number provided)
• Returns EOF (typically -1) on error, or if end of file is reached before the first conversion

A simple example of scanf
    int x, y;
    scanf("%d %d", &x, &y);   // %d is a conversion specifier; a space matches
                              // any amount of blanks, tabs, etc.
  – Reads two integers (separated by spaces, tabs, or newlines) from standard input, saving the first one to x and the second one to y
  – Input "Abc de": returns 0, as "Abc" cannot be converted to an integer
  – Input "1023 232": returns 2, x set to 1023, y set to 232
  – Input "1023      232" (extra whitespace): same as above
  – Note: always check the return value of scanf!

A simple example of scanf
    scanf("%d; %d", &x, &y);   // %d is a conversion specifier
• ";" can only match itself, i.e., the two integers must be separated by ";"
  – 1023; 232    // x set to 1023, y set to 232, returns 2
  – 1023 234     // x set to 1023, y unset, returns 1
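The return values listed above can be checked with sscanf, which follows the same matching rules as scanf but reads from a string. The helper names parse_two and parse_semi are hypothetical.

```c
#include <stdio.h>

/* Two integers separated by any amount of whitespace.
   Returns the number of items assigned (0, 1 or 2), or EOF on empty input. */
int parse_two(const char *line, int *x, int *y)
{
    return sscanf(line, "%d %d", x, y);
}

/* Two integers separated by a literal ';' (optional whitespace after it). */
int parse_semi(const char *line, int *x, int *y)
{
    return sscanf(line, "%d; %d", x, y);
}
```

A caller should compare the result with 2 before using x and y, since a smaller value means one or both were left unset.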

Other scanf specifiers
• Similar to the specifiers used in printf
  – u: unsigned integer
  – o: octal
  – x: hexadecimal
  – f: floating point
  – s: a string

Problems with scanf
• %s reads only the next word from input, not a line of input
    char s[100];
    printf("Type in your name\n");
    scanf("%s", s);
    printf("Your name is %s\n", s);
• What if the user types a word longer than 99 characters (the buffer holds 99 plus the terminating null)?
  – the buffer overflows, typically causing a segmentation fault

Ways to work around
    char *string1, *string2;
    string1 = (char *) malloc(25);
    puts("Please enter a string of 20 characters or fewer.");
    scanf("%20s", string1);
    printf("\nYou typed the following string:\n%s\n\n", string1);
• With the field width (20 in the example), scanf reads at most the first 20 characters of the user's input into string1.
• You still need to make sure the string1 buffer is large enough: the field width plus one byte for the terminating null.
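A small sketch of the field-width rule, using sscanf on a string so the effect is easy to observe. The names read_word and WORD_MAX are assumptions for this example; note that the width inside the format string has to be written as a literal number.

```c
#include <stdio.h>

#define WORD_MAX 20     /* must agree with the literal 20 in the format below */

/* Copy the first whitespace-delimited word of input into word, reading at
   most WORD_MAX characters. Returns 1 if a word was read, 0 otherwise. */
int read_word(const char *input, char word[WORD_MAX + 1])
{
    return sscanf(input, "%20s", word) == 1;
}
```

The destination is declared with WORD_MAX + 1 bytes, because %20s stores up to 20 characters and then appends the terminating null.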

Ways to work around (2)
    char *string2;
    puts("Now enter a string of any length.");
    scanf("%as", &string2);
    printf("\nYou typed the following string:\n%s\n", string2);
• The "a" flag character tells scanf to allocate a buffer as large as needed (string2 will be set to point to the allocated buffer); the caller should free() it when done.
• The "a" flag is a GNU extension; POSIX.1-2008 standardized the same feature as the "m" flag (%ms).

scanf is still not preferable
• Avoid using scanf:
  – can cause the program to crash due to buffer overflow
  – can hang if it encounters unexpected non-numeric input while reading a line from standard input (?)
  – difficult to recover from errors when the scanf template string does not match the input exactly
• Better practice for reading input from the keyboard:
  – read a line (i.e., until a newline) with getline() or fgets()
  – parse the resulting string with sscanf ("string scan formatted": similar to scanf, but with input given by a string)

Reading a line
• fgets: read an entire line from a file
    char *fgets(char *s, int size, FILE *stream);
  – s: pointer to a buffer, which should be big enough
  – size: the size of the buffer; at most size-1 characters are read
  – stream: a file handle; for now, remember
    • stdin (a constant): standard input
• Reads a line (i.e., reads characters until a newline is met or the maximum number is reached) from the specified file, and saves it to the string pointed to by s; the newline, if read, is kept in s

Example of fgets/sscanf
    char s[101];
    int month, day, year;

    printf("Type in your name\n");
    fgets(s, 100, stdin);
    printf("Your name is %s\n", s);
    printf("Type in your date of birth (MM/DD/YYYY)\n");
    fgets(s, 100, stdin);
    sscanf(s, "%d/%d/%d", &month, &day, &year);
    printf("You were born on %02d/%02d/%04d\n", month, day, year);
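The example above ignores two details: fgets keeps the trailing newline, and sscanf may match fewer than three fields. A hedged sketch of handling both follows; strip_newline and parse_date are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/* Remove the trailing newline that fgets stores, if there is one. */
void strip_newline(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}

/* Parse a date of the form MM/DD/YYYY.
   Returns 1 if all three fields were read, 0 otherwise. */
int parse_date(const char *line, int *month, int *day, int *year)
{
    return sscanf(line, "%d/%d/%d", month, day, year) == 3;
}
```

A program would call strip_newline(s) after reading the name, and print the date only when parse_date(s, &month, &day, &year) returns 1.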

getline: a GNU extension
    ssize_t getline(char **lineptr, size_t *n, FILE *stream);

    char *line = NULL;
    size_t len = 0;
    ssize_t read;

    while ((read = getline(&line, &len, stdin)) != -1) {
        printf("Retrieved line of length %zd:\n", read);
        printf("%s", line);
        ...
    }
    if (line)
        free(line);
• getline automatically mallocs/reallocs the buffer as needed

scanf family
• Read from standard input
  – int scanf(const char *format, ...);
• Read from a file
  – int fscanf(FILE *fp, const char *format, ...);
• Scan from a string
  – int sscanf(const char *s, const char *format, ...);

Standard I/O: File Access
• So far, we have learned how to read from standard input and write to standard output
• Next: how to read from or write to a file ...

File Access: typical use
• #include <stdio.h>
• FILE *fp;
• To open a file:
  – fp = fopen("~/tmp.txt", "r+");
• To read a character from a file:
  – int c; c = getc(fp);
• To write a character to a file:
  – putc(c, fp);
• Read/write using fscanf/fprintf, fgets/fputs, ...
• Finally, fclose(fp) is used to close a file

fopen() routine (1)
    #include <stdio.h>
    FILE *fopen(const char *path, const char *mode);
• path: a relative or full path name of the file
• mode: access mode
  – "r": open the file to read
  – "w": open the file to write (existing content will be discarded)
  – "a": open the file to append (writes start at the end of the file)
  – "r+": open the file to read and write
  – ...

fopen library routine (2)
    FILE *fopen(const char *path, const char *mode);
• FILE: a data structure containing the information needed to perform input or output operations on a stream, including:
  – a file descriptor (will study in low-level file access)
  – the current stream position
  – an end-of-file indicator, an error indicator
  – a pointer to the stream's buffer, if applicable
• Note:
  – reads/writes happen at the current stream position
  – buffered I/O: not every write is applied to the disk immediately

Example
• FILE *fp = fopen("~/tmp.txt", "r+");
  – takes a filename, does some housekeeping and negotiation with the kernel
  – returns a pointer to the FILE data structure on success; returns NULL on failure
• Always check for errors after the call:
    if (fp == NULL) {
        printf("failed to open file ~/tmp.txt\n");
        exit(1);
    }
• Note: fopen does not expand "~"; that expansion is done by the shell, so a real program has to spell out the home directory (e.g., using getenv("HOME")).

Meaningful error message
• errno: integer variable, set by system calls and some library functions in the event of an error to indicate what went wrong
    #include <errno.h>
    ...
    if (fp == NULL) {
        switch (errno) {
        case EACCES:
            printf("You don't have permission\n");
            break;
        case EINVAL:
            printf("Invalid argument to fopen\n");
            break;
        default:
            printf("Something went wrong in fopen\n");
        }
    }

Using perror()
    #include <stdio.h>
    void perror(const char *s);
• perror() produces a message on standard error describing the last error encountered, i.e., the current value of errno.
    FILE *fp = fopen("~/tmp.txt", "r+");
    if (fp == NULL) {
        perror("open ~/tmp.txt");
        exit(1);
    }
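When the message needs more than a fixed prefix, strerror() returns the same text that perror() prints. The following is a sketch under that approach; open_or_report is a hypothetical helper name.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open path with the given mode; on failure, report the
   reason on standard error and return NULL with errno still set. */
FILE *open_or_report(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);

    if (fp == NULL) {
        int saved = errno;      /* later library calls may change errno */
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(saved));
        errno = saved;
    }
    return fp;
}
```

Saving errno right after the failing call is a common precaution, because fprintf itself is allowed to modify it.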

Three special files
• Whenever a program is started, three files are automatically opened, with file pointers stdin, stdout, stderr.
• getchar() is the same as getc(stdin)
• putchar(c) is the same as putc(c, stdout)
• printf(s, ...) is the same as fprintf(stdout, s, ...);
• scanf(s, ...) is the same as fscanf(stdin, s, ...);

Other standard I/O Functions
• feof(FILE *): returns non-zero once end of file has been reached on the stream
• ferror(FILE *): returns non-zero if an error has occurred on the stream
• fflush(FILE *): flushes any buffered output to the file
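These three functions typically appear together at the end of a read loop, because getc returns EOF both at end of file and on error. A minimal sketch, assuming a helper named copy_stream (not part of vis):

```c
#include <stdio.h>

/* Copy everything from in to out.
   Returns the number of bytes copied, or -1 on a read or write error. */
long copy_stream(FILE *in, FILE *out)
{
    long count = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;
        count++;
    }
    /* The loop ends on EOF; ferror tells a real error from end of file. */
    if (ferror(in))
        return -1;
    /* Push buffered output to the file so that write errors show up here. */
    if (fflush(out) == EOF)
        return -1;
    return count;
}
```

After a successful return, feof(in) is non-zero, since the loop stopped because the end of the input was reached.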

New vis: handling files
    int main(int argc, char *argv[])
    {
        int strip = 0;
        int i;
        FILE *fp;

        while (argc > 1 && argv[1][0] == '-') {
            switch (argv[1][1]) {
            case 's':   /* -s: strip funny characters */
                strip = 1;
                break;
            default:
                fprintf(stderr, "%s: unknown arg %s\n", argv[0], argv[1]);
                return 1;
            }
            argc--;
            argv++;
        }

main(): cont'd
        if (argc == 1)
            vis(stdin, strip);
        else
            for (i = 1; i < argc; i++)
                if ((fp = fopen(argv[i], "r")) == NULL) {
                    /* report the file that failed: argv[i], not argv[1] */
                    fprintf(stderr, "%s: can't open %s\n", argv[0], argv[i]);
                    return 1;
                } else {
                    vis(fp, strip);
                    fclose(fp);
                }
        return 0;
    }

Now vis
    void vis(FILE *fp, int strip)
    {
        int c;

        while ((c = getc(fp)) != EOF)
            if (isprint(c) || isspace(c))
                putchar(c);
            else if (!strip)
                printf("\\%03o", c);
    }
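For experimenting with the same logic without files, here is a hypothetical string-based variant. The name vis_buffer and its parameters are assumptions for illustration; it writes into a caller-supplied buffer instead of standard output.

```c
#include <ctype.h>
#include <stdio.h>

/* Write the visible form of the first len bytes of in into out, which has
   room for size bytes. Output that does not fit is dropped.
   Returns the length of the string stored in out. */
size_t vis_buffer(const char *in, size_t len, char *out, size_t size, int strip)
{
    size_t used = 0;
    size_t i;

    if (size == 0)
        return 0;
    out[0] = '\0';
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char) in[i];

        if (isprint(c) || isspace(c)) {
            if (used + 1 >= size)       /* need room for c and the null */
                break;
            out[used++] = (char) c;
            out[used] = '\0';
        } else if (!strip) {
            if (used + 4 >= size)       /* need room for \ooo and the null */
                break;
            used += (size_t) snprintf(out + used, size - used,
                                      "\\%03o", (unsigned int) c);
        }
    }
    return used;
}
```

The cast to unsigned char matters because the ctype functions expect a value in that range (or EOF), which getc already guarantees but a plain char does not.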

A script a day: overwrite
• To replace UNIX with UNIX(TM) in a file called ch2:
  – How about: sed 's/UNIX/UNIX(TM)/g' ch2 > ch2
  – This does not work: the shell truncates ch2 for output before sed gets to read it
• A general solution?
  – A script "overwrite" that saves standard input to a file
  – sed 's/UNIX/UNIX(TM)/g' ch2 | overwrite ch2
  – sort -k 3 -n data.txt | overwrite data.txt

A script a day: overwrite
    #!/bin/bash
    PATH=/bin:/usr/bin

    case $# in
    1)  ;;
    *)  echo 'Usage: overwrite file' 1>&2; exit 2;;
    esac

    new=/tmp/overwr.$$      # temporary file named after the process ID
    cat >"$new"             # collect all of standard input first
    cp "$new" "$1"          # then replace the target file
    rm -f "$new"

A command a day: find
• find command: process a set of files and/or directories in a file subtree; you can specify
  – where to search (pathname)
  – what type of file to search for (-type: directories, data files, links)
  – how to process the files (-exec: run a process against a selected file)
  – the name of the file(s) (-name)
  – logical operations on selections (-o and -a)

Examples
• Search for a file with a specific name in a set of files (-name):
  – find . -name "rc.conf" -print
• Apply a Unix command to all files found:
  – find . -name "rc.conf" -exec chmod o+r '{}' \;
• Search for a string in a selection of files:
  – find . -exec grep "www.athabasca" '{}' \; -print
  – searches the current directory and all subdirectories; every file containing the string has its path printed to standard output

More ways to find files
• Find all regular files under the root that were modified within the last seven days:
  – find / -type f -mtime -7 -print

Homework 6: due April 8 (2 wks)
• Implement your own "wc" command:
    wc [ -l ] [ -w ] [ -c ] [ filename ... ]
  – Options:
    • -l: report # of lines
    • -w: report # of words
    • -c: report # of characters
  – List of filenames:
    • Count for each of the files, then report the total line/word/character numbers for all files
  – If no filename is given, count standard input

What's next?
• Development tools (1-1.5 weeks)
  – gcc, make, gdb
• Memory-related topics (1 week)
  – Dynamic memory allocation
    • GNU C Library: malloc, free, etc.
  – Debugging memory problems
• System calls
  – Low-level file access
  – Processes, pipes, signals
  – Threads

What's next?
• A final project, which can be any of the following:
  – You can define your own project, but the instructor's approval is needed
• Some interesting tasks that require:
  – use of multiple Unix commands, shell programming
  – some C programming to analyze data
• Timeline for the project:
  – Proposal presentation:
  – Prototype due:
  – Final project/presentation due:

Project ideas
• Example:
  – Use the wget command to retrieve a web page, parse the file, and retrieve further web pages by following the hyperlinks within
  – Analyze:
    • web page size statistics, or web content statistics (word frequency)

Project ideas
• A program that analyzes C programs:
  – Beautifying: automatically indent the program
  – Statistics reporting: lines of code, number of functions defined, number of loops, number of static variables, number of memory allocations/deallocations