
Introduction

Programs which manipulate character data don’t usually just deal with single characters, but instead with collections of them (e.g. words, lines of text, C++ programs). Arrays provide a way of doing this, and we can have arrays of characters just as we can have arrays of ints and so on. Fundamentally, character arrays are no different from other types of arrays. When dealing with characters, however, we typically conform to the convention that the stored data be followed by a special “terminator”.

An array of char:

    'J'  '.'  ' '  'B'  'l'  'o'  'w'  '\0'  (unused elements)

    stored data = "J. Blow", followed by the terminator '\0';
    elements beyond the terminator are unused

The char constant '\0' has code 0, and it and the int constant 0 are therefore effectively equivalent. Programmers often write 0 instead of writing '\0'.

91.166 Copyright © 2000, Department of Systems and Computer Engineering, Carleton University


Strings

A character array used in this fashion is called a “string”. Note that the programmer is quite free to use character arrays in other ways: it is quite permissible, and sometimes useful, to have character arrays that don’t contain a terminator (provided, of course, that they aren’t used in situations which require a string).

We have in fact been using strings for some time without realizing it. A string of characters enclosed in double quotes is a string constant, which is a constant array of char:

    "ABCD" gives 'A' 'B' 'C' 'D' '\0'

String constants may be used in initializing character arrays.

    char msg1[6] = { 'h', 'e', 'l', 'l', 'o', '\0' };  // Ow!!
    char msg2[6] = "hello";                            // much better
    char msg3[] = "hello";       // better yet - array length is 6
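The claim that the array length is 6 can be checked with sizeof. The following is a minimal sketch; the helper name msg3_size is hypothetical and used only for illustration.

```cpp
#include <cstddef>

// Three equivalent ways of creating the string "hello".
char msg1[6] = { 'h', 'e', 'l', 'l', 'o', '\0' };
char msg2[6] = "hello";
char msg3[]  = "hello";   // the compiler counts the elements for us

// Hypothetical helper: number of elements in msg3, terminator included.
std::size_t msg3_size() {
    return sizeof(msg3) / sizeof(msg3[0]);
}
```

Calling msg3_size() should give 6: five letters plus the terminator.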


String Output

Char arrays are an exception to the rule that arrays may not be output. Outputting a char array causes the characters stored in it to be output, one by one, until a terminator is encountered.

    char text[] = "hello there";

    // character array output (it just happens that the array
    // is constant). "Greetings! " will appear on the screen
    cout << "Greetings! ";

    cout << text;     // "hello there" will appear on the screen
    text[5] = '\0';   // replace the blank with a terminator
    cout << text;     // "hello" will appear

If the character array being output does not contain a terminator (is not in fact a string), the output process will continue past the end of the array. This is a programming error.
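The effect of planting a terminator in the middle of a string can be captured in a small test. This is a sketch only: it writes to a std::ostringstream instead of cout so that the result can be inspected, and the function name output_demo is an assumption made for illustration.

```cpp
#include <sstream>
#include <string>

// Outputs a char array before and after shortening it, and returns
// everything that was written.
std::string output_demo() {
    char text[] = "hello there";
    std::ostringstream out;   // stands in for cout
    out << text << '|';       // the whole string is written
    text[5] = '\0';           // overwrite the blank with a terminator
    out << text;              // output now stops after "hello"
    return out.str();
}
```

The characters after the new terminator are still in the array; they are simply never reached by the output operation.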


String Input

Char arrays are also an exception to the rule that arrays may not be input. Inputting a char array reads the next non-whitespace sequence (e.g. the next “word”) in the input data. The “word” and a terminator are placed in the array.

    char str1[20], str2[20], str3[20];

    cout << "Please enter three words: ";
    cin >> str1 >> str2 >> str3;

    // if the user enters "I hate 91166" ...
    // str1 will be left containing "I",
    // str2 will be left containing "hate", and
    // str3 will be left containing "91166"

    // DO NOT confuse the contents of str3 with the int
    // value 91166. We are dealing with a collection of
    // characters that just happen to be numeric digits.

If the word to be read (plus the terminator) is longer than the array we’re trying to read into, we have a serious problem (characters will get stored beyond the end of the array).
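The word-by-word behaviour can be tried without a keyboard by reading from a std::istringstream. The sketch below assumes fixed input that is known to fit in the arrays; the function name read_three_words is hypothetical.

```cpp
#include <cstring>
#include <sstream>

// Reads three words from fixed input and reports whether each array
// ended up holding the expected string.
bool read_three_words() {
    char str1[20], str2[20], str3[20];
    std::istringstream in("I hate   91166\n");   // stands in for cin
    in >> str1 >> str2 >> str3;                  // whitespace separates the words
    return std::strcmp(str1, "I") == 0
        && std::strcmp(str2, "hate") == 0
        && std::strcmp(str3, "91166") == 0       // five digit characters,
        && std::strlen(str3) == 5;               // not the int 91166
}
```

Extra blanks between the words make no difference, since leading whitespace is skipped before each word is read.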


Setw on Input

The problem can be avoided by using the “setw” manipulator. This is the same manipulator we’ve used in output statements, but in input statements it has a different meaning. In an input statement, “setw” limits the number of characters to be read.

    char a[6];

    // the "setw" limits the number of characters
    // (including the terminator) that can be placed
    // in "a" to 6 (the size of the array).
    cin >> setw(6) >> a;

If the use of “setw” prevents an input “word” from being completely read, the remaining characters are simply left in the input buffer.


String Operations

Strings cannot be assigned using the “=” operator, or compared using “==”, “!=”, and the other relational operators. There are, however, a number of library functions which allow such operations (and others) to be performed. These functions are made available by including “string.h”. The most useful are listed on p 595 of the text.

To “assign” one string to another, use function “strcpy”. Its prototype (somewhat simplified for our purposes) is as follows:

    void strcpy (char dest[], const char src[]);

The contents of the source string (up to and including the terminator) are copied into the destination string. The “const” indicates that the function does not modify the source string, and so allows string constants to be used.
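As a usage sketch, the function below “assigns” a string constant to a caller-supplied array. The name assign_greeting is hypothetical, and the caller is assumed to provide room for at least 6 characters.

```cpp
#include <cstring>

// "Assigns" the string "hello" to dest.
// Assumption: dest has room for at least 6 characters
// (five letters plus the terminator).
void assign_greeting(char dest[]) {
    std::strcpy(dest, "hello");   // copies up to and including the terminator
}
```

Writing dest = "hello" inside the function would not copy any characters, which is why a library call is needed.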


A DIY Strcpy

If “strcpy” didn’t exist in the standard libraries, we could easily write it ourselves.

    void strcpy (char dest[], const char src[])
    {
        int i = 0;

        // copy everything up to the terminator
        while (src[i] != '\0') {
            dest[i] = src[i];
            i++;
        }
        dest[i] = '\0';   // copy the terminator
    }

Note the total lack of protection against the destination string overflowing. If the source string contains more characters (including the terminator) than will fit in the destination, bad things will happen.


A Better Strcpy

A “safe” equivalent can be created:

    void safe_strcpy (char dest[], const char src[], int dest_size)
    {
        int i = 0;

        // copy until we see the terminator or we have
        // only one spot left in the destination.
        while ((src[i] != '\0') && (i < (dest_size - 1))) {
            dest[i] = src[i];
            i++;
        }
        dest[i] = 0;   // terminate the destination string
    }

The standard library function “strncpy” is similar to the above, but if it runs out of destination space it leaves the destination string unterminated (which is likely to cause problems later on).
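One common way of living with “strncpy” is to let it fill all but the last element and then write the terminator by hand. The following is a sketch of that idea, not a library facility; the name bounded_copy is hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Bounded copy built on strncpy that always leaves dest terminated.
void bounded_copy(char dest[], const char src[], std::size_t dest_size) {
    if (dest_size == 0) {
        return;                               // no room even for a terminator
    }
    std::strncpy(dest, src, dest_size - 1);   // may stop without terminating
    dest[dest_size - 1] = '\0';               // so terminate explicitly
}
```

Copying "hello" into a 4-element array this way should leave it holding "hel".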


Comparing Strings (1)

Strings can be compared by using the standard library function “strcmp”.

    int strcmp (const char str1[], const char str2[]);

The function compares the strings and returns

    a negative value (e.g. -1) if str1 is less than str2
    0 if the two strings are identical
    a positive value (e.g. +1) if str1 is greater than str2

Only the sign of a non-zero result is guaranteed, so test for “< 0” or “> 0” rather than for -1 or +1 exactly.

Comparison is performed character by character, starting with the first characters of each string and continuing until either a mismatch is found or the end of one (or both) of the strings is reached. If a mismatch is found, the string with the lower character (based on the usual rules for character comparisons) is less than the other. If the end of one string is reached, the strings are equal if they have the same length. Otherwise the shorter of the strings is less than the longer one.
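Because only the sign of a non-zero “strcmp” result can be relied upon, code that wants exactly -1, 0 or +1 has to normalise the value itself. A minimal sketch, using the hypothetical name compare_sign:

```cpp
#include <cstring>

// Reduces the result of strcmp to exactly -1, 0 or +1.
int compare_sign(const char a[], const char b[]) {
    int r = std::strcmp(a, b);
    return (r > 0) - (r < 0);   // each comparison yields 0 or 1
}
```

For example, compare_sign("cats", "cat") should give +1, since the strings match up to the end of the shorter one and the shorter string is the lesser.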


Comparing Strings (2)

Provided that one is dealing only with upper (or lower) case characters, “strcmp” produces the results one might intuitively expect. Thus “CAT” is less than “DOG”, and “rat” is greater than “chipmunk” (just as in a telephone book).

When dealing with upper and lower case characters, one must keep in mind that ‘a’ is not equal to ‘A’, and that all of the upper case characters are less than the lower case ones. Thus “ABC” and “abc” are not equal, and “Zorro” is less than “aardvark”.

The library function “stricmp” works just like “strcmp” except that it treats upper and lower case characters as being equivalent. Thus “ABC” is equal to “abc” (zero is returned). Note that “stricmp” is a widely available compiler extension rather than part of the standard library, so its name and availability vary between systems.

    int stricmp (const char str1[], const char str2[]);

“Stricmp” is typically used more than “strcmp”.
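Since “stricmp” is not available everywhere, a portable equivalent is easy to sketch using “tolower” from the standard library. The name compare_ignore_case is hypothetical, and returning exactly -1, 0 or +1 is a choice made for this example.

```cpp
#include <cctype>

// Case-insensitive comparison: returns -1, 0 or +1.
int compare_ignore_case(const char a[], const char b[]) {
    int i = 0;
    while (true) {
        // Convert via unsigned char, as tolower requires.
        int ca = std::tolower(static_cast<unsigned char>(a[i]));
        int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) {
            return (ca < cb) ? -1 : 1;   // first mismatch decides
        }
        if (ca == 0) {
            return 0;                    // both strings ended together
        }
        i++;
    }
}
```

With this function “ABC” and “abc” compare equal, and “Zorro” is greater than “aardvark”, as in a telephone book.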


Strlen

The number of characters in a string can be obtained by using the standard library function “strlen”. Its prototype (simplified; the library version returns the unsigned type size_t rather than int) is:

    int strlen (const char str[]);

The length DOES NOT include the terminator.

    strlen("cat") is 3

“Strlen” could also be easily written if it didn’t exist (as indeed could “strcmp” and “stricmp”).

    int strlen (const char str[])
    {
        int i = 0;

        // count characters up to (but not including) the terminator
        while (str[i] != '\0') {
            i++;
        }
        return i;
    }


More String Input (1)

Using the >> operator to read a string (as in “cin >> str;”) gets us the next non-whitespace sequence (e.g. the next “word”). If we want to read an entire line, blanks and all, we must use a different approach.

    char a[100];
    cin.get (a, 100, '\n');

    a      array to read characters into
    100    size of array = max number of characters to be read + 1
    '\n'   character at which the input operation is to end.
           this character is NOT read.

Reading stops when either the specified character (usually '\n') is encountered or the capacity of the array is reached (only one spot left). In either case a terminator is placed after the characters read.

The character at which input is to end is NOT read but is instead left in the input buffer. This makes it possible (next slide) to determine whether the whole line was in fact successfully read.


More String Input (2)

Another form of “get” can be used to read the next input character (whatever it is, including a blank or '\n'). After attempting to read a line (previous slide) the next input character should be '\n'. If it isn’t, the input line was too long for our array.

    if (cin.get() == '\n') {   // read next input char
        // the read ended because '\n' was seen
        // everything is fine...
    } else {
        // the input line was too long...
    }

There is also a “getline” function. It works like “get” (see previous slide) but the “end of read” character is '\n' by default, and the '\n' is discarded instead of being left in the input buffer.

    cin.getline (a, 100);   // read until '\n' is seen
    // after the read, there is no '\n' left to test for.
    // if the line was too long for the array, the stream
    // is instead left in a failed state (cin.fail() is true),
    // which must be checked and cleared before reading again.