Characters and Strings Character and String definitions algorithms
- Slides: 31
Characters and Strings: character and string definitions, algorithms, library functions
Character and String Processing

A common programming problem involves the manipulation of text, usually referred to as string (or text) processing. Solutions typically require the ability to:
- perform input and output of characters and strings
- query what a single character is, or is not
- determine whether a character, a substring, or any of a set of characters is included in a string
- determine the attributes of a character (e.g. upper versus lower case) or of a string (e.g. length)
- convert between character string and machine representations of different data types
- break large strings into smaller substrings, recognized as tokens
- join substrings into larger strings (catenation)
Characters and Strings in C

- The concept of a string refers to a sequence of items. The sequence, or string, may contain zero or more elements, followed by a delimiter that denotes the end (termination) of the string.
- A string of characters, in computer science terms, usually refers to a vector, or list, of char values.
  - ASCII is a commonly used encoding
  - Unicode is another
- In the C language, the special delimiter character '\0' (called the null character) is recognized by the compiler and has the integer value 0.
- Strings of bits (or other encoded symbols) provide abstraction possibilities for more general strings.
Fundamentals

String length and defining a string container

Example (capacity fixed by a constant):
    #define STRLEN 256
    char strName[ STRLEN ] ;

Example (capacity left open; valid only with an initializer or as a function parameter):
    char strName[] ;

Consider a variation of the second example, using pointers:
    char * strPtr ;

Initialization
    char strName1[] = "My name is Bob!" ;
    const char * strStatic = "String that cannot be changed!" ;
    char strName2[] = { 'H', 'e', 'l', 'l', 'o', '\0' } ;

Layout of strName2 in memory:
    H  e  l  l  o  \0
    - H e l l o is the sequence of characters (the value of the string)
    - \0 is the delimiter (null character, terminator)

Example (filling an array by index):
    char strName[ 50 ] ;
    int k ;
    for( k=0; k<49; k++ ) strName[k] = '#' ; // Fill with # symbols
    strName[49] = '\0' ;

Example (the same fill using a pointer):
    char strName[ 50 ], * strPtr ;
    int k ;
    for( k=0, strPtr = strName ; k<49; k++, strPtr++ ) *strPtr = '#' ;
    *strPtr = '\0' ;
Character Handling Library

- The C language standard supports the notion of the char data type, and the delimiter character code '\0'. We do not need to know the details of how character data is represented in bit form.
- In programming and algorithm design it is useful to know and use a wide variety of functions that query or manipulate (transform) both individual characters and strings of characters.
- We will discuss functions from four libraries:
    #include <ctype.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
- We start with the character function library, <ctype.h>.
Character Handling Library <ctype.h>

Begin with the character query functions.

General prototype form:
    int isValueRange ( int c ) ; // Returns a nonzero value if a match, or 0
ValueRange refers to a single value or a range of values.

Note that the input argument c has type int. Intuition would suggest that c should be type char. Technical considerations (involving the representation of non-ASCII data encodings and of EOF) recommend using int, recalling that char is a compatible sub-type of int (and of short int). The value passed must be representable as an unsigned char, or be EOF.

Function prototypes and descriptions:

int isblank( int c );
    Returns a nonzero value if c is a blank ' ' or a horizontal tab; otherwise 0
int isdigit( int c );
    Returns a nonzero value if c is a base-10 digit in the range '0' to '9'; otherwise 0
int isalpha( int c );
    Returns a nonzero value if c is an alphabetic character in the range 'a' to 'z', 'A' to 'Z'; otherwise 0
int isalnum( int c );
    Returns a nonzero value if c is an alphabetic character, or a base-10 digit; otherwise 0
int isxdigit( int c );
    Returns a nonzero value if c is a base-16 (hexadecimal) digit in the range '0' to '9', or 'a' to 'f', or 'A' to 'F'; otherwise 0
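As a rough illustration of the query functions, the sketch below scans a string with isdigit() and isxdigit(). The helper names count_digits and is_hex_string are made up for this example. Each character is cast to unsigned char before the call, since passing a negative char value to these functions is undefined.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the base-10 digits in the string s. */
size_t count_digits(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        /* Cast first: a negative char value must not reach isdigit(). */
        if (isdigit((unsigned char)*s))
            n++;
    }
    return n;
}

/* Hypothetical helper: returns 1 if s is non-empty and consists only of
   hexadecimal digits, otherwise 0. */
int is_hex_string(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        if (!isxdigit((unsigned char)*s))
            return 0;
    }
    return 1;
}
```

For instance, count_digits( "a1b22c" ) counts three digits, and is_hex_string( "12g" ) reports 0 because 'g' is not a base-16 digit.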
Character Handling Library

- Additional query functions provide information about the nature of the character data.
- Transformative functions modify the character data.

Function prototypes and descriptions:

int islower( int c );
    Returns a nonzero value if c is a lower case alphabetic character in the range 'a' to 'z'; otherwise 0
int isupper( int c );
    Returns a nonzero value if c is an upper case alphabetic character in the range 'A' to 'Z'; otherwise 0
int tolower( int c );
    Returns the lower case variant of c if c is an upper case alphabetic character; otherwise returns c unchanged (Ex. tolower( 'A' ) returns 'a')
int toupper( int c );
    Returns the upper case variant of c if c is a lower case alphabetic character; otherwise returns c unchanged (Ex. toupper( 'e' ) returns 'E')
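A minimal sketch of a transformative function applied to a whole string follows. The name str_to_upper is an assumption for illustration only. Characters that are not lower case letters pass through toupper() unchanged.

```c
#include <ctype.h>

/* Hypothetical helper: converts s to upper case in place and returns s. */
char *str_to_upper(char *s)
{
    for (char *p = s; *p != '\0'; p++) {
        /* toupper() takes and returns int; cast on the way in and out. */
        *p = (char)toupper((unsigned char)*p);
    }
    return s;
}
```

The argument must be a modifiable array (for example char buf[] = "Hello" ;), not a string literal.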
Character Handling Library

And still more query functions, for non-alphanumeric character data (e.g. graphical, control signals, punctuation).

Function prototypes and descriptions:

int isspace( int c );
    Returns a nonzero value if c is any valid white space character (including blank, newline, tab, etc.); otherwise 0
int iscntrl( int c );
    Returns a nonzero value if c is any valid control character (including '\n', '\b', '\r', '\a' etc.); otherwise 0
int ispunct( int c );
    Returns a nonzero value if c is any valid, printable punctuation character (including ',', ';', ':' etc.); otherwise 0
int isprint( int c );
    Returns a nonzero value if c is any valid, printable character, including the blank; otherwise 0
int isgraph( int c );
    Returns a nonzero value if c is any valid character representing a graphical symbol, that is, any printable character except the blank (such as '<', '>', '#', '$' etc., and including extensions to ASCII); otherwise 0
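One possible use of isspace() is sketched below: counting the words in a string, where a word is taken to be a maximal run of non-white-space characters. The helper name count_words and this definition of a word are assumptions made for the example.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts maximal runs of non-white-space characters. */
size_t count_words(const char *s)
{
    size_t words = 0;
    int in_word = 0;               /* 1 while scanning inside a word */

    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s)) {
            in_word = 0;           /* white space ends the current word */
        } else if (!in_word) {
            in_word = 1;           /* first character of a new word */
            words++;
        }
    }
    return words;
}
```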
Example: Counting characters

Problem: Determine the frequencies of occurrence for each alphabetic character (ignoring case) in a text file.

Solution:
    #include <ctype.h>
    #include <stdio.h>

    int main ( ) {
        int N=0, K, C[26] ;
        double F[26] ;
        int Ch ;   // int, not char, so that EOF can be distinguished

        for( K=0; K<26; K++ ) { C[K]=0; F[K]=0.0; }

        // N counts every character read, alphabetic or not
        for( Ch=getchar(); Ch != EOF; N++, Ch=getchar() ) {
            if( isalpha( Ch ) ) {
                K = toupper( Ch ) - 'A' ;
                C[K]++ ;
            }
        }

        for( K=0; K<26; K++) {
            if( N > 0 ) F[K] = C[K] * 1.0 / N ;   // avoid dividing by zero on empty input
            printf( "Frequency of letter %c: %lf\n", (char) (K+'A'), F[K] ) ;
        }
        return 0 ;
    }
String Conversion Functions: <stdlib.h>

The purpose of these functions is to convert a string (or a portion of one) to (1) an integer type or (2) a floating point type.

General prototype form:
    resultType strtoOutputType ( const char * nPtr, char **endPtr [, int base ] ) ;

- nPtr points at the input string (protected as constant)
- resultType refers to one of double, long int, or unsigned long int
- OutputType refers to one of d, l, or ul
- base refers to the base of the input string (0, or 2..36)
- endPtr points at the position within the input string where a valid numeric representation terminates

    Input string:  - 1 2 3 . 8 9 5 $ b C
                   ^               ^
                   nPtr            endPtr (first character after the numeric part)
String Conversion Functions

double strtod( const char * nPtr, char **endPtr );
    If nPtr points at a valid string representation of a signed real number (possibly followed by additional character data), returns a double value; otherwise returns 0 if no part of the input string can be converted. A pointer to the character following the last convertible character is returned through *endPtr; if no part of the input string is convertible then *endPtr is set to nPtr.

Example usage:
    double D ;
    const char * S = "-123.895 $bC" ;
    char * EP ;
    D = strtod( S, &EP ) ;
    if( EP != S ) printf( "Value converted is %lf\n", D ) ;
    else printf( "No value could be converted\n" ) ;

Note that one can also determine the size of the initial substring used to determine the double value returned, namely:
    int NumChars ;
    NumChars = (int) ( EP - S ) ;   // pointer difference counts chars; sizeof(char) is always 1
String Conversion Functions

long strtol( const char * nPtr, char **endPtr, int base );
    If nPtr points at a valid string representation of a signed integer number (possibly followed by additional character data), returns a long int value; otherwise returns 0 if no part of the input string can be converted. A pointer to the character following the last convertible character is returned through *endPtr; if no part of the input string is convertible then *endPtr is set to nPtr. The input string may use any base digits in the range 0 to (base-1).
unsigned long strtoul( const char * nPtr, char **endPtr, int base );
    Performs analogously to strtol() for string to unsigned long int conversion.

Example usage:
    long int LI ;
    const char * S = "-1234.$bC" ;
    char * EP ;
    LI = strtol( S, &EP, 0 ) ;   // base 0 => base 8, 10, or 16
    if( EP != S ) printf( "Value converted is %ld\n", LI ) ;
    else printf( "No value could be converted\n" ) ;

    Input string:  - 1 2 3 4 . $ b C
                   ^         ^
                   nPtr      endPtr
String Conversion Functions

- The base argument value (for integer conversions only!) defines the base of the input string.
- For base=0, the input string digits may be in base 8, 10 or 16; the base is chosen from the prefix of the input (leading 0 for base 8, leading 0x or 0X for base 16, otherwise base 10).
- The base=1 case is not used.
- For 2 <= base <= 36 the characters that are interpretable as base digits lie in the range from 0 to (base-1).

    Base    Digits (upper or lower case alpha chars)
    0       0, 1, ... , F
    2       0, 1
    10      0, 1, 2, ... , 9
    13      0, 1, ... , 9, A, B, C
    24      0, 1, ... , 9, A, B, ... , N
    36      0, 1, ... , 9, A, B, ... , Z

Example usage:
    long int LI ;
    const char * S = "-Ab2$" ;
    char * EP ;
    LI = strtol( S, &EP, 13 ) ;   // base = 13
    if( EP != S ) printf( "Value converted is %ld\n", LI ) ;
    else printf( "No value could be converted\n" ) ;
    // Value outputted is the negative of:
    // A*13*13 + b*13 + 2 = 1690+143+2 = 1835 (base-10)
String Conversion Functions

The C standard utilities library <stdlib.h> also includes two additional conversion functions for long long int, both signed and unsigned.

long long strtoll( const char * nPtr, char **endPtr, int base );
    Performs analogously to strtol() for string to long long int conversion, with identical treatment of non-convertible strings, of *endPtr and of base.
unsigned long long strtoull( const char * nPtr, char **endPtr, int base );
    Performs analogously to strtoul() for string to unsigned long long int conversion, with identical treatment of non-convertible strings, of *endPtr and of base.
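The endPtr mechanism can be combined with errno to validate an entire input string, as sketched below. The helper name parse_ll and its 1/0 return convention are assumptions for this example; strtoll() sets errno to ERANGE when the value does not fit in a long long int.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parses a base-10 long long from s.
   Returns 1 and stores the value in *out on success. Returns 0 if no
   digits were found, if characters follow the number, or if the value
   is out of range. */
int parse_ll(const char *s, long long *out)
{
    char *end;
    long long value;

    errno = 0;                       /* clear any earlier error */
    value = strtoll(s, &end, 10);

    if (end == s)                    /* nothing could be converted */
        return 0;
    if (*end != '\0')                /* trailing, non-numeric characters */
        return 0;
    if (errno == ERANGE)             /* overflow or underflow */
        return 0;

    *out = value;
    return 1;
}
```

Rejecting trailing characters is a design choice of this sketch; a caller that expects text after the number would inspect the end pointer instead.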
Useful <stdio.h> Functions

The C standard input/output library contains useful functions for:
- I/O of characters and strings
- Conversion to and from character and internal data representations
Useful <stdio.h> Functions

Function prototypes and descriptions:

int getchar( void );
    Fetches and returns a single character from the input stream (stdin); if end of file is signalled then the return value is EOF.
int putchar( int C );
    Outputs a single character to the output stream (stdout). Returns the same character if successful; otherwise returns EOF on failure.
char * fgets( char * S, int N, FILE * stream );
    Fetches all characters up to either (a) a new line '\n', or (b) EOF, or (c) N-1 characters have been inputted, and then appends a delimiter '\0' to make a character string. A new line that is read is kept in the string. The pointer S points to the inputted string. Input is from the input stream (typically stdin, but can be from a text file). Returns a pointer to the input string, or NULL if failure occurs (as with EOF).
int puts( const char * S );
    Outputs the string of characters S, followed by a newline '\n'. Returns a nonnegative integer result on success, or EOF on failure.

Example (echo one line, character by character):
    #include <stdio.h>
    int main () {
        int C ;   // must be int, not char, so that EOF can be distinguished
        while( ( C = getchar() ) != EOF && C != '\n' )
            putchar( C ) ;
        return 0 ;
    }

CAUTION: When stdin is the keyboard, remember that pressing the Enter key to signal input generates a new line character '\n', and this must be accounted for.

Example (echo the input, line by line):
    #include <stdio.h>
    #define MAX 256
    int main () {
        char S [ MAX ], * sPtr ;
        while( ( sPtr = fgets( S, MAX, stdin ) ) != NULL )
            puts( S ) ;   // puts() adds a '\n' after the one kept by fgets()
        return 0 ;
    }
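Because fgets() keeps the new line character in the string, programs often remove it before further processing. One common idiom, sketched here with the hypothetical helper names strip_newline and read_line, uses strcspn() from <string.h> to find the first '\n' (or the terminating delimiter if there is none).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: cuts s at its first '\n', if any, and returns s. */
char *strip_newline(char *s)
{
    /* strcspn() returns the index of the first '\n', or the length of s
       when no '\n' is present, so this assignment is always in bounds. */
    s[strcspn(s, "\n")] = '\0';
    return s;
}

/* Hypothetical helper: reads one line into buf (capacity n) without the
   trailing new line. Returns buf, or NULL at end of file or on error. */
char *read_line(char *buf, int n, FILE *stream)
{
    if (fgets(buf, n, stream) == NULL)
        return NULL;
    return strip_newline(buf);
}
```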
Useful <stdio.h> Functions

- The functions sprintf() and sscanf() are used for processing of character (string) data and machine representations of data (according to different data types).
- All data processing is done in RAM: no I/O is involved!

Function prototypes and descriptions:

int sprintf( char * S, const char * format [, ...] );
    Used in the same way as printf(), except that the string of characters produced is directed to the string argument S, according to the format string (and referenced parameters).
int sscanf( const char * S, const char * format [, ...] );
    Used in the same way as scanf(), except that the string S contains the "input" data to be processed according to the format string (and referenced parameters).

Example:
    #include <stdio.h>
    int main () {
        int A ; float X ;
        char S[100], M[100] ;
        char FormatStr[7] = "%d%f%s" ;

        scanf( FormatStr, &A, &X, S ) ;          // convert directly from stdin
        printf( FormatStr, A, X, S ) ;

        fgets( M, 100, stdin ) ;                 // read the rest of the input line into M
        sscanf( M, FormatStr, &A, &X, S ) ;      // convert from the string M
        sprintf( M, FormatStr, A, X, S ) ;       // convert back into the string M
        puts( M ) ;
        return 0 ;
    }
String Manipulation Functions

Two functions are provided to perform copying of one string into another string.

Function prototypes and descriptions:

char * strcpy( char * Dest, const char * Src );
    Copies the source string Src, including its delimiter '\0', to the destination Dest, and returns Dest. strcpy() does not check the capacity of Dest: if Src (with its delimiter) is longer than the array that Dest refers to, characters are written past the end of Dest and the behaviour is undefined (a buffer overflow). The caller must ensure that Dest is large enough.
char * strncpy( char * Dest, const char * Src, size_t N );
    Copies at most N characters of the source string Src to the destination Dest, and returns Dest. If the length of Src is less than N then the entire Src string is copied and as many '\0' as needed are appended to fill up to N characters. If the length of Src is N or more, exactly N characters are copied and no delimiter '\0' is appended, which leaves Dest without a delimiter (and fails to define a proper string). N must not exceed the capacity of Dest.
    Remember that strncpy() does not append the delimiter automatically in that case!
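One way to guard against a missing delimiter is to copy at most capacity-1 characters and then store '\0' explicitly, as in the sketch below. The helper name copy_string and its return convention are assumptions for illustration.

```c
#include <string.h>

/* Hypothetical helper: copies src into dest, whose capacity is dest_size
   bytes, truncating if necessary and always terminating the result.
   Returns 1 if all of src fit, 0 if it was truncated (or dest_size is 0). */
int copy_string(char *dest, size_t dest_size, const char *src)
{
    if (dest_size == 0)
        return 0;                        /* no room even for the delimiter */

    strncpy(dest, src, dest_size - 1);   /* may leave dest unterminated */
    dest[dest_size - 1] = '\0';          /* so terminate it explicitly */

    return strlen(src) < dest_size;
}
```

With a 6-byte destination, copying "Hello, world" would leave "Hello" in the destination and return 0 to signal truncation.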
String Manipulation Functions

Joining together of two strings is called string catenation (also called concatenation). For instance, one might combine various words and phrases to form sentences and paragraphs.

Function prototypes and descriptions:

char * strcat( char * S1, const char * S2 );
    Copies string S2 to a position in S1, following the string already in S1. Note that the original '\0' delimiter in S1 is overwritten by the first character in the S2 string, so that only one delimiter occurs at the end of the modified S1 string. If the total number of characters (plus the delimiter) is greater than the capacity of S1 then characters are written past the end of S1 and the behaviour is undefined.
char * strncat( char * S1, const char * S2, size_t N );
    Copies at most the first N characters of the string S2 to a position in S1, following the string already in S1. The original '\0' delimiter in S1 is overwritten by the first character in the S2 string, and a single delimiter is appended automatically by strncat() at the end of the modified S1 string. S1 therefore needs room for its original characters, up to N more, and the delimiter; if that total is greater than the capacity of S1 the behaviour is undefined.
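The capacity requirement for strncat() can be handled by computing the remaining room first, as in this sketch. The helper name append_string is hypothetical, and the code assumes dest already holds a properly terminated string.

```c
#include <string.h>

/* Hypothetical helper: appends src to the string in dest, whose total
   capacity is dest_size bytes, truncating src if needed.
   Returns 1 if all of src was appended, otherwise 0. */
int append_string(char *dest, size_t dest_size, const char *src)
{
    size_t used = strlen(dest);
    size_t room;

    if (used + 1 > dest_size)
        return 0;                     /* dest is already full or invalid */

    room = dest_size - used - 1;      /* keep one byte for the delimiter */
    strncat(dest, src, room);         /* strncat() adds the '\0' itself */

    return strlen(src) <= room;
}
```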
String Comparison Functions

- Comparison of two strings is based on the notion of lexical ordering.
- All characters are encoded (e.g. ASCII) and the numeric values of the characters define the possible orderings.
- String comparisons are done based on both (a) character by character comparison, and (b) use of the relative length of each string.

Function prototypes and descriptions:

int strcmp( const char * S1, const char * S2 );
    Compares S1 and S2. Returns 0 if S1 is fully equivalent to S2, a positive number if S1 is lexically greater than S2, and a negative number if S1 is lexically less than S2.
int strncmp( const char * S1, const char * S2, size_t N );
    Compares up to the first N characters of the strings S1 and S2. Returns 0 if S1 is fully equivalent to S2 over those characters, a positive number if S1 is lexically greater than S2, and a negative number if S1 is lexically less than S2. Note that if the length of either S1 or S2 is less than N, the comparison is done only for the characters present in each string.
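A typical application of strcmp() is as the ordering rule for qsort() from <stdlib.h>. The sketch below sorts an array of string pointers; the names compare_strings and sort_strings are assumptions for this example. Under ASCII, upper case letters have smaller codes than lower case letters, so "Apple" sorts before "app".

```c
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort(): each argument points at an array element,
   and each element is itself a pointer to a string. */
static int compare_strings(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Hypothetical helper: sorts n string pointers into lexical order. */
void sort_strings(const char **names, size_t n)
{
    qsort(names, n, sizeof names[0], compare_strings);
}
```

Only the pointers are rearranged; the character data of each string stays where it is.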
Strings - Search Functions

- C provides functions for searching for various characters and substrings within a string.
- This is a huge advantage in text processing.

Function prototypes and descriptions:

char * strchr( const char * S, int C );
    Locates the position in S of the first occurrence of C. Returns a pointer to where C is first located; otherwise returns NULL.
size_t strspn( const char * S1, const char * S2 );
    String S1 is searched, and the function returns the length of the initial substring segment of S1 that contains only characters found in S2.
size_t strcspn( const char * S1, const char * S2 );
    String S1 is searched, and the function returns the length of the initial substring segment of S1 that contains no characters found in S2.
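The three functions might be used as in the sketch below. The helper names count_char, leading_digits and first_field_length are made up for the example, as is the choice of ',' and ';' as field separators.

```c
#include <string.h>

/* Hypothetical helper: counts the occurrences of character c in s. */
size_t count_char(const char *s, int c)
{
    size_t n = 0;
    const char *p;

    if (c == '\0')
        return 0;    /* strchr() would match the delimiter itself */

    /* Restart each search one position past the previous match. */
    for (p = strchr(s, c); p != NULL; p = strchr(p + 1, c))
        n++;
    return n;
}

/* Hypothetical helper: length of the run of decimal digits that starts s. */
size_t leading_digits(const char *s)
{
    return strspn(s, "0123456789");
}

/* Hypothetical helper: length of the text before the first ',' or ';'. */
size_t first_field_length(const char *s)
{
    return strcspn(s, ",;");
}
```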
Strings - Search Functions

Function prototypes and descriptions:

char * strpbrk( const char * S1, const char * S2 );
    Locates the first occurrence in S1 of any character found in S2, and returns a pointer to that position in S1. Otherwise a NULL value is returned.
char * strrchr( const char * S1, int C );
    Locates the last occurrence in S1 of the character C, and returns a pointer to that position in S1. Otherwise a NULL value is returned.
char * strstr( const char * S1, const char * S2 );
    Locates the first occurrence in S1 of the entire string S2, and returns a pointer to that position in S1. Otherwise a NULL value is returned.
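Two small sketches of these search functions follow: counting non-overlapping occurrences of a substring with strstr(), and finding the final component of a path with strrchr(). The helper names count_substring and base_name, and the use of '/' as the path separator, are assumptions for illustration.

```c
#include <string.h>

/* Hypothetical helper: counts non-overlapping occurrences of word in text. */
size_t count_substring(const char *text, const char *word)
{
    size_t n = 0;
    size_t len = strlen(word);
    const char *p;

    if (len == 0)
        return 0;    /* an empty word would match at every position */

    /* After each match, resume the search just past the matched text. */
    for (p = strstr(text, word); p != NULL; p = strstr(p + len, word))
        n++;
    return n;
}

/* Hypothetical helper: returns the part of path after the last '/',
   or path itself when it contains no '/'. */
const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash != NULL ? slash + 1 : path;
}
```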
Strings - Search Functions

- Consider the problem of a string of text S1 that contains various words (substrings) separated by specially designated characters used as delimiters (and contained in a string S2).
- The objective is to extract the words from the text.
- This can be accomplished using the function strtok() repeatedly.
- Each identified substring in S1, delimited by a character in S2, is called a token. Thus, strtok() is called the string tokenizer function.

Function prototype and description:

char * strtok( char * S1, const char * S2 );
    The first call to strtok() passes the string S1 and provides the string of delimiters S2. It returns a pointer to the first token found in S1, or NULL if there is none. Each subsequent call to strtok() uses NULL as the first argument (instead of the string S1), and the function remembers where it left off from the last time it was called. Each time strtok() is called, it returns a pointer to the next token found and also replaces the delimiter character that ends the token by '\0'. Thus, S1 is modified! A sequence of calls to strtok() therefore breaks S1 into token substrings.
Strings - Search Functions

    #include <stdio.h>
    #include <string.h>

    int main () {
        int N = 0 ;
        char S[] = "This is a sentence with tokens separated by blanks." ;
        char * tokenPtr ;

        printf( "The following tokens were found in S.\n" ) ;
        tokenPtr = strtok( S, " " ) ;   // First time use S; ' ' is the only delimiter
        while( tokenPtr != NULL ) {
            N++ ;
            printf( "%s\n", tokenPtr ) ;
            tokenPtr = strtok( NULL, " " ) ;   // Use NULL in successive calls
        }
        printf( "Number of tokens found = %d\n", N ) ;
        return 0 ;
    }
Strings - Search Functions

    #include <stdio.h>
    #include <string.h>

    int main () {
        int N = 0 ;
        char S[] = "This is a sentence with tokens separated by various characters." ;
        char * tokenPtr ;
        const char * DelimList = " .,;:$" ;   // blank and punctuation all act as delimiters

        printf( "The following tokens were found in S.\n" ) ;
        tokenPtr = strtok( S, DelimList ) ;   // First time use S; various delimiters
        while( tokenPtr != NULL ) {
            N++ ;
            printf( "%s\n", tokenPtr ) ;
            tokenPtr = strtok( NULL, DelimList ) ;   // Use NULL in successive calls
        }
        printf( "Number of tokens found = %d\n", N ) ;
        return 0 ;
    }
Memory Functions in <string.h>

- C also provides functions for dealing with blocks of data in RAM.
- The blocks may be characters, or other data types, hence the functions typically return a void * pointer value.
- A void * pointer value can be assigned to any other object pointer type, and vice versa.
- However, void * pointers cannot be dereferenced, thus the size of the block must be specified as an argument.
- None of the functions discussed perform checks for terminating null characters (delimiters).

Function prototypes and descriptions:

void * memcpy( void * S1, const void * S2, size_t N );
    Copies N characters (bytes) from the object S2 into the object S1. A pointer to the resulting object (S1) is always returned; there is no failure return value.
    Note: The result of this function is undefined if S1 and S2 overlap!
void * memmove( void * S1, const void * S2, size_t N );
    Copies N characters (bytes) from the object S2 into the object S1. A pointer to the resulting object (S1) is always returned.
    Note: This function behaves as if it used a temporary memory space to perform the copying, hence the operation is well defined even when S1 and S2 overlap.
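Overlapping source and destination arise naturally when shifting data within one array. The sketch below removes a prefix from a string in place, which calls for memmove() rather than memcpy(). The helper name drop_prefix is an assumption for this example.

```c
#include <string.h>

/* Hypothetical helper: removes the first n characters of s in place
   (or all of them if s is shorter than n) and returns s. */
char *drop_prefix(char *s, size_t n)
{
    size_t len = strlen(s);

    if (n > len)
        n = len;

    /* Source and destination overlap, so memmove() is required.
       The +1 moves the '\0' delimiter along with the characters. */
    memmove(s, s + n, len - n + 1);
    return s;
}
```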
Memory Functions in <string.h>

Function prototypes and descriptions:

int memcmp( const void * S1, const void * S2, size_t N );
    Compares the first N characters (bytes) of S1 and S2. Returns 0 if S1==S2, >0 if S1>S2, and <0 if S1<S2.
void * memchr( const void * S1, int C, size_t N );
    Locates the first occurrence of the character C in the first N characters (bytes) of S1. If C is found, a pointer to C in S1 is returned. Otherwise, NULL is returned.
void * memset( void * S1, int C, size_t N );
    Copies the character (byte) C to the first N positions of S1. A pointer to S1 is always returned.
    Note: the value of C is converted to unsigned char to enable copying to blocks of arbitrary data type.
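The following sketch applies the memory functions to data other than strings. The helper names int_arrays_equal, zero_ints and find_byte are assumptions for illustration. Note that the sizes passed are in bytes, so element counts are multiplied by the element size.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if two int arrays of n elements hold
   identical bytes, otherwise 0. */
int int_arrays_equal(const int *a, const int *b, size_t n)
{
    return memcmp(a, b, n * sizeof *a) == 0;
}

/* Hypothetical helper: sets all n elements of an int array to zero. */
void zero_ints(int *a, size_t n)
{
    memset(a, 0, n * sizeof *a);
}

/* Hypothetical helper: returns the offset of the first byte equal to
   value within the first n bytes of buf, or n if it does not occur. */
size_t find_byte(const void *buf, int value, size_t n)
{
    const unsigned char *start = buf;
    const unsigned char *hit = memchr(buf, value, n);
    return hit != NULL ? (size_t)(hit - start) : n;
}
```

Unlike strchr(), memchr() does not stop at a '\0' byte, so find_byte() can search past embedded null characters.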
Other Functions in <string.h>

Function prototypes and descriptions:

size_t strlen( const char * S );
    Determines and returns the number of characters in S, not including the '\0' delimiter.
char * strerror( int errornum );
    Returns a pointer to a string containing an error message (the standard messages are defined by the system) referenced by an error number code. The function itself outputs nothing. For instance, the statement
        printf( "%s\n", strerror( 2 ) ) ;
    might generate the output string:
        No such file or directory
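To make the role of the delimiter concrete, here is a rough sketch of how a function like strlen() could be written by hand. The name my_strlen is hypothetical, and real library implementations are usually more heavily optimized.

```c
#include <stddef.h>

/* Hypothetical re-implementation of strlen() for illustration:
   walks the string until the '\0' delimiter is reached. */
size_t my_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;

    /* The pointer difference is the number of characters before '\0'. */
    return (size_t)(p - s);
}
```

The loop never terminates correctly if s lacks a delimiter, which is why an unterminated array is not a proper string.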
Secure C programming

- C11 standard with Annex K
  - Addresses issues related to robustness of array based manipulation of character data (and other data containers)
  - Stack overflow detection
  - Array overflow detection
- Read more:
  - CERT guideline INT05-C: www.securecoding.cert.org
  - Additional online Appendices E-H for the textbook: www.pearsonhighered.com/deitel/
Summary

Concepts of characters and strings, query functions, transformation functions, search functions, generalization to abstract strings (memory functions).
Topic Summary

- Characters and Strings in the C language
- Multiple library sources
  - Query functions
  - Transformative functions
  - Conversion functions
  - Memory functions
- Practice, practice, practice!
- Reading: Chapter 8
  - Review Pointers as well, especially the const qualifier, and also the use of ** for modifying pointer values on return (through arguments) from functions.
- Reading: Chapter 9, Formatted Input and Output
  - This chapter is straightforward and is assigned for self-directed independent study and learning. It will be tested!