Section IV: Strings
A. Declaring String Variables Using Arrays
As you may recall, a string is a collection of characters. Up to now, we have only worked with string constants such as "Please enter the inventory code". We have not worked with strings as variables, nor have we input, output, or manipulated strings such as names and addresses. However, our contract program would clearly benefit from being able to handle customers' names and similar data.
We noted back in chapter 2 that the most recent C++ standard describes a built-in string class. Information on this class is also provided in Section XIII of the "Essentials of C++". Here we focus on the original C and C++ way of handling strings: as arrays of characters.
In C++, strings have traditionally been treated as arrays with one special feature: every string array must end with a special symbol, which we write as '\0'. This is called the null character, and C++ strings are said to be null terminated. For example, the string "HELLO" is stored in six array slots as:
[0] 'H'   [1] 'E'   [2] 'L'   [3] 'L'   [4] 'O'   [5] '\0'
Note how the symbol '\0' takes up just one of the character slots of the array. While we write it as two symbols, internally it is a single character whose numeric code is 0. Its purpose is to let any built-in or user-designed function find the end of the string. Other languages instead use one element of the string array to hold an integer giving the length of the string. Such a representation works well, but usually means that strings in such a language have a maximum length (for example, 255 characters when the length is stored in a single byte). A null-terminated string has no such built-in limit; its length is bounded only by the memory set aside for it. The price is that whenever we perform operations on strings, we must make sure the '\0' symbol remains at the end.
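A quick way to see the six slots described above is to ask the compiler for the size of the array. This sketch assumes a standard C++ compiler and uses <cstring>, the standard C++ name for string.h; the function name ShowHelloLayout is hypothetical, and each assert records what its expression evaluates to.

```cpp
#include <cassert>
#include <cstring>

// Checks the layout of "HELLO": five letters followed by '\0'.
void ShowHelloLayout()
{
    char hello[] = "HELLO";            // the compiler counts 5 letters + '\0'
    assert(sizeof(hello) == 6);        // six slots in the array
    assert(hello[0] == 'H' && hello[4] == 'O');
    assert(hello[5] == '\0');          // the final slot holds the null character
    assert(hello[5] == 0);             // '\0' is simply the code 0
    assert(std::strlen(hello) == 5);   // strlen stops before the '\0'
}
```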
A string prompt such as "Please enter the inventory code" is stored in an array, but that array does not have a name. Without a name, the program has no way to refer to the memory holding the string, so the string cannot be manipulated. (In addition, C++ does not allow the characters of a string constant to be modified.)
We have already learned how to declare arrays with names, and our focus here is on named arrays treated as strings. Here's a start:
char helloString[14] = "Howdy Pardner";
The variable 'helloString' represents an array of 14 characters. Note that at
least 14 characters are needed - 13 for the individual characters
(including the space) in the string constant and one for the '\0'.
It would not work to declare a smaller array, but we could have declared a larger array and initialized it with the same string constant.
Some space is wasted here, but any properly written string processing
function will ignore the extra space because it will discover
the null character at the end of the string and stop.
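The claims about exact and oversized arrays can be checked directly. A minimal sketch, assuming a standard C++ compiler; the 20-element size and the names roomyString and CompareArraySizes are hypothetical. (A 13-element array initialized with "Howdy Pardner" would be rejected by a C++ compiler, since it leaves no room for the '\0'.)

```cpp
#include <cassert>
#include <cstring>

// Compares an exactly sized array with an oversized one holding the same text.
void CompareArraySizes()
{
    char helloString[14] = "Howdy Pardner";  // 13 characters + '\0'
    char roomyString[20] = "Howdy Pardner";  // hypothetical size: 6 spare slots

    assert(sizeof(helloString) == 14);
    assert(sizeof(roomyString) == 20);

    // strlen stops at the first '\0', so the spare slots are ignored.
    assert(std::strlen(helloString) == 13);
    assert(std::strlen(roomyString) == 13);
}
```

Because the initializer is shorter than the array, the unused elements of roomyString are filled with null characters, and strlen stops at the first one.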
It is also legal to declare a string array without explicitly stating its size, provided you initialize it right away, as in:
char helloString[] = "Howdy Pardner";
C++ actually lets you declare any array without providing its size, if you immediately include the array's initial values. The system determines the array's size by counting the number of data items in the initialization list. (Since unnamed strings such as "Howdy Pardner" already have a null character in their internal representation, there is no danger of the system providing too small an array.)
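To declare and initialize a non-char array, one places the initializing values inside braces, separated by commas, after the equal sign. When the braces hold four integer values and the array's brackets are left empty, the declaration creates and initializes an array of 4 integers.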
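Letting the compiler count the initializers works the same way for both kinds of array. In this sketch the values 10, 20, 30, 40 and the names numbers and ImplicitSizes are hypothetical; dividing the array's total size by the size of one element recovers the element count.

```cpp
#include <cassert>
#include <cstring>

void ImplicitSizes()
{
    char helloString[] = "Howdy Pardner";   // size deduced: 13 + 1 = 14
    int numbers[] = {10, 20, 30, 40};       // hypothetical values; size deduced: 4

    assert(sizeof(helloString) == 14);
    assert(std::strlen(helloString) == 13);

    // Number of elements = total bytes / bytes per element.
    assert(sizeof(numbers) / sizeof(numbers[0]) == 4);
    assert(numbers[3] == 40);
}
```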
B. Declaring String Types
Since traditionally C and C++ did not include a built-in string
type, users would declare their own string types using the typedef
statement. Since strings are really character arrays, we can declare
a string type by declaring an array type as we did in chapters
7 and 8. For example, to declare a string type to represent strings
of 13 characters plus a null character, we could write:
The name used here for the string type is meant to convey the fact that this type can be used for any string of 13 or fewer characters, with space provided for the null character. Of course, you can use any name for the type that you want, but do try to make it meaningful.
Whatever name you use, be sure to make the array big enough to
hold the largest string you expect to handle. For example,
if you write code that allows customer names of up to 50 characters,
you want a type defined as:
Also, don't forget that a type name does not set aside any memory.
To create space for customer names, you still need to declare
a variable of your new type as in:
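Following the pattern of the string10 type used in Strcnvrt.cpp below, string types for 13-character strings and 50-character customer names might be declared as in this sketch. The type names string13 and string50, the variable names, and the function DeclareWithStringTypes are hypothetical; note that only the variable declarations inside the function reserve memory.

```cpp
#include <cassert>
#include <cstring>

typedef char string13[14];   // any string of up to 13 characters, plus '\0'
typedef char string50[51];   // customer names of up to 50 characters, plus '\0'

// The typedefs above reserve no memory; these variables do.
void DeclareWithStringTypes()
{
    string13 greeting = "Howdy Pardner";
    string50 customerName = "Curtis Sollohub";

    assert(sizeof(greeting) == 14);
    assert(sizeof(customerName) == 51);
    assert(std::strlen(customerName) == 15);   // 6 + 1 blank + 8 letters
}
```

Current C++ also accepts the alias form using string50 = char[51]; which means the same thing as the typedef.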
C. The String Function Library
While C++ traditionally did not have a built-in string type, it has long been standard to include a library of functions that manipulate null-terminated character arrays. Most of these functions are declared in the header file string.h, which is included with the <...> symbols because it is a standard library file like 'iostream.h'. (In current standard C++, the same functions are available through the header <cstring>.) We will look at three of these functions here. For the others, consult the documentation for the C++ compiler you are using.
First is the function strlen. It receives a string and returns the length of the string, NOT including the null character. Thus, given the code:
len = strlen(helloString);
the variable 'len' will have the value 13 after the function call
- assuming any of the above initializations of 'helloString'. Likewise, 'len' would have the value 6 after the following code was executed:
The second function is strcpy. This receives two strings and copies the contents of the second string into the first. One must be careful that the first string parameter represents enough memory to hold the second string including the null character. As an example of this function, consider the following code:
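Here strlen and strcpy are used together on the helloString declaration from section A. A minimal sketch assuming <cstring>; the name CopyAndMeasure and the array target are hypothetical. The target array has exactly 14 elements, the smallest size that can safely receive a 13-character string.

```cpp
#include <cassert>
#include <cstring>

void CopyAndMeasure()
{
    char helloString[14] = "Howdy Pardner";
    std::size_t len = std::strlen(helloString);   // '\0' is not counted
    assert(len == 13);

    char target[14];                     // exactly 13 characters + '\0'
    std::strcpy(target, helloString);    // copies the characters AND the '\0'
    assert(target[12] == 'r' && target[13] == '\0');

    std::strcpy(target, "Hi");           // a shorter copy puts '\0' in slot 2
    assert(std::strlen(target) == 2);    // the old characters after it are ignored
}
```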
The third function is strcmp. It receives two strings and
compares them, using whatever character code system the computer
uses. If the two strings
are the same, the function returns 0. If the first string parameter
is 'less than' the second, the function returns an integer less
than 0. And, if the first string parameter is 'greater than' the
second, the function returns an integer greater than 0.
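Since the standard only promises the sign of strcmp's result (negative, zero, or positive), a test should check the sign rather than a particular value. The strings and the function name CompareStrings below are illustrative.

```cpp
#include <cassert>
#include <cstring>

void CompareStrings()
{
    // Only the sign of strcmp's result is guaranteed, not the exact value.
    assert(std::strcmp("Howdy", "Howdy") == 0);
    assert(std::strcmp("Apple", "Banana") < 0);          // 'A' comes before 'B'
    assert(std::strcmp("Banana", "Apple") > 0);
    assert(std::strcmp("Howdy", "Howdy Pardner") < 0);   // the shorter prefix comes first
}
```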
You might wonder how such string comparisons are made. There are two common character representation schemes used in computers: ASCII and EBCDIC. Both represent characters (alphabetic characters, numeric characters, and symbols such as ',' and '$') with numeric codes that fit in eight bits, or one byte. (ASCII itself defines 128 codes, 0 through 127.) Thus a string such as "Howdy Pardner" is a sequence of such numeric codes. These codes are set up so that the code for 'A' is less than the code for 'B', which is less than the code for 'C', and so on.
The algorithm for comparing two strings (call them string1 and string2) starts by looking at the numeric code for the first character in each string. If the code for the first character in string1 is less than the code for the first character in string2, this implies that string1 comes before string2 alphabetically. (This is what is meant by 'less than' in the description of strcmp above.) Likewise, if the code for the first character of string1 is greater than the code for the first character of string2, string1 comes after string2 alphabetically.
Just as when humans are comparing two strings, if the codes for the two first characters are not the same, the algorithm is finished. But if the two strings have the same first character, and thus the same numeric code for it, the algorithm must proceed to compare the second characters, and so on. If one string runs out before the other, as with "Howdy" and "Howdy Pardner", the null character at the end of the shorter string (code 0) is compared with a real character of the longer one, so the shorter string comes first. If the strings have exactly the same characters, they will have the same sequence of codes and thus will be alphabetically equal.
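The algorithm just described can be written out by hand. This is a sketch of the idea, not the library's actual implementation; CompareCodes and CompareCodesDemo are hypothetical names. Like strcmp, it compares characters as unsigned char values and returns a negative, zero, or positive result.

```cpp
#include <cassert>

// Character-by-character comparison; returns <0, 0, or >0 like strcmp.
int CompareCodes(const char* string1, const char* string2)
{
    int i = 0;
    // Advance while the codes match and string1 has not ended.
    while (string1[i] != '\0' && string1[i] == string2[i])
        ++i;
    // The first differing codes decide the order; if one string ended,
    // its '\0' (code 0) makes it the smaller one.
    return static_cast<unsigned char>(string1[i]) -
           static_cast<unsigned char>(string2[i]);
}

void CompareCodesDemo()
{
    assert(CompareCodes("Howdy", "Howdy") == 0);
    assert(CompareCodes("Apple", "Banana") < 0);
    assert(CompareCodes("Howdy", "Howdy Pardner") < 0);   // shorter string first
    assert(CompareCodes("PARDNER", "pardner") != 0);      // case matters
}
```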
There are some issues with this. First, ASCII and EBCDIC arrange digits and special characters differently (in ASCII the digits come before the letters; in EBCDIC they come after), so the two systems will order strings containing non-alphabetic characters in different ways. Second, the code for 'A' is different from the code for 'a' (in ASCII they are 65 and 97), and the same is true for every alphabetic character. Thus this algorithm will conclude that the strings "PARDNER" and "pardner" are different. Programmers often convert all string characters to either upper or lower case before making comparisons when the case of the characters is not significant.
To assist in this conversion, the standard C++ library includes the functions tolower and toupper. Each receives a character and returns a character. The first returns the lower-case character corresponding to the character received: if it receives an 'A' or an 'a', it returns 'a'. The second does just the opposite. Both return the received character unchanged if it is a digit or a symbol. (To use them, include the header file 'ctype.h', or <cctype> in current standard C++.)
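Combining toupper with the comparison loop gives a comparison that ignores case. CompareIgnoringCase and CompareIgnoringCaseDemo are hypothetical helpers, not library functions; the sketch assumes the header <cctype> and casts each character to unsigned char, because toupper is only defined for values representable as unsigned char (and EOF).

```cpp
#include <cassert>
#include <cctype>

// Compares two strings while ignoring case; returns <0, 0, or >0.
int CompareIgnoringCase(const char* string1, const char* string2)
{
    int i = 0;
    for (;;)
    {
        int c1 = std::toupper(static_cast<unsigned char>(string1[i]));
        int c2 = std::toupper(static_cast<unsigned char>(string2[i]));
        if (c1 != c2 || c1 == '\0')   // a difference, or both strings ended
            return c1 - c2;
        ++i;
    }
}

void CompareIgnoringCaseDemo()
{
    assert(CompareIgnoringCase("PARDNER", "pardner") == 0);
    assert(CompareIgnoringCase("howdy", "PARDNER") < 0);   // 'H' before 'P'
    assert(std::toupper('7') == '7');                      // digits unchanged
}
```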
D. Using the Null Character
Below is the code for a test program that calls a function "ConvertToUpper". This function receives two strings and fills the second with a copy of the first in which every character is in upper case. Notice how the test string consists of upper-case characters, lower-case characters, a digit, and a representative set of symbols. A good test always tries to include a set of cases that represents all possibilities.
// Test program to convert a whole string to upper case
// File: Strcnvrt.cpp
#include <iostream>
#include <cctype>    // declares 'tolower' and 'toupper'
using namespace std;

typedef char string10[11];   // up to 10 characters plus '\0'

void ConvertToUpper(string10 s1, string10 s2);

int main()
{ string10 s1 = "hELlo 1,$";
  string10 s2;
  ConvertToUpper(s1, s2);
  cout << "s1 = " << s1 << endl;
  cout << "s2 = " << s2;
  return 0;
}

void ConvertToUpper(string10 s1, string10 s2)
{ int index;   // declared before the loop so it is still in scope afterwards
  for (index = 0; s1[index]; index++)
  { // toupper expects a value representable as unsigned char
    char ch = toupper(static_cast<unsigned char>(s1[index]));
    s2[index] = ch;
  }
  s2[index] = '\0';   // the loop stops at '\0' without copying it
}
The 'for' statement in "ConvertToUpper" walks through
the string "s1" (really an array), converting each character
and then moving the converted character to the second string,
"s2". Notice the test condition in the 'for' loop. "s1[index]"
will have a value other than 0 until the end of the
string. Since '\0' is the character with the numeric code of 0,
when the end of the string is encountered, the test goes false
and the loop stops. To emphasize the point, the test condition could also have been written as:
s1[index] != '\0'
The last line of this function is very important. The loop stops when it reaches the '\0' in "s1" without copying it, so the last line must place a '\0' at the end of "s2" explicitly. Notice also how the '\0' in "s1" is used: without it, this 'for' statement would not know where to stop and would keep reading past the end of the array. Many of the built-in string functions use a 'for' or 'while' statement with the same test to determine when to finish. For that reason, always make sure you leave room for '\0' in your strings and add it when necessary. To see the need for this, comment out the last line of "ConvertToUpper" and run the program; "s2" is then not properly terminated, and the output may show stray characters after the converted text.
If you are wondering about the use of string variables in the
output lines here, check the section of this chapter entitled
"String Input and Output" (section F below).
E. Strings Declared as Pointers
There is another way to declare string variables, given that arrays are so closely related to pointers. One can write:
char* myString;
You need to remember, however, that the code "char* myString" does not set aside memory for the string. It simply declares a pointer that can later point to a string. If you want a string of, say, size 50, you would write the code:
Only after you did this could you use myString, for example,
in a call to strcpy:
Note that when a string is passed to a function, the calling side should already have set the memory aside. Therefore, functions such as strcpy or ConvertToUpper are declared using character pointers as parameters and the functions themselves do not set aside memory for the strings. It is the responsibility of the code in the calling function to make sure enough memory has been set aside for both parameters.
For example, one could rewrite the declaration for "ConvertToUpper" as:
void ConvertToUpper(char* s1, char* s2);
In this second example, one assumes any actual parameters passed
to this function will represent enough memory for whatever string
is to be converted.
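A pointer-based version of the pieces above might look like the following sketch: the caller allocates memory with new[], copies a string in with strcpy, passes both pointers to a char* version of ConvertToUpper, and releases the memory with delete[]. The array size 50 and the names upper and PointerStringDemo are assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Pointer-parameter version of ConvertToUpper: it assumes the caller has
// already set aside enough memory in s2 for s1 plus its '\0'.
void ConvertToUpper(char* s1, char* s2)
{
    int index;
    for (index = 0; s1[index] != '\0'; index++)
        s2[index] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(s1[index])));
    s2[index] = '\0';   // terminate the converted copy
}

void PointerStringDemo()
{
    char* myString = new char[50];   // room for 49 characters + '\0'
    char* upper = new char[50];      // the caller sets aside memory for both
    std::strcpy(myString, "Howdy Pardner");
    ConvertToUpper(myString, upper);
    assert(std::strcmp(upper, "HOWDY PARDNER") == 0);
    delete[] myString;               // memory from new[] is released with delete[]
    delete[] upper;
}
```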
F. String Input and Output
String output is quite simple and you have already been doing
it with code such as:
A string variable can be output in the same way. For example, to output "s1" and "s2" in the "Strcnvrt.cpp" program above, we simply wrote:
cout << "s1 = " << s1 << endl;
cout << "s2 = " << s2;
This would have worked just as well if the string were declared as a pointer:
Both these forms work because the << operator knows to keep outputting characters until the \0 symbol is encountered. Another example of the importance of this symbol!
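To check the claim that << stops at the first '\0', this sketch writes into an std::ostringstream, which behaves like cout but keeps the text so it can be inspected. WrittenText and OutputStopsAtNull are hypothetical names, and the '\0' in the middle of the array is placed there deliberately.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Sends a C string through << and returns the text actually written.
std::string WrittenText(const char* s)
{
    std::ostringstream out;
    out << s;            // writes characters until the first '\0'
    return out.str();
}

void OutputStopsAtNull()
{
    char s[] = "Howdy\0Pardner";   // a '\0' placed in the middle on purpose
    assert(WrittenText(s) == "Howdy");
    char* p = s;                   // the pointer form behaves the same way
    assert(WrittenText(p) == "Howdy");
}
```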
Input is a bit more complex. We have not said much about 'cin' yet, and we won't say much until the next chapter. What you need to know for the moment is that C++ programmers think of 'cin' as a stream of data connecting a data source, such as the keyboard or a file, with a program. For us, then, 'cin' is the connection, the stream, between the keyboard and the program.
When you use the ">>" operator to extract data from the 'cin' stream, the system starts by skipping any leading blanks (more precisely, any whitespace: spaces, tabs, and newlines) in the data coming down the stream. Once a non-blank symbol is encountered, the system extracts (reads) symbols until a blank is encountered. In entering data in earlier programs, you usually hit the 'Enter' key after each data item. This works because the newline produced by the 'Enter' key counts as a blank.
Thus, if num1 is an int variable and you have the code:
cin >> num1;
and you type the symbol '3', followed by the 'Enter' key, that '3' enters the 'cin' stream at the keyboard end and flows out at the program end to be processed by the ">>" operator. The processing of the symbol includes translating it into the integer value 3, because the system sees that the destination is an integer variable. After the translation, the result is stored in the variable "num1".
If you type a few blanks before typing the '3', those blanks are skipped. If you type '312' followed by 'Enter', the system will read in the symbols '3', '1', and '2' and translate them into the integer 312. If the user accidentally types non-numeric characters where the number should begin, the extraction operation fails because the system cannot translate those characters into an integer value. (Chapter 11, Section IV discusses how to prevent infinite loops caused by entering invalid data in code involving a 'while' loop.)
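The integer case can be tried without a keyboard by letting an std::istringstream stand in for cin; it supports the same >> operator. The input text and the name ReadIntDemo are illustrative. Note that the failed extraction is detected by checking the stream, not the variable.

```cpp
#include <cassert>
#include <sstream>

void ReadIntDemo()
{
    int num1 = 0;

    std::istringstream typed("   312\n");   // as if "   312" + Enter were typed
    typed >> num1;                          // blanks skipped, "312" becomes 312
    assert(!typed.fail() && num1 == 312);

    std::istringstream invalid("abc\n");    // no digits where the number starts
    invalid >> num1;
    assert(invalid.fail());                 // the extraction failed
}
```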
With the code:
cin >> myString;
the process is the same, but this time the symbols are treated as characters. The ">>" operator starts, as before, by skipping any leading blanks in the data coming down the stream. Once it sees the first non-blank character, it starts passing characters into the string variable until another blank is encountered. Then it adds the null character to 'myString' and stops. In other words, if you type "Hello" followed by a blank and some more characters, the characters 'H' 'e' 'l' 'l' 'o' will be stored in "myString" along with a '\0'. The characters after "Hello" (including the blank) are actually still in the stream, waiting to be processed by the program's next request for input.
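The same stand-in stream shows ">>" reading a single word into a character array and leaving the rest of the line for later. The input "   Hello there" and the name ReadWordDemo are assumptions for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>

void ReadWordDemo()
{
    std::istringstream input("   Hello there\n");   // stands in for cin
    char myString[20];

    input >> myString;                    // skips blanks, stops at the next blank
    assert(std::strcmp(myString, "Hello") == 0);
    assert(input.peek() == ' ');          // the blank after "Hello" is still waiting

    input >> myString;                    // the next request picks up the rest
    assert(std::strcmp(myString, "there") == 0);
}
```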
While this works, it has two problems. First, if the input stream contains more characters, uninterrupted by a blank, than there are elements in the string array, the ">>" operator will continue placing characters past the end of the array. (We have already talked about the danger of overflowing an array.)
Second, if we want a string variable to hold blanks, this method
won't work. For example, suppose we want one string
variable to hold a first and last name such as "Curtis Sollohub".
If the user enters this name with a blank between the first and
last name, only the first name will be saved in the string using
the ">>" operator.
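The first problem has a standard remedy: the std::setw manipulator from <iomanip> limits how many characters ">>" stores in a character array (one less than the width, leaving room for '\0'). The second problem remains, as the last part of the sketch shows. The names shortString and BoundedReadDemo and the input text are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <iomanip>
#include <sstream>

void BoundedReadDemo()
{
    char shortString[6];                   // room for 5 characters + '\0'
    std::istringstream input("Sollohub");
    // setw limits the extraction to width - 1 characters,
    // so >> cannot write past the end of the array.
    input >> std::setw(6) >> shortString;
    assert(std::strcmp(shortString, "Sollo") == 0);

    char name[51];
    std::istringstream typed("Curtis Sollohub\n");
    typed >> name;                         // still stops at the blank
    assert(std::strcmp(name, "Curtis") == 0);
}
```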
So far all our input has come through the ">>" operator, but there are a number of functions we can use to input data. The stream "cin" is actually an instance of a standard C++ class called istream: the <iostream.h> header file contains a declaration equivalent to
extern istream cin;
just as it contains
extern ostream cout;
for the output stream.
One of the member functions associated with the class "istream" is named get. This function has many forms (like the constructors we have seen), and one of those forms can be used for string input. That form has two required parameters and one optional parameter. (Optional parameters use the default value mechanism introduced in Chapter 10.)
The first required
parameter is the string into which the data coming off the stream
should be read. The second is the maximum number of characters
to be read plus 1. (The "plus 1" is C++'s way
of reminding us that we need enough memory for the '\0'.)
The optional third parameter has the default value '\n', the newline character. It indicates when the 'get' member function should stop extracting characters from the stream: 'get' continues extracting characters until it encounters a character matching the third parameter, until it has extracted the maximum number of characters, or until the input ends. The matching character itself is not extracted. If the default value is used, 'get' extracts characters until the newline character (produced by typing the 'Enter' key) is encountered or until the maximum number of characters has been extracted.
Remember that a member function is called by writing a class instance, followed by a period, followed by the function name. In our case, the class instance is 'cin' and the member function is 'get'. Thus, the code below:
cin.get(myString, 51);
will read up to 50 characters from the 'cin' stream and place them, along with a '\0', in "myString". (Since there is no third parameter, the default value of '\n' is used.) The 'get' function starts reading immediately from the stream ('cin' in this case) without skipping leading blanks. As called, it continues to read until it has read in 50 characters or until it sees the newline character ('\n'). Thus, if you type "Curtis Sollohub" and then hit the "Enter" key, the computer will store "Curtis Sollohub" in "myString".

Note that if the user types some blanks before the word "Curtis", they will be included in the string. Notice also that if the user types "Curtis" and then hits the "Enter" key before typing "Sollohub", the string will only hold the letters 'C' 'u' 'r' 't' 'i' 's', along with the null character.
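The behaviour of get described above can be checked with an std::istringstream standing in for cin. In this sketch the input text and the name GetLineDemo are illustrative; the assertions show that leading blanks are kept and that the newline is left on the stream.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>

void GetLineDemo()
{
    std::istringstream input("  Curtis Sollohub\nnext line\n");   // stands in for cin
    char myString[51];

    input.get(myString, 51);              // up to 50 characters, stops before '\n'
    assert(std::strcmp(myString, "  Curtis Sollohub") == 0);   // blanks kept
    assert(input.peek() == '\n');         // the newline is left on the stream
}
```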
It is also important to understand that the 'get' function
does not consume the newline character. It is still sitting on
the stream. In other words, if we again use the "get"
function, it will read in nothing because the first thing it will
encounter is the newline character left over from the last input
- even if we have typed more characters at the keyboard. To get
past this, we can use a character-based version of the 'get' function
as in the following example:
char* name;
name = new char[81];     // 80 characters (one punch card line, a holdover
                         // from punch card days) plus 1 for the '\0'
char* address;
address = new char[81];
char dummy;
cout << "Please enter your name and hit the Enter key.\n";
cin.get(name, 81);
cin.get(dummy);          // consume the newline left by the first get
cout << "Please enter your street address and hit the Enter key.\n";
cin.get(address, 81);
The first call to 'get' will read in a name the user types. The
line "cin.get(dummy)" will read in the newline character
and store it in the variable 'dummy'. (The function needs a place
to put the character read in but the program doesn't do anything
with it so we can call the variable 'dummy'.) The third call to
the function 'get' will read in an address.
This code does not have a second
cin.get(dummy);
at the bottom. For this code fragment it is not necessary, but if the program were longer and included other input, it would be safer to get rid of the second newline right away.
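The leftover-newline problem and the cin.get(dummy) fix can both be demonstrated with std::istringstream in place of cin. The address text and the name LeftoverNewlineDemo are made up for illustration. The sketch relies on a documented rule of get: when it extracts no characters at all, it stores an empty string and puts the stream into a failed state, so later reads are ignored until the error is cleared.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>

void LeftoverNewlineDemo()
{
    char name[81], address[81];
    char dummy;

    // Without consuming the newline, the second get extracts nothing.
    std::istringstream input1("Curtis Sollohub\n12 Main Street\n");
    input1.get(name, 81);
    input1.get(address, 81);             // sees '\n' immediately
    assert(address[0] == '\0');          // an empty string is stored
    assert(input1.fail());               // extracting nothing sets failbit

    // With get(dummy) in between, the address is read correctly.
    std::istringstream input2("Curtis Sollohub\n12 Main Street\n");
    input2.get(name, 81);
    input2.get(dummy);                   // consumes the '\n'
    input2.get(address, 81);
    assert(dummy == '\n');
    assert(std::strcmp(address, "12 Main Street") == 0);
}
```

Where discarding the newline automatically is preferable, istream's getline member (for example cin.getline(name, 81)) reads like this form of get but also extracts and discards the delimiter.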
As a review, remember that the compiler knows which version of the overloaded function 'get' to use by looking at the number and types of the parameters involved in the call. In the file 'iostream.h', 'get' has been declared a number of times: once with one parameter, a char (passed by reference so that 'get' can store into it); and once with three parameters, a char* (string), an int, and a char. The third parameter has a default value of '\n' and therefore can be omitted in a call. Its purpose is to indicate the character to look for to stop reading in characters. See chapter 11 for more information.