
CHAPTER 9

DYNAMIC MEMORY AND POINTERS


Section IV: Strings

A. Declaring String Variables Using Arrays

As you may recall, a string is a collection of characters. Up to now, we have only worked with string constants such as "Please enter the inventory code". We have not worked with strings as variables; nor have we input, output, or manipulated strings such as names and addresses. However, our contract program would clearly benefit from being able to handle customers' names and similar data.

We did note back in chapter 2 that the most recent C++ standard talks about a built-in string class. Information on this is also provided in Section XIII of the "Essentials of C++". Here we focus on the original 'C' and C++ way of handling strings.

In C++ strings have traditionally been treated as arrays with one special feature - all string arrays must end with a special symbol, which we write as '\0'. This is called the null character, and C++ strings are said to be null terminated. For example, the string "HELLO" is stored as:

'H' 'E' 'L' 'L' 'O' '\0'

Note how the symbol '\0' takes up just one of the character slots of the array. While we write it as two symbols, internally it is represented with one value. Its purpose is to allow any built-in or user-designed function to find the end of the string. Some other languages instead use one of the elements of the string array itself to hold an integer representing the length of the string. Such a representation or data structure works well but usually means that strings in such a language have a maximum size. In C++ the representation itself sets no limit on the size of a string, but we must make sure when we perform operations on strings to leave the '\0' symbol at the end.

A string prompt such as "Please enter the inventory code" is stored in an array, but that array does not have a name. Without a name, the program has no way to refer to the memory for the string array, so the string cannot be manipulated.

We have already learned how to declare arrays with names and our focus here is on named arrays treated as strings. Here's a start:

char helloString[14] = "Howdy Pardner";

The variable 'helloString' represents an array of 14 characters. Note that at least 14 characters are needed - 13 for the individual characters (including the space) in the string constant and one for the '\0'. While it would not work to declare a smaller array, we could have declared a larger array as in:

char helloString[20] = "Howdy Pardner";

Some space is wasted here, but any properly written string processing function will ignore the extra space because it will discover the null character at the end of the string and stop.

It is also legal to declare a string array without stating explicitly the size of the array, if you initialize it right away as in:

char helloString[] = "Howdy Pardner";

C++ actually lets you declare any array without providing its size, if you immediately include the array's initial values. The system determines the array's size by counting the number of data items in the initialization list. (Since unnamed strings such as "Howdy Pardner" already have a null character in their internal representation, there is no danger of the system providing too small an array.)

To declare and initialize a non-char array, one uses brackets to hold the initializing values as in:

int evenNums[] = {2, 4, 6, 8};

This creates and initializes an array of 4 integers.
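
The sizes that the compiler works out from an initializer can be checked with the sizeof operator. The sketch below is only an illustration; the two helper function names are invented for this example and are not part of any library.

```cpp
#include <cstddef>

char helloString[] = "Howdy Pardner";   // 13 characters plus '\0'
int  evenNums[]    = {2, 4, 6, 8};      // 4 integers

// Hypothetical helper: number of char elements the compiler set aside.
std::size_t HelloStringSlots()
{
    return sizeof(helloString) / sizeof(helloString[0]);
}

// Hypothetical helper: number of int elements the compiler set aside.
std::size_t EvenNumsSlots()
{
    return sizeof(evenNums) / sizeof(evenNums[0]);
}
```

HelloStringSlots() gives 14 (the 13 visible characters and the null character) and EvenNumsSlots() gives 4.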

B. Declaring String Types
Since traditionally C and C++ did not include a built-in string type, users would declare their own string types using the typedef statement. Since strings are really character arrays, we can declare a string type by declaring an array type as we did in chapters 7 and 8. For example, to declare a string type to represent strings of 13 characters plus a null character, we could write:

typedef char String13[14];

The name used here for the string type is meant to convey the fact that this type can be used for any string of 13 or fewer characters - with space provided for the null character. Of course, you can use any name for the type that you want, but do try to make it meaningful.

Whatever name you use, be sure to make the array big enough to hold the largest string you expect to handle. For example, if you write code that allows customer names of up to 50 characters, you want a type defined as:

typedef char String50[51];

Also, don't forget that a type name does not set aside any memory. To create space for customer names, you still need to declare a variable of your new type as in:

String50 customerName;

C. The String Function Library
While traditionally C++ did not have a built-in string type, it has long been standard to include a library of string functions to manipulate null-terminated character arrays. Most of these functions are declared in the header file string.h (named <cstring> in standard C++), which is included with the <...> symbols in a program since it is a standard library header like 'iostream.h' (<iostream> in standard C++). We will look at three of these functions here. If you want others, you should consult the manual for the C++ compiler you are using.

First is the function strlen. It receives a string and returns the length of the string NOT including the null character. Thus, given the code:

int len;
len = strlen(helloString);

the variable 'len' will have the value 13 after the function call - assuming any of the above initializations of 'helloString'. Likewise, 'len' would have the value 6 after the following code was executed:

len = strlen("Curtis");
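
To see why the null character is not counted, here is a minimal sketch of how a function like strlen could be written. MyStrlen is a made-up name; the real library version may be implemented quite differently.

```cpp
// Hypothetical stand-in for strlen: counts characters up to, but not
// including, the null character.
int MyStrlen(const char s[])
{
    int count = 0;
    while (s[count] != '\0')
        count++;
    return count;
}
```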

The second function is strcpy. This receives two strings and copies the contents of the second string into the first. One must be careful that the first string parameter represents enough memory to hold the second string including the null character. As an example of this function, consider the following code:
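
A minimal sketch of such code follows. The names 'helloString' and 'userName' come from the text; the size chosen for 'userName' and the wrapper function CopyGreeting are assumptions made so the fragment can stand alone.

```cpp
#include <cstring>

char helloString[14] = "Howdy Pardner";
char userName[14];        // assumed size: 13 characters plus '\0'

// Hypothetical wrapper so the call can be run on its own.
void CopyGreeting()
{
    std::strcpy(userName, helloString);   // second string is copied into the first
}
```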

This will copy the string, "Howdy Pardner", stored in 'helloString', into the string variable 'userName'. (So what if "Howdy Pardner" is a very strange user name!) Note that the code makes sure that 'userName' has enough space in it for the string held in 'helloString'.

The third function is strcmp. It receives two strings and compares them, using whatever character code system the computer uses. If the two strings are the same, the function returns 0. If the first string parameter is 'less than' the second, the function returns an integer less than 0. And, if the first string parameter is 'greater than' the second, the function returns an integer greater than 0.

You might wonder how such string comparisons are made. There are two common character representation schemes used in computers - ASCII and EBCDIC. Both of these represent characters (alphabetic characters, numeric characters, and symbols such as ',' and '$') with numeric codes that fit in eight bits (one byte). Thus a string such as "Howdy Pardner" is a set of such numeric codes. These codes are carefully set up so that the code for 'A' is less than the code for 'B', which is less than the code for 'C', etc.

The algorithm for comparing two strings (call them string1 and string2) starts by looking at the numeric code for the first character in each string. If the code for the first character in string1 is less than the first character in string2, this implies that string1 comes before string2 alphabetically. (This is what is meant by 'less than' in the paragraph above.) Likewise, if the code for the first character of string1 is greater than the code for the first character of string2, this means string1 comes after string2 alphabetically.

Just as when humans are comparing two strings, if the codes for the two first characters are not the same, the algorithm is finished. But, if the two strings have the same first character and thus have the same numeric codes for the first character, the algorithm must proceed to compare the second characters etc. If the strings both have the same characters, they will also both have the same set of codes and, thus, will be alphabetically equal.
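
The comparison steps described above can be sketched as follows. CompareStrings is a made-up name used only for illustration; strcmp follows the same sign convention, but its actual implementation is up to the library.

```cpp
// Hypothetical sketch of the comparison algorithm.
// Returns 0 if the strings are the same, a value less than 0 if s1 comes
// first, and a value greater than 0 if s1 comes after s2.
int CompareStrings(const char s1[], const char s2[])
{
    int index = 0;

    // Walk forward while the characters match and s1 has not ended.
    while (s1[index] != '\0' && s1[index] == s2[index])
        index++;

    // Compare the codes of the first pair that differs (or the two '\0's).
    return (unsigned char) s1[index] - (unsigned char) s2[index];
}
```

Note that a string that runs out first, such as "app" compared with "apple", counts as 'less than' the longer one because the code for '\0' is 0.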

There are some issues with this. First, ASCII and EBCDIC arrange special characters and digits in different ways, so the two systems will order strings that have non-alphabetic characters in different ways. Second, we noted earlier that the code for 'A' is different from the code for 'a'. This is true for all the alphabetic characters. Thus, this algorithm will conclude that the strings "PARDNER" and "pardner" are different. Often programmers convert all string characters to either upper or lower case before making comparisons if the case of the characters is not significant.

To assist in this conversion process, the standard C++ library includes the functions tolower and toupper. Both of these receive a character and return a character. The first one returns the lower case character corresponding to the character received. In other words, if it receives an 'A' or an 'a', it returns 'a'. The second function does just the opposite. Both these functions return the received character unchanged if it is a digit or a symbol. (To use them, include the header file 'ctype.h'.)

D. Using the Null Character
Below is the code for a test program that calls a function "ConvertToUpper". This function receives a string and returns the same string with all its characters in upper case. Notice how the test string consists of uppercase characters, lower case characters, a digit, and a representative set of symbols. A good test always tries to include a set of cases that represents all possibilities.


// Test program to convert a whole string to upper case
// File: Strcnvrt.cpp

#include <iostream>   // standard name for the older 'iostream.h'
#include <ctype.h>    // the file that contains the declaration for 'tolower' 
                      // and 'toupper'
using namespace std;

typedef char string10[11];

void ConvertToUpper(string10 s1, string10 s2);

int main()
{	string10 s1 = "hELlo 1,$";
	string10 s2;

	ConvertToUpper(s1, s2);

	cout << "s1 = " << s1 << endl;
	cout << "s2 = " << s2;
	return 0;
}


void ConvertToUpper(string10 s1, string10 s2)
{  	char ch;
	int index;	// declared before the loop because it is used after it
	for (index = 0; s1[index]; index++)
	{  	ch = toupper(s1[index]);
		s2[index] = ch;
	}
	s2[index] = '\0';	// the loop stops before copying the '\0'
}

The 'for' statement in "ConvertToUpper" walks through the string "s1" (really an array), converting each character and then moving the converted character to the second string, "s2". Notice the test condition in the 'for' loop. "s1[index]" will have a value other than 0 until the end of the string. Since '\0' is the character with the numeric code of 0, when the end of the string is encountered, the test goes false and the loop stops. To emphasize the point: the test condition could also have been written as:

s1[index] != '\0'

The last line of this function is very important. The loop stops as soon as it reaches the '\0' in "s1", so that character is never copied; the last line therefore stores a '\0' in "s2" after the loop. Notice how the '\0' in "s1" is used. Without it, this 'for' statement might never end! Many of the built-in string functions use a 'for' or 'while' statement with the same test to determine when to finish. For that reason, you need to always make sure you leave room for '\0' in your strings and add it when necessary. To demonstrate the need for this, comment out the last line of "ConvertToUpper" and run the program. If you are wondering about the use of string variables in the output lines here, check the section of this chapter entitled "String Input and Output" (section F below).

E. Strings Declared as Pointers
There is another way to declare string variables, given that arrays are so closely related to pointers. One can write:

char* myString;

You need to remember, however, that the code "char* myString" does not set aside memory for the string. It simply declares a pointer to the string. If you want a string of say size 50, you would write the code:

char* myString;
myString = new char[51];
    // use 51 to allow a string of 50 characters and room for '\0'.

Only after you did this could you use myString, for example, in a call to strcpy:

strcpy(myString, "hello");

Note that when a string is passed to a function, the calling side should already have set the memory aside. Therefore, functions such as strcpy or ConvertToUpper are declared using character pointers as parameters and the functions themselves do not set aside memory for the strings. It is the responsibility of the code in the calling function to make sure enough memory has been set aside for both parameters.

For example, one could rewrite the declaration for "ConvertToUpper" as:

void ConvertToUpper(char* s1, char* s2)

In this second example, one assumes any actual parameters passed to this function will represent enough memory for whatever string is to be converted.
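
A minimal sketch of the pointer version, with the caller setting the memory aside, might look like this. UpperCopyMatches is a hypothetical test helper, and declaring the first parameter as const char* (so that string constants can be passed) is an assumption that goes slightly beyond the declaration shown above.

```cpp
#include <cctype>
#include <cstring>

void ConvertToUpper(const char* s1, char* s2)
{
    int index;
    for (index = 0; s1[index] != '\0'; index++)
        s2[index] = (char) std::toupper((unsigned char) s1[index]);
    s2[index] = '\0';
}

// Hypothetical caller: it, not ConvertToUpper, sets aside the memory.
bool UpperCopyMatches(const char* source, const char* expected)
{
    char* result = new char[std::strlen(source) + 1];   // + 1 for '\0'
    ConvertToUpper(source, result);
    bool same = (std::strcmp(result, expected) == 0);
    delete [] result;      // give back the memory obtained with new
    return same;
}
```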

F. String Input and Output
String output is quite simple and you have already been doing it with code such as:

cout << "Please enter a grade";

A string variable can be output in the same way. For example, to output "s1" and "s2" in the "Strcnvrt.cpp" program above, we simply wrote:

cout << "s1 = " << s1 << endl;
cout << "s2 = " << s2;

This would have worked just as well if the string were declared as a pointer:

cout << myString;

Both these forms work because the << operator knows to keep outputting characters until the '\0' symbol is encountered. Another example of the importance of this symbol!
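
One way to see this for yourself is to place a '\0' in the middle of a string and look at what is output. In the sketch below, OutputOf is a hypothetical helper that captures the output in a string stream (using the library string class only to hold the result) so it can be inspected.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: sends s through the << operator and returns
// exactly the characters that were written.
std::string OutputOf(const char s[])
{
    std::ostringstream out;
    out << s;
    return out.str();
}
```

If 'greeting' holds "Howdy Pardner" and the blank at greeting[5] is replaced with '\0', OutputOf(greeting) returns only "Howdy"; the remaining characters are still in the array but are never reached.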

Input is a bit more complex. We have not said much about 'cin' yet and we won't say much until the next chapter. What you do need to know for the moment is that C++ programmers like to consider 'cin' as a stream of data connecting a data source such as a keyboard or file with a program. For us, then, 'cin' is the connection, the stream, connecting the keyboard with the program.

When you use the ">>" operator to extract data from the 'cin' stream, the system starts by skipping any leading blanks in the data coming down the stream. Once a non-blank symbol is encountered, the system extracts (reads) symbols until a blank is encountered. In entering data in earlier programs, you usually hit the 'Enter' key after each data item. This works because the system registers the 'Enter' key as if it were a blank.

Thus, if you have the code:

int num1;
cin >> num1;

and you type the symbol '3', followed by the 'Enter' key, that '3' enters the 'cin' stream at the keyboard end and flows out at the program end to be processed by the ">>" operator. The processing of the symbol includes being translated into the integer value 3 because the system sees that the destination is an integer variable. After the translation, the result is stored in the variable "num1".

If you type a few blanks before typing the '3', those blanks are skipped. If you type '312' followed by 'Enter', the system will read in the symbols '3', '1', and '2' and translate them into the integer 312. If a user accidentally types non-numeric characters where the number should start, the extraction operation fails because the system cannot translate those characters into an integer value. (Chapter 11, Section IV discusses how to prevent infinite loops caused by entering invalid data in code involving a 'while' loop.)
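
These cases can be tried without a keyboard by letting a string stream stand in for 'cin'. The sketch below is only an illustration; ReadInt is a hypothetical helper, and the stream's fail member function is used to report whether the extraction worked.

```cpp
#include <sstream>

// Hypothetical helper: tries to extract one int from the typed text.
// Returns true if the extraction succeeded.
bool ReadInt(const char typed[], int& num1)
{
    std::istringstream in(typed);   // stands in for the keyboard
    in >> num1;                     // skips leading blanks, then translates
    return !in.fail();
}
```

ReadInt("   312\n", num1) succeeds and leaves 312 in num1, while ReadInt("abc\n", num1) reports failure.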

With the code:

cin >> myString;

the process is the same but this time the symbols are treated as characters. The ">>" operator starts, as before, by skipping any leading blanks in the data coming down the stream. Once it sees the first non-blank character it starts passing characters into the string variable until another blank is encountered. Then it adds the null character to 'myString' and stops. In other words, if the code is:

cout << "Please enter a string";
cin >> myString;

and you type:

" Hello There"

the characters 'H' 'e' 'l' 'l' 'o' will be stored in "myString" along with a '\0':

'H' 'e' 'l' 'l' 'o' '\0'

Notice that:

  1. the ">>" operator automatically puts a '\0' character in the string variable;
  2. the blanks in front of "Hello" are skipped; and
  3. the characters after the first blank after "Hello" are not included.

These characters after "Hello" (including the blank) are actually still in the stream waiting to be processed by the program's next request for input.

While this works, it does have two problems. First, if there are more characters uninterrupted by a blank in the input stream than there are elements in the string array, the ">>" operator will continue placing characters past the end of the array. (We have already talked about the danger of overflowing an array.)

Second, if we want a string variable to hold blanks, this method won't work. For example, suppose we want one string variable to hold a first and last name such as "Curtis Sollohub". If the user enters this name with a blank between the first and last name, only the first name will be saved in the string using the ">>" operator.

So far all our input has come through the ">>" operator, but there are a number of functions we can use to input data. The stream "cin" is actually an instance of a standard C++ class called istream, and in the <iostream.h> header file there is a declaration along the lines of:

istream cin;

as there is a declaration:

ostream cout;

One of the member functions associated with the class "istream" is named get. This function has many forms (like the constructors we have seen) and one of those forms can be used for string input. There are two required and one optional parameter with this member function. (Optional parameters use the default value mechanism introduced in Chapter 10.)

The first required parameter is the string into which the data coming off the stream should be read. The second is the maximum number of characters to be read plus 1. (The "plus 1" is C++'s way of reminding us that we need enough memory for the '\0'.)

The optional, third parameter has the default value '\n', the newline character. The purpose of the third parameter is to indicate under what conditions the 'get' member function should stop extracting characters from the stream. The 'get' function will continue extracting characters from the stream until a character matching that of the third parameter is encountered or until the maximum number of characters has been extracted. If the default value is used, 'get' will extract characters until the newline character (caused by the 'Enter' key being typed) is encountered or until the maximum number of characters has been extracted.

Remember that a member function is called by writing a class instance, followed by a period, followed by the function name. In our case, the class instance is 'cin' and the member function is 'get'. Thus, the code below:

cin.get(myString, 51);

will read up to 50 characters from the 'cin' stream and place them, along with a '\0', in "myString". (Since there is no third parameter, the default value of '\n' is used.) The 'get' function starts reading immediately from the stream ('cin' in this case) without skipping leading blanks. As called, this function continues to read until it has read in 50 characters or until it sees the default newline character ('\n'). Thus, if you type "Curtis Sollohub" and then hit the "Enter" key, the computer will store "Curtis Sollohub" in "myString".


Note that if the user types some blanks before the word "Curtis", they will be included in the string. Notice also that if the user types "Curtis" and then hits the "Enter" key before typing "Sollohub", the string will only hold the letters 'C' 'u' 'r' 't' 'i' 's' - along with the null character.

It is also important to understand that the 'get' function does not consume the newline character. It is still sitting on the stream. In other words, if we again use the "get" function, it will read in nothing because the first thing it will encounter is the newline character left over from the last input - even if we have typed more characters at the keyboard. To get past this, we can use a character-based version of the 'get' function as in the following example:

The first call to 'get' will read in a name the user types. The line "cin.get(dummy)" will read in the newline character and store it in the variable 'dummy'. (The function needs a place to put the character read in but the program doesn't do anything with it so we can call the variable 'dummy'.) The third call to the function 'get' will read in an address.

This code does not have a second:

cin.get(dummy);

at the bottom. For this code fragment it is not necessary, but, if the program were longer and included other input, it would be safer to have gotten rid of the second newline right away.

As a review, remember that the compiler knows which version of the overloaded function 'get' to use by looking at the number and type of parameters involved in the call. In the file 'iostream.h', 'get' has been declared a number of times - once with one parameter, a char type; and once with three parameters - a char* (string) type, an int type, and a char type. The third parameter has a default value of '\n' and therefore can be skipped in a call. Its purpose is to indicate the character to look for to stop reading in characters. See chapter 11 for more information.

Topics Covered in the "Essentials of C++"

Strings
Array Initialization
Default Parameter Values
The 'get' Member Function
