Section IV: Strings
A. Declaring String Variables Using Arrays

As you may recall, a string is a collection of characters. Up to now, we have only worked with string constants such as "Please enter the inventory code". We have not worked with strings as variables, nor have we input, output, or manipulated strings such as names and addresses. However, our contract program would clearly benefit from being able to handle customers' names and similar data.
We did note back in Chapter 2 that the most recent C++ standard provides a built-in string class. Information on this is also provided in Section XIII of the "Essentials of C++". Here we focus on the original C and C++ way of handling strings.

In C++, strings have traditionally been treated as arrays with one special feature: every string array must end with a special symbol, which we write as '\0'. This is called the null character, and C++ strings are said to be null terminated. For example, the string "HELLO" is stored as:
[ H ][ E ][ L ][ L ][ O ][ \0 ]

Note how the symbol '\0' takes up just one of the character slots of the array. While we write it as two symbols, internally it is represented by a single value, the character code 0. Its purpose is to let any built-in or user-designed function find the end of the string. Other languages use one of the elements of the string array itself to hold an integer representing the length of the string. Such a representation works well but usually means that strings in such a language have a maximum length. In C++ the length of a string is limited only by the memory available, but whenever we perform operations on strings we must make sure to leave the '\0' symbol at the end.

A string prompt such as "Please enter the inventory code" is also stored in an array, but that array does not have a name. Without a name, the program has no way to refer to that memory, so the string cannot be manipulated.
We have already learned how to declare arrays with names and our focus
here is on named arrays treated as strings. Here's a start:
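A minimal sketch of such a declaration follows. It assumes the string constant is "Howdy Pardner", the constant this section uses later:

```cpp
// 13 characters ("Howdy", a blank, "Pardner") plus one slot for the '\0'
char helloString[14] = "Howdy Pardner";
```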
The variable 'helloString' represents an array of 14 characters. Note that at least 14 characters are needed: 13 for the individual characters in the string constant (including the space) and one for the '\0'. While it would not work to declare a smaller array, we could have declared a larger array, as in:
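For example, the array could have 50 elements. That size is an arbitrary choice made only for illustration:

```cpp
// 50 slots: 13 characters, the '\0', and 36 unused elements
char helloString[50] = "Howdy Pardner";
```

When a string constant is shorter than the array, the initialization sets the remaining elements to zero.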
Some space is wasted here, but any properly written string-processing function will ignore the extra space because it will find the null character at the end of the string and stop.

It is also legal to declare a string array without explicitly stating the size of the array, if you initialize it right away, as in:
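Here is a sketch of this form, using the same string constant as before:

```cpp
// No size given: the compiler counts 13 characters plus the '\0'
// and creates an array of 14 chars
char helloString[] = "Howdy Pardner";
```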
C++ actually lets you declare any array without providing its size, if you immediately include the array's initial values. The system determines the array's size by counting the number of data items in the initialization list. (Since unnamed strings such as "Howdy Pardner" already have a null character in their internal representation, there is no danger of the system providing too small an array.)

To declare and initialize a non-char array, one uses braces, { and }, to hold the initializing values, as in:
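The sketch below uses a hypothetical array name and hypothetical values:

```cpp
// Braces hold the initial values; the compiler counts 4 of them
int quantities[] = {12, 7, 30, 5};
```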
This creates and initializes an array of 4 integers.

B. Declaring String Types
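You can create a string type like the one described here with typedef. In the sketch below, the type name String13 and the variable greeting are hypothetical choices made for illustration:

```cpp
// String13 means "array of 14 chars": room for 13 characters plus '\0'
typedef char String13[14];

// A variable of the new type; this is what actually sets memory aside
String13 greeting = "Howdy Pardner";
```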
The name used here for the string type is meant to convey the fact that this type can be used for any string of 13 or fewer characters, with space provided for the null character. Of course, you can use any name for the type that you want, but do try to make it meaningful.

Whatever name you use, be sure to make the array big enough to hold the largest string you expect to handle. For example, if you write code that allows customer names of up to 50 characters, you want a type whose array has 51 elements: 50 for the characters and one for the '\0'.
Also, don't forget that a type name does not set aside any memory. To
create space for customer names, you still need to declare a variable of
your new type as in:
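Here is a sketch of such a type together with a variable of that type. The names CustomerName and customer are hypothetical:

```cpp
// Room for customer names of up to 50 characters plus the '\0'
typedef char CustomerName[51];

// The typedef alone reserves no memory; this declaration sets aside 51 chars
CustomerName customer;
```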
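C. The String Function Library

C++ inherits a library of string functions from C. These functions are declared in the header file 'string.h' ('cstring' in standard C++). First is the function strlen. It receives a string and returns the length of the string NOT including the null character. Thus, given the code: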
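Here is a minimal sketch of the call, wrapped in a hypothetical helper named helloLength. Note that strlen actually returns the unsigned type size_t. The sketch assumes 'len' is an int, so it casts the result:

```cpp
#include <cstring>

char helloString[] = "Howdy Pardner";

// strlen counts the characters before the '\0'
int helloLength() {
    int len = static_cast<int>(std::strlen(helloString));
    return len;
}
```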
the variable 'len' will have the value 13 after the function call, assuming any of the above initializations of 'helloString'. Likewise, 'len' would have the value 6 if strlen were applied to a string of six characters.

The second function is strcpy. This receives two strings and copies the contents of the second string into the first. One must be careful that the first string parameter represents enough memory to hold the second string, including the null character. As an example of this function, consider the following code:
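Here is a sketch of such code. The destination array copyString and the function copyHello are hypothetical names:

```cpp
#include <cstring>

char helloString[] = "Howdy Pardner";  // 14 chars including the '\0'
char copyString[20];                    // destination: needs at least 14

// Copies helloString, including its '\0', into copyString.
// The first argument is the destination, the second the source.
void copyHello() {
    std::strcpy(copyString, helloString);
}
```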
The third function is strcmp. It receives two strings and compares them, using whatever character code system the computer uses. If the two strings are the same, the function returns 0. If the first string parameter is 'less than' the second, the function returns an integer less than 0. And, if the first string parameter is 'greater than' the second, the function returns an integer greater than 0.

You might wonder how such string comparisons are made. There are two common character representation schemes used in computers: ASCII and EBCDIC. Both represent characters (alphabetic characters, numeric characters, and symbols such as ',' and '$') with numeric codes that fit in one byte. ASCII codes use seven bits, and EBCDIC codes use eight. Thus a string such as "Howdy Pardner" is a sequence of such numeric codes. These codes are carefully set up so that the code for 'A' is less than the code for 'B', which is less than the code for 'C', and so on.
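A small helper, compareSign (a hypothetical name), shows the three possible outcomes by reducing strcmp's result to -1, 0, or 1. The standard guarantees only the sign of strcmp's result, not a particular value. Code should therefore test for less than, equal to, or greater than zero, rather than for exactly -1 or 1:

```cpp
#include <cstring>

// Reduces strcmp's result to -1, 0, or 1 by keeping only its sign.
int compareSign(const char* string1, const char* string2) {
    int result = std::strcmp(string1, string2);
    return (result > 0) - (result < 0);
}
```

With ASCII (or EBCDIC, which orders capital letters the same way), the results are as follows:

- compareSign("Apple", "Banana") is -1.
- compareSign("Howdy", "Howdy") is 0.
- compareSign("Pardner", "Howdy") is 1.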
The algorithm for comparing two strings (call them string1 and string2) starts by looking at the numeric code for the first character in each string. If the code for the first character in string1 is less than the code for the first character in string2, string1 comes before string2 alphabetically. (This is what is meant by 'less than' in the description of strcmp above.) Likewise, if the code for the first character of string1 is greater than the code for the first character of string2, string1 comes after string2 alphabetically.

Just as when humans compare two strings, if the codes for the two first characters are not the same, the algorithm is finished. But if the two strings have the same first character, and thus the same numeric code for it, the algorithm must proceed to compare the second characters, and so on. If the strings have exactly the same characters, they will also have the same sequence of codes and, thus, will be alphabetically equal. If one string runs out of characters first, its '\0' (code 0) is compared against a real character, so the shorter string comes first: "Howdy" is less than "Howdy Pardner".

There are some issues with this. First, ASCII and EBCDIC arrange special characters and digits in different ways, so the two systems will order strings that contain non-alphabetic characters differently. Second, the code for 'A' is different from the code for 'a' (in ASCII, 65 and 97). This is true for all the alphabetic characters. Thus, this algorithm will conclude that the strings "PARDNER" and "pardner" are different. If the case of the characters is not significant, programmers often convert all string characters to either upper or lower case before making comparisons.

To assist in this conversion process, the standard C++ library includes the functions tolower and toupper. Both of these receive a character and return a character (technically, both take and return int values). The first one returns the lower-case character corresponding to the character received. In other words, if it receives an 'A' or an 'a', it returns 'a'. The second function does just the opposite. Both functions return the received character unchanged if it is a digit or a symbol. (To use them, include the header file 'ctype.h', or 'cctype' in standard C++.)

D. Using the Null Character
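The discussion below refers to a function ConvertToUpper, from a program named Strcnvrt.cpp. This function copies the string "s1" into "s2" while converting each letter to upper case. Here is a minimal sketch consistent with that description. It assumes array parameters and the 'cctype' header. Marking "s1" as const is an addition, and it records that the function only reads that string:

```cpp
#include <cctype>

// Copies s1 into s2, converting letters to upper case. The loop stops when
// it reaches the '\0' in s1; without that test it would run off the array.
void ConvertToUpper(const char s1[], char s2[]) {
    int i;
    for (i = 0; s1[i] != '\0'; i++) {
        // toupper expects a value representable as unsigned char, hence the cast
        s2[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s1[i])));
    }
    s2[i] = '\0';   // the last line: the loop itself never copied the '\0'
}
```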
The last line of this function is very important. The 'for' loop stops as soon as it reaches the '\0' in "s1", so the loop itself never copies that character. The last line, which runs after the loop, places a '\0' at the end of "s2". Notice also how the loop's test uses the '\0' in "s1". Without it, this 'for' statement would have no way of knowing where the string ends and would keep going past the end of the array! Many of the built-in string functions use a 'for' or 'while' statement with the same test to determine when to finish. For that reason, you must always make sure you leave room for '\0' in your strings and add it when necessary. To demonstrate the need for this, comment out the last line of "ConvertToUpper" and run the program. If you are wondering about the use of string variables in the output lines here, check the section of this chapter entitled "String Input and Output" (section F below).

E. Strings Declared as Pointers
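A string can also be declared as a pointer to char, as in char* myString. You need to remember, however, that the code "char* myString" does not set aside memory for the string. It simply declares a pointer to the string. If you want a string of, say, size 50, you would write the code myString = new char[50]; to set the memory aside. Only after you did this could you use myString, for example, as the first parameter in a call to strcpy.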
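Here is a sketch of the allocate-then-copy sequence. It uses a hypothetical function, makeGreeting, that hands the new string back to its caller. The text above does not discuss delete[], but memory obtained with new[] must eventually be released with it:

```cpp
#include <cstring>

// Sets aside 50 chars, copies a string into them, and returns the pointer.
// The caller owns the memory and must release it with delete[].
char* makeGreeting() {
    char* myString = new char[50];           // room for up to 49 chars + '\0'
    std::strcpy(myString, "Howdy Pardner");  // needs only 14 of the 50
    return myString;
}
```

A caller would write char* s = makeGreeting(); and later, when done with the string, delete[] s;.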
Note that when a string is passed to a function, the calling side should already have set the memory aside. Therefore, functions such as strcpy or ConvertToUpper are declared using character pointers as parameters, and the functions themselves do not set aside memory for the strings. It is the responsibility of the code in the calling function to make sure enough memory has been set aside for both parameters.

For example, one could rewrite the declaration for "ConvertToUpper" as:
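Here is a sketch of ConvertToUpper with character-pointer parameters, together with a hypothetical caller, upperCopy, that sets the memory aside with new before making the call. The source parameter is declared const char* here. This is an addition that records that the function only reads it:

```cpp
#include <cctype>
#include <cstring>

// s1 is the source, s2 the destination. Neither is allocated here: s2 must
// already point to at least strlen(s1) + 1 chars supplied by the caller.
void ConvertToUpper(const char* s1, char* s2) {
    int i;
    for (i = 0; s1[i] != '\0'; i++)
        s2[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s1[i])));
    s2[i] = '\0';   // copy the terminator so s2 is a valid string
}

// The calling side sets the memory aside before making the call.
char* upperCopy(const char* text) {
    char* result = new char[std::strlen(text) + 1];  // +1 for the '\0'
    ConvertToUpper(text, result);
    return result;                                   // caller must delete[] it
}
```

Array notation such as s1[i] works the same way on a pointer, so the body does not change when the parameters become pointers.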
In this second example, one assumes any actual parameters passed to this function will represent enough memory for whatever string is to be converted.

F. String Input and Output
A string variable can be output in the same way as a string constant, with the << operator. For example, to output "s1" and "s2" in the "Strcnvrt.cpp" program above, we simply sent them to 'cout' with <<. This would have worked just as well if the string were declared as a pointer.
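Here is a small sketch with a hypothetical function, showStrings, that writes one array-based string and one pointer-based string. It accepts any output stream so the result can be checked. Passing std::cout displays the strings on the screen:

```cpp
#include <iostream>
#include <sstream>

// Writes an array-based string and a pointer-based string to 'out'.
// Pass std::cout to display them, or a std::ostringstream to capture them.
void showStrings(std::ostream& out) {
    char s1[] = "Howdy Pardner";        // string held in a named array
    const char* s2 = "HOWDY PARDNER";   // pointer to a string constant
    out << s1 << '\n' << s2 << '\n';    // << stops at each string's '\0'
}
```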
Both these forms work because the << operator knows to keep outputting characters until it encounters the '\0' symbol. Another example of the importance of this symbol!

Input is a bit more complex. We have not said much about 'cin' yet, and we won't say much until the next chapter. What you do need to know for the moment is that C++ programmers like to think of 'cin' as a stream of data connecting a data source, such as a keyboard or a file, with a program. For us, then, 'cin' is the connection, the stream, connecting the keyboard with the program.
When you use the ">>" operator to extract data from the 'cin' stream, the system starts by skipping any leading blanks in the data coming down the stream. Once it encounters a non-blank symbol, the system extracts (reads) symbols until it encounters a blank. In entering data in earlier programs, you usually hit the 'Enter' key after each data item. This works because the system treats the 'Enter' key (the newline character) as if it were a blank. Thus, suppose you have the code cin >> num1; where "num1" is an int variable. If you type the symbol '3' followed by the 'Enter' key, that '3' enters the 'cin' stream at the keyboard end and flows out at the program end, where the ">>" operator processes it. Because the system sees that the destination is an integer variable, processing the symbol includes translating it into the integer value 3. After the translation, the result is stored in the variable "num1".
If you type a few blanks before typing the '3', those blanks are skipped. If you type '312' followed by 'Enter', the system will read in the symbols '3', '1', and '2' and translate them into the integer 312. If a user accidentally types non-numeric characters where a number is expected, the extraction operation fails because the system cannot translate those characters into an integer value. (Chapter 11, Section IV discusses how to prevent infinite loops caused by entering invalid data in code involving a 'while' loop.)
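The integer case can be sketched with a hypothetical helper, readInt, that works on any input stream. Passing std::cin reads from the keyboard. An std::istringstream makes it easy to try typed input such as "   312":

```cpp
#include <iostream>
#include <sstream>

// Extracts one int with >>. Leading blanks (spaces, tabs, newlines) are
// skipped; returns false if the characters cannot be read as an integer.
// Pass std::cin to read from the keyboard, or an istringstream to test.
bool readInt(std::istream& in, int& num1) {
    in >> num1;
    return !in.fail();
}
```

With the input "   312", readInt stores 312 and returns true. With the input "abc", it returns false.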
With the code cin >> myString, the process is the same, but this time the symbols are treated as characters. As before, the ">>" operator starts by skipping any leading blanks in the data coming down the stream. Once it sees the first non-blank character, it starts passing characters into the string variable until it encounters another blank. Then it adds the null character to 'myString' and stops. In other words, suppose you type a line that begins with the word Hello followed by a blank and more text. The characters 'H' 'e' 'l' 'l' 'o' will be stored in "myString" along with a '\0'.
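The following sketch imitates typing a line at the keyboard by reading from a string stream. The function extractFirstWord and the sample input are hypothetical. The sketch also uses std::setw, a standard manipulator that limits how many characters >> may store to the given size minus one. This guards against the overflow problem described next:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Imitates typing a line at the keyboard by reading from a string stream.
// Returns the word that >> stores in a char array and sets 'rest' to the
// characters still waiting in the stream afterward.
std::string extractFirstWord(const std::string& typed, std::string& rest) {
    std::istringstream in(typed);
    const int size = 20;
    char myString[size] = "";
    // setw(size) lets >> store at most size - 1 characters plus the '\0'
    in >> std::setw(size) >> myString;
    std::getline(in, rest);   // whatever >> did not consume
    return std::string(myString);
}
```

For example, given "   Hello there", the function returns "Hello" and leaves " there", with its leading blank, as the unread remainder.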
The characters typed after "Hello" (including the blank) are still in the stream, waiting to be processed by the program's next request for input.

While this works, it does have two problems. First, suppose the input stream contains more characters, uninterrupted by a blank, than there are elements in the string array. In that case, the ">>" operator will continue placing characters past the end of the array. (We have already talked about the danger of overflowing an array.)

Second, if we want a string variable to hold blanks, this method won't work. For example, suppose we want one string variable to hold a first and last name such as "Curtis Sollohub". If the user enters this name with a blank between the first and last name, the ">>" operator will save only the first name in the string.

So far all our input has come through the ">>" operator, but there are a number of functions we can use to input data. The stream "cin" is actually an instance of a standard C++ class called istream. The <iostream.h> header file (<iostream> in standard C++) declares cin as an istream object, written extern istream cin; in standard C++. In the same way, cout is declared as an object of the class ostream.
One of the member functions associated with the class "istream" is named get. This function has many forms (like the constructors we have seen), and one of those forms can be used for string input. That form has two required parameters and one optional parameter. (Optional parameters use the default value mechanism introduced in Chapter 10.)
The first required parameter is the string into which the data coming off the stream should be read. The second is the maximum number of characters to be read, plus 1. (The "plus 1" is C++'s way of reminding us that we need enough memory for the '\0'.)

The optional third parameter has the default value '\n', the newline character. The purpose of the third parameter is to indicate under what conditions the 'get' member function should stop extracting characters from the stream. The 'get' function will continue extracting characters from the stream until it encounters a character matching the third parameter or until it has extracted the maximum number of characters. If the default value is used, 'get' will extract characters until it encounters the newline character (caused by typing the 'Enter' key) or until it has extracted the maximum number of characters.
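Here is a sketch of the three-parameter form. It uses a hypothetical helper, readField, that stops at a comma instead of at the newline:

```cpp
#include <iostream>
#include <sstream>

// Reads one comma-terminated field with the three-parameter form of get.
// At most size - 1 characters are stored, followed by '\0'. The ',' that
// stops the read is NOT extracted; it stays on the stream.
// Pass std::cin for keyboard input, or an istringstream to test.
void readField(std::istream& in, char* field, int size) {
    in.get(field, size, ',');
}
```

Two cases show how the third parameter and the size interact:

- Given "Sollohub, Curtis" and a size of 20, readField stores "Sollohub" and leaves the comma on the stream.
- Given "Curtis Sollohub" and a size of 7, it stores only the 6 characters "Curtis", because the size counts the '\0'.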
Remember that a member function is called by writing a class instance, followed by a period, followed by the function name. In our case, the class instance is 'cin' and the member function is 'get'. Thus, the call cin.get(myString, 51) will read up to 50 characters from the 'cin' stream and place them, along with a '\0', in "myString", assuming "myString" has room for at least 51 characters. (Since there is no third parameter, the default value of '\n' is used.) The 'get' function starts reading immediately from the stream ('cin' in this case) without skipping leading blanks. As called, this function continues to read until it has read in 50 characters or until it sees the default newline character ('\n'). Thus, if you type "Curtis Sollohub" and then hit the "Enter" key, the computer will store "Curtis Sollohub" in "myString".

Note that if the user types some blanks before the word "Curtis", they will be included in the string. Notice also that if the user types "Curtis" and then hits the "Enter" key before typing "Sollohub", the string will hold only the letters "Curtis".

It is also important to understand that the 'get' function does not consume the newline character. The newline is still sitting on the stream. In other words, if we use the 'get' function again, it will read in nothing, because the first thing it will encounter is the newline character left over from the last input. This happens even if we have typed more characters at the keyboard. To get past this, we can use a character-based version of the 'get' function, as in the following example:
```cpp
char* name = new char[81];      // 81 = 80 columns + '\0', left over from punch card days
char* address = new char[81];
char dummy;                     // receives the newline left on the stream
cout << "Please enter your name and hit the Enter key.\n";
cin.get(name, 81);              // reads the name; the '\n' stays on the stream
cin.get(dummy);                 // reads that '\n' into dummy
cout << "Please enter your address and hit the Enter key.\n";
cin.get(address, 81);           // reads the address
```

The first call to 'get' will read in the name the user types. The line "cin.get(dummy)" will read in the newline character and store it in the variable 'dummy'. (The function needs a place to put the character it reads, but the program doesn't do anything with it, so we can call the variable 'dummy'.) The third call to 'get' will read in an address. This code does not have a second cin.get(dummy); at the bottom. For this code fragment it is not necessary. However, if the program were longer and included other input, it would be safer to get rid of the second newline right away.

As a review, remember that the compiler knows which version of the overloaded function 'get' to use by looking at the number and types of the arguments in the call. The file 'iostream.h' declares 'get' a number of times. One declaration has one parameter, a reference to a char (char&). Another has three parameters: a char* (string), an integer count, and a char. The third parameter has a default value of '\n' and therefore can be omitted in a call. (In standard C++, the same effect comes from a separate two-parameter version of 'get' rather than a default value.) Its purpose is to indicate the character that stops the reading. See Chapter 11 for more information.
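You can try the name-and-address pattern without a keyboard by reading from a string stream. In the sketch below, the function name readNameAndAddress is hypothetical. In standard C++, leaving out the dummy call has two effects. The second call to get(address, 81) would store nothing, and it would also put the stream into a failed state, so later input would fail as well:

```cpp
#include <iostream>
#include <sstream>

// Reads a name line and an address line: get a line, discard the '\n'
// that get leaves behind, then get the next line. Both buffers must hold
// at least 81 chars (80 characters plus the '\0').
bool readNameAndAddress(std::istream& in, char* name, char* address) {
    char dummy;              // receives the leftover newline
    in.get(name, 81);        // stops before '\n'; the '\n' stays on the stream
    in.get(dummy);           // consume that '\n'
    in.get(address, 81);     // reads the second line
    return !in.fail();       // false if any step failed
}
```

For example, with an istringstream holding "Curtis Sollohub" and an address on the next line, name receives "Curtis Sollohub" and address receives the second line.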