Encoding Characters in C++ Code
Everything in the computer is represented by a pattern of ones and zeros — variations in voltage that are interpreted as numbers. Thus the bit pattern 0000 0001 is the number 1 when interpreted as an integer. However, this same bit pattern means something completely different when interpreted as an instruction by the processor.
So it should come as no surprise that the computer encodes the characters of the alphabet by assigning each a number.
Consider the character ‘A’. You could assign it any value you want as long as we all agree on the value. For example, you could assign a value of 1 to ‘A’, if you wanted to. Logically, you might then assign the value 2 to ‘B’, 3 to ‘C’, and so on.
In this scheme, ‘Z’ would get the value 26. You might then start over by assigning the value 27 to ‘a’, 28 to ‘b’, right down to 52 for ‘z’. That still leaves the digits ‘0’ through ‘9’ plus all the special symbols like space, period, comma, slash, semicolon, and the funny characters you see when you press the number keys while holding Shift down.
Add to that the unprintable characters such as tab and newline. When all is said and done, you could encode the entire English keyboard using numbers between 1 and 127.
Sometime around 1963, there was general agreement on how characters should be encoded in English. The ASCII (American Standard Code for Information Interchange) character encoding shown in the table below was adopted pretty much universally, except by one company.
IBM published its own standard in 1963 as well. The two encoding standards duked it out for about ten years, but by the early 1970s, when C (the language from which C++ later inherited its char type) was being created, ASCII had just about won the battle. The char type was created with ASCII character encoding in mind.
|Value|Character|Value|Character|
|---|---|---|---|
|1|Start of Heading|65|A|
|2|Start of Text|66|B|
|3|End of Text|67|C|
|4|End of Transmission|68|D|
|12|New Page; Form Feed|76|L|
|16|Data Link Escape|80|P|
|17|Device Control 1|81|Q|
|18|Device Control 2|82|R|
|19|Device Control 3|83|S|
|20|Device Control 4|84|T|
|23|End of Transmission Block|87|W|
|25|End of Medium|89|Y|
The first thing that you’ll notice is that the first 32 characters are the “unprintable” characters. That doesn’t mean that these characters are so naughty that the censor won’t allow them to be printed — it means that they don’t appear as visible symbols when printed on the printer (or on the console, for that matter). Many of these characters are no longer used or used only in obscure ways.
For example, character 25 “End of Medium” was probably written as the last character before the end of a reel of magnetic tape. That was a big deal in 1963, but today . . . not so much, so use of the character is limited.
The characters starting with 32 are all printable with the exception of the last one, 127, which is the Delete character.