By Stephen R. Davis

Everything in the computer is represented by a pattern of ones and zeros — variations in voltage that are interpreted as numbers. Thus the bit pattern 0000 0001 is the number 1 when interpreted as an integer. However, this same bit pattern means something completely different when interpreted as an instruction by the processor.

So it should come as no surprise that the computer encodes the characters of the alphabet by assigning each a number.

Consider the character ‘A’. You could assign it any value you want as long as we all agree on the value. For example, you could assign a value of 1 to ‘A’, if you wanted to. Logically, you might then assign the value 2 to ‘B’, 3 to ‘C’, and so on.

In this scheme, ‘Z’ would get the value 26. You might then start over by assigning the value 27 to ‘a’, 28 to ‘b’, right down to 52 for ‘z’. That still leaves the digits ‘0’ through ‘9’ plus all the special symbols like space, period, comma, slash, semicolon, and the funny characters you see when you press the number keys while holding Shift down.

Add to that the unprintable characters such as tab and newline. When all is said and done, you could encode the entire English keyboard using numbers between 1 and 127.

Sometime around 1963, there was a general agreement on how characters should be encoded in English. The ASCII (American Standard Code for Information Interchange) character encoding shown in the following table was adopted pretty much universally, except by one company.

IBM published its own standard, EBCDIC, in 1963 as well. The two encoding standards duked it out for about ten years, but by the early 1970s, when C (the forerunner of C++) was being created, ASCII had just about won the battle. The char type was created with ASCII character encoding in mind.

The ASCII Character Set
Value  Char                       Value  Char
0      NULL                       64     @
1      Start of Heading           65     A
2      Start of Text              66     B
3      End of Text                67     C
4      End of Transmission        68     D
5      Enquiry                    69     E
6      Acknowledge                70     F
7      Bell                       71     G
8      Backspace                  72     H
9      Tab                        73     I
10     Newline                    74     J
11     Vertical Tab               75     K
12     New Page; Form Feed        76     L
13     Carriage Return            77     M
14     Shift Out                  78     N
15     Shift In                   79     O
16     Data Link Escape           80     P
17     Device Control 1           81     Q
18     Device Control 2           82     R
19     Device Control 3           83     S
20     Device Control 4           84     T
21     Negative Acknowledge       85     U
22     Synchronous Idle           86     V
23     End of Transmission Block  87     W
24     Cancel                     88     X
25     End of Medium              89     Y
26     Substitute                 90     Z
27     Escape                     91     [
28     File Separator             92     \
29     Group Separator            93     ]
30     Record Separator           94     ^
31     Unit Separator             95     _
32     Space                      96     `
33     !                          97     a
34     "                          98     b
35     #                          99     c
36     $                          100    d
37     %                          101    e
38     &                          102    f
39     '                          103    g
40     (                          104    h
41     )                          105    i
42     *                          106    j
43     +                          107    k
44     ,                          108    l
45     -                          109    m
46     .                          110    n
47     /                          111    o
48     0                          112    p
49     1                          113    q
50     2                          114    r
51     3                          115    s
52     4                          116    t
53     5                          117    u
54     6                          118    v
55     7                          119    w
56     8                          120    x
57     9                          121    y
58     :                          122    z
59     ;                          123    {
60     <                          124    |
61     =                          125    }
62     >                          126    ~
63     ?                          127    DEL

The first thing that you’ll notice is that the first 32 characters (values 0 through 31) are the “unprintable” characters. That doesn’t mean that these characters are so naughty that the censor won’t allow them to be printed. It means that they don’t appear as visible symbols when printed on the printer (or on the console, for that matter). Many of these characters are no longer used or are used only in obscure ways.

For example, character 25 “End of Medium” was probably written as the last character before the end of a reel of magnetic tape. That was a big deal in 1963, but today . . . not so much, so use of the character is limited.

The characters starting with 32 are all printable with the exception of the last one, 127, which is the Delete character.