The Limits of Floating-Point Numbers in C++

By Stephen R. Davis

Although floating-point variables in C++ can solve many calculation problems, such as truncation, they have limitations of their own, the reverse of those associated with integer variables. Floating-point variables can’t be used to count things, are more difficult for the computer to handle, and suffer from round-off error (though not nearly to the same degree as int variables).

Counting

You cannot use floating-point variables in applications where counting is important. This includes C++ constructs that count, such as array subscripts and switch statements, which accept only integer values. C++ can’t determine which whole number a given floating-point number is meant to represent.

For example, it’s clear to you that 1.0 is 1 but not so clear to C++. What about 0.9 or 1.1? Should these also be considered as 1? C++ simply avoids the problem by insisting on using int values when counting is involved.

Calculation speed

Historically, a computer processor could perform integer arithmetic faster than floating-point arithmetic. For example, a processor that can add 1 million integers in a given amount of time might be able to perform only 200,000 floating-point calculations during the same period.

Calculation speed is becoming less of a problem as microprocessors get faster. In addition, today’s general-purpose microprocessors include special floating-point circuitry on board to increase the performance of these operations. However, arithmetic on integer values is just a heck of a lot easier and faster than performing the same operation on floating-point values.

Loss of accuracy

Floating-point float variables have a precision of about 6 digits, and an extra-economy size, double-strength version of float known as a double can handle about 15 significant digits. Either way, the precision is finite, and that leads to round-off problems.

Consider that 1⁄3 is expressed as 0.333 … in a continuing sequence. The concept of an infinite series makes sense in math but not to a computer, because a computer has finite accuracy. The FloatAverage program outputs 1.66667 as the average of 1, 2, and 2. That’s a lot better than the 0 output by the IntAverage version, but not even close to the infinite sequence 1.666 ….

C++ can hide round-off error in a lot of cases. For example, because cout displays only 6 significant digits by default, a value such as 0.99999999 is printed as 1, which is what the user really meant. In other cases, such as comparing two floating-point values with ==, even C++ cannot correct for round-off error.

Not-so-limited range

Although the double data type has a range much larger than that of an integer, it’s still limited. The maximum value for an int (assuming the common 32-bit int) is a skosh more than 2 billion. The maximum value of a double variable is roughly 10 to the 308th power. That’s 1 followed by 308 zeroes; it eats 2 billion for breakfast. (A float tops out at roughly 10 to the 38th power.)

Only the first 15 digits or so of a double have any meaning; the remaining digits are noise, having succumbed to floating-point round-off error.