By Stephen R. Davis

Floating-point variables in C++ come with their own limitations. They cannot be used to count things, they take longer to process, they consume more memory, and they also suffer from round-off error (though not nearly as badly as int). Consider each of these problems in turn.

Counting with double

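You can’t use a floating-point variable in an application where counting is important. In C++, you can’t say that there are 7.0 characters in my first name. Operators involved in counting don’t all work on floating-point variables. In particular, the modulo (%) operator is strictly verboten on double, and a double can’t serve as an array index. The auto-increment (++) and auto-decrement (--) operators do compile on a double, but you shouldn’t rely on them for counting: once a double grows large enough, adding 1 no longer changes its value.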

Calculation speed of double

Computers can perform integer arithmetic faster than they can do floating-point arithmetic. Fortunately, floating-point processors have been built into CPUs for many years now, so the difference in performance is not nearly as significant as it once was. The following loop was written just as a simple example, first using integer arithmetic:

int nValue1 = 1, nValue2 = 2, nValue3 = 2;
for (int i = 0; i < 1000000000; i++)
{
    int nAverage = (nValue1 + nValue2 + nValue3) / 3;
}

This loop took about 5 seconds to execute. Execute the same loop in floating-point arithmetic:

double dValue1 = 1, dValue2 = 2, dValue3 = 2;
for (int i = 0; i < 1000000000; i++)
{
    double dAverage = (dValue1 + dValue2 + dValue3) / 3.0;
}

This loop took about 21 seconds to execute. Calculating an average 1 billion times in a little over 20 seconds ain’t shabby, but it’s still four times slower than the processing time for its integer equivalent.

The double variable consumes more memory

On a PC or Macintosh, an int consumes 4 bytes, whereas a double takes up 8 bytes. That doesn’t sound like much — and, in fact, it isn’t — but if you had a few million of these things to keep in memory . . . well, it still wouldn’t be much. But if you had a few hundred million, then the difference would be considerable.

This is another way of saying that unless you need to store a heck of a lot of objects, don’t worry about the difference in memory taken by one type versus another. Instead, pick the variable type that meets your needs.

If you do just happen to be programming an application that needs (say) to manipulate the age of every human being on the planet at the same time, then you may want to lean toward the smaller int because it consumes less memory. (Do you do that sort of thing often?)

Loss of accuracy with double

A double variable has about 16 significant digits of accuracy. Consider that a mathematician would express the number 1/3 as 0.333. . ., where the ellipsis indicates that the threes go on forever. The concept of an infinite series makes sense in mathematics, but not in computing.

A computer only has a finite amount of memory and a finite amount of accuracy. Therefore it has to round off, which results in a tiny (but real) error.

C++ can hide round-off error in a lot of cases. For example, on output, if a variable is 0.99999999999999, C++ rounds it to the default six significant digits and displays it as 1. However, the error is still there in the stored value, so you need to be careful. For example, you can’t be sure that dividing 23 by 7 and then multiplying the result by 7 gets you back to exactly 23:

double d1 = 23.0;
double d2 = d1 / 7.0;
if (d1 == (d2 * 7.0))
{
    cout << "Did we get here?" << endl;
}

You might think that this code snippet would always display the “Did we get here?” string, but that is not guaranteed. The problem is that 23 divided by 7 cannot be expressed exactly in a floating-point number. Something is lost. Thus, depending on your compiler and hardware, d2 * 7 may come out very close to 23 without being exactly equal.

Rather than looking for exact equality between two floating-point numbers, you should be asking, “Is d2 * 7 vanishingly close to d1 in value?” You can do that as follows:

double d1 = 23.0;
double d2 = d1 / 7.0;
// Is d2 * 7.0 within delta of d1?
double difference = (d2 * 7.0) - d1;
double delta = 0.00001;
if (difference < delta && difference > -delta)
{
    cout << "Did we get here?" << endl;
}

This code snippet calculates the difference between d1 and d2 * 7.0. If this difference is less than some small delta, the code calls it a day and says that d1 and d2 * 7 are essentially equal.

Not-so-limited range of double

The largest number that a double can store is roughly 10 to the 308th power. That’s a 1 with 308 zeroes after it; that eats the puny 2 billion maximum size for an int for breakfast. That’s even more than the national debt (at least, at the time of this writing). There are probably applications where 308 zeroes aren’t enough.

Remember that only the first 16 digits are significant. The remaining digits are noise, having already succumbed to standard floating-point round-off error.