How R Calculates Infinite, Undefined, and Missing Values

Luckily, R can deal with data anomalies that confound some other statistical platforms. In most real-life data sets, at least a few values are missing, so you don’t always have real values to calculate with. Also, some calculations have infinity as a result (such as dividing a nonzero number by zero) or can’t be carried out at all (such as taking the logarithm of a negative value).

How R defines infinity

To start exploring infinity in R, see what happens when you try to divide by zero:

> 2/0
[1] Inf

R correctly tells you the result is Inf, or infinity. Negative infinity is shown as -Inf. You can use Inf just as you use a real number in calculations:

> 4 - Inf
[1] -Inf

To check whether a value is finite, use the functions is.finite() and is.infinite(). The first function returns TRUE if the number is finite; the second one returns TRUE if the number is infinite.
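
As a quick sketch, here are both functions applied to a small vector. The name values is made up for this illustration:

```r
# A hypothetical vector mixing ordinary numbers with both infinities
values <- c(-2 / 0, 0, 42, 2 / 0)

is.finite(values)    # FALSE  TRUE  TRUE FALSE
is.infinite(values)  #  TRUE FALSE FALSE  TRUE
```

Because both functions are vectorized, you get one TRUE or FALSE per element.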

R considers everything larger than the largest number a computer can hold to be infinity. On most machines, that’s approximately 1.8 × 10^308. This definition of infinity can lead to unexpected results, as shown in the following example:

> is.finite(10^(305:310))
[1] TRUE TRUE TRUE TRUE FALSE FALSE

What does this line of code mean now? See whether you understand the nesting and vectorization in this example. If you break up the line starting from the inner parentheses, it becomes comprehensible:

  • You know already that 305:310 gives you a vector, containing the integers from 305 to 310.

  • Arithmetic operators such as ^ are vectorized, so 10^(305:310) gives you a vector with the results of 10 to the power of 305, 306, 307, 308, 309, and 310.

  • That vector is given as an argument to is.finite(). That function tells you that the two last results, 10^309 and 10^310, are infinite for R.
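
The steps above can be sketched one at a time. The variable names (exponents, powers, finite, largest) are made up for this illustration, and the result assumes a typical machine that uses standard double-precision numbers:

```r
# Step 1: the colon operator builds the vector of exponents
exponents <- 305:310

# Step 2: ^ is vectorized, so each exponent gets its own result
powers <- 10^exponents

# Step 3: is.finite() tests every element of the vector
finite <- is.finite(powers)

# The largest finite number this machine can hold, for comparison
largest <- .Machine$double.xmax

print(exponents[!finite])  # the exponents whose results overflow to Inf
print(largest)
```

On such a machine, only the exponents 309 and 310 should be reported as overflowing, because 10^308 is still smaller than the value stored in .Machine$double.xmax.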

How R deals with undefined outcomes

Your math teacher probably explained that if you divide any real number by infinity, you get zero. But what if you divide infinity by infinity?

> Inf / Inf
[1] NaN

Well, R tells you that the outcome is NaN. That result simply means Not a Number. This is R’s way of telling you that the outcome of that calculation is not defined.

The funny thing is that R actually considers NaN to be numeric, so you can use NaN in calculations. The outcome of those calculations is generally NaN, though, as you see here:

> NaN + 4
[1] NaN

You can test whether a calculation results in NaN by using the is.nan() function. Note that both is.finite() and is.infinite() return FALSE when you’re testing on a NaN value.
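
A minimal sketch of those tests follows, using three calculations that have no defined outcome. The name undefined is made up for this illustration:

```r
# Three calculations with no defined outcome
undefined <- c(Inf / Inf, Inf - Inf, 0 / 0)

is.nan(undefined)       #  TRUE  TRUE  TRUE
is.finite(undefined)    # FALSE FALSE FALSE
is.infinite(undefined)  # FALSE FALSE FALSE
```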

How R copes with missing values

One of the most common problems in statistics is incomplete data sets. To deal with missing values, R uses the reserved keyword NA, which stands for Not Available. You can use NA as a valid value, so you can assign it as a value as well:

> x <- NA

You have to take into account, however, that calculations with a value of NA also generally return NA as a result:

> x + 4
[1] NA
> log(x)
[1] NA

If you want to test whether a value is NA, you can use the is.na() function, as follows:

> is.na(x)
[1] TRUE
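
The same holds for vectors. As a small sketch with made-up data (the name measurements is hypothetical), a missing value stays missing in a calculation, and is.na() tells you where it sits:

```r
# A hypothetical set of measurements with one missing value
measurements <- c(12.5, NA, 9.8)

# NA stays NA; the other elements are calculated as usual
shifted <- measurements + 4

# One TRUE or FALSE per element, marking the missing value
missing <- is.na(measurements)

print(shifted)
print(missing)  # FALSE  TRUE FALSE
```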

Note that the is.na() function also returns TRUE if the value is NaN. The functions is.finite(), is.infinite(), and is.nan() return FALSE for NA values.

Function       Inf    -Inf   NaN    NA
is.finite()    FALSE  FALSE  FALSE  FALSE
is.infinite()  TRUE   TRUE   FALSE  FALSE
is.nan()       FALSE  FALSE  TRUE   FALSE
is.na()        FALSE  FALSE  TRUE   TRUE
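
If you want to check this overview yourself, one possible sketch is to run all four functions on the four special values and stack the results. The names special and checks are made up for this illustration:

```r
# The four special values discussed in this article
special <- c(Inf, -Inf, NaN, NA)

# One row per test function, one column per special value
checks <- rbind(
  is.finite   = is.finite(special),
  is.infinite = is.infinite(special),
  is.nan      = is.nan(special),
  is.na       = is.na(special)
)
colnames(checks) <- c("Inf", "-Inf", "NaN", "NA")

print(checks)
```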