How to Distinguish Data Types in R - dummies

By Andrie de Vries, Joris Meys

In statistics, it is important to distinguish between variables of different types, because the type of data often determines which analyses you can perform. R therefore lets you classify data explicitly as follows:

  • Nominal data: This type of data, which you represent in R using factors, distinguishes between different categories, but there is no implied order between categories. Examples of nominal data are colors (red, green, blue), gender (male, female), and nationality (British, French, Japanese).

  • Ordinal data: Ordinal data has a natural order between elements but gives no indication of the relative size of the differences between them. Any data that you can rank but can’t assign exact values to is ordinal. For example, low < medium < high describes ordered data with three levels.

    In market research, it’s very common to use a five-point scale to measure perceptions: strongly disagree < disagree < neutral < agree < strongly agree. This is also an example of ordinal data.

    Another example is the use of the names of colors to indicate order, such as red < amber < green to indicate project status.

    In R, you use ordered factors to describe ordinal data.

  • Numeric data: You have numeric data when you can describe your data with numbers (for example, length, weight, or count). Numeric data has two subcategories.

    • Interval scaled data: You have interval scaled data when the interval between adjacent units of measurement is the same, but the zero point is arbitrary. An everyday example of interval scaled data is our calendar system. Each year has the same length, but the zero point is arbitrary. In other words, time didn’t start in the year zero; we simply use a convenient year to start counting. This means you can add and subtract dates (and all other types of interval scaled data), but you can’t meaningfully divide dates. Other examples include longitude, as well as anything else where there can be disagreement about where the starting point is.

      Interval scaled data also appears in social science research, such as market research.

      In R, you can use integer or numeric objects to represent interval scaled data.

    • Ratio scaled data: This is data with a meaningful zero point, so all kinds of mathematical operations are allowed, in particular multiplying and dividing (in other words, taking ratios). Most data in physical sciences are ratio scaled: for example, length, mass, and speed. In R, you use numeric objects to represent ratio scaled data.
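
The classification above can be tried out in a short R session. The following is only a minimal sketch: the variable names (colors, status, start, end, length_cm, and so on) and the sample values are made up for illustration, and it uses only base R functions.

```r
# Nominal data: an unordered factor.
# By default, R sorts the levels alphabetically; that order carries no meaning.
colors <- factor(c("red", "green", "blue", "green"))
levels(colors)       # "blue" "green" "red"
is.ordered(colors)   # FALSE

# Ordinal data: an ordered factor with explicitly ranked levels.
status <- factor(c("low", "high", "medium", "low"),
                 levels = c("low", "medium", "high"),
                 ordered = TRUE)
is.ordered(status)          # TRUE
status < "high"             # TRUE FALSE TRUE TRUE
as.character(max(status))   # "high"

# Interval scaled data: dates can be subtracted...
start <- as.Date("2020-01-01")
end   <- as.Date("2020-03-01")
gap_days <- as.numeric(end - start)   # 60 (31 days in January + 29 in February 2020)

# ...but dividing one date by another is not defined, so R raises an error.
can_divide <- !inherits(try(end / start, silent = TRUE), "try-error")   # FALSE

# Ratio scaled data: ratios are meaningful because zero means "none".
length_cm <- c(10, 20, 40)
ratio <- length_cm[3] / length_cm[1]   # 4
```

Note that comparisons such as status < "high" work only because the factor is ordered; the same comparison on the unordered colors factor is not meaningful.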