When you ask the R community for help, you’ll get the most useful advice if you know how to make a minimal reproducible example. A reproducible example is a sample of code and data that any other user can run and get the same results as you do. A minimal reproducible example is the smallest possible example that illustrates the problem; it consists of the following:

  • A small set of sample data

  • A short snippet of code that reproduces the error

  • The necessary information on your R version, the system it’s being run on, and the packages you’re using
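
The third item is usually the easiest to supply. As a minimal sketch, base R's sessionInfo() reports the R version, the platform, and the attached packages in one call, and R.version.string holds just the version line. The variable names below are only illustrative.

```r
# Collect the details that helpers usually ask for.
info <- sessionInfo()      # R version, platform, locale, attached packages
print(info)

# Just the R version, as a single line of text
r_version <- R.version.string
print(r_version)
```

Pasting the printed output at the end of a question is generally enough to cover this part of the example.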

If you want to know what a minimal reproducible example looks like, take a look at the examples in the R Help files. In general, all the code given in the R Help files fulfills the requirements of a minimal reproducible example.

Create sample data with random values

In most cases, you can use random data to illustrate a problem. R has some useful built-in functions to generate random numbers and other random data. For example, to make a vector of random numbers, use rnorm() for the normal distribution or runif() for a uniform distribution. To make a random vector with five elements, try the following:

> set.seed(1)
> x <- rnorm(5)
> x
[1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078

You can use the set.seed() function to specify a starting seed value for generating random numbers. Setting a seed ensures that the random numbers are the same each time you run the code. This is rarely useful in production code, but it’s essential for a reproducible example: anyone who runs your code gets the same results you do, as long as they use a version of R with the same random number generator defaults. (The default behavior of sample(), for instance, changed in R 3.6.0, which is one more reason to report your R version.)
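
A quick way to convince yourself that the seed does its job is to reset it and draw again. The sketch below uses runif(), but the same holds for rnorm(); the variable names are only illustrative.

```r
set.seed(1)
first <- runif(5)     # five uniform random numbers between 0 and 1

set.seed(1)           # reset the generator to the same starting point
second <- runif(5)

identical(first, second)   # TRUE: same seed, same numbers

third <- runif(5)          # no reset this time, so the stream continues
identical(first, third)    # FALSE
```

Because the generator keeps advancing between calls, set.seed() belongs immediately before the code whose output you want others to match.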

If you want to draw random values from a predetermined set, use the sample() function. This function is a bit like dealing from a deck of playing cards. In a card game, you have 52 cards and you know exactly which cards are in the deck, but each deal is different. You can simulate dealing a hand of seven cards using the following code:

> cards <- c(1:9, "J", "Q", "K", "A")
> suits <- c("Spades", "Diamonds", "Hearts", "Clubs")
> deck <- paste(rep(suits, each=13), cards)
> set.seed(123)
> sample(deck, 7)
[1] "Diamonds 2" "Clubs 2"    "Diamonds 8" "Clubs 5"
[5] "Clubs 7"    "Spades 3"   "Diamonds K"

By default, sample() uses each value only once. But sometimes you want elements of the set to appear multiple times. In this case, you can use the argument replace=TRUE. If you want to create a sample of size 12 consisting of the first three letters of the alphabet, you use the following:

> set.seed(5)
> sample(LETTERS[1:3], 12, replace=TRUE)
[1] "A" "C" "C" "A" "A" "C" "B" "C" "C" "A" "A" "B"
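
The difference between the two modes is easy to check. The following sketch, with illustrative variable names, shows that a draw without replacement never contains duplicates, that asking for more values than the set holds fails unless replace=TRUE is given, and that the same request succeeds with replacement.

```r
abc <- LETTERS[1:3]

# Without replacement: a random ordering, each value exactly once
set.seed(5)
shuffled <- sample(abc)
anyDuplicated(shuffled)      # 0, meaning no duplicates

# Asking for 12 values from a set of 3 fails without replacement
too_many <- tryCatch(
    sample(abc, 12),
    error = function(e) conditionMessage(e)
)
too_many                     # the error message, captured as text

# With replacement, the same request works and values repeat
with_repeats <- sample(abc, 12, replace = TRUE)
length(with_repeats)         # 12
```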

Creating a data.frame with sample data is straightforward:

> set.seed(42)
> dat <- data.frame(
+     x = sample(1:5),
+     y = sample(c("yes", "no"), 5, replace = TRUE)
+ )
> dat
  x   y
1 5  no
2 4  no
3 1 yes
4 2  no
5 3  no
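
Once the sample data exists, it still has to reach the person helping you. One option, offered here only as a sketch, is base R's dput(), which prints R code that rebuilds an object so that others can paste it straight into their session. This can be handy because the values that sample() returns for a given seed may depend on the R version. The names dat_text and rebuilt are hypothetical.

```r
set.seed(42)
dat <- data.frame(
    x = sample(1:5),
    y = sample(c("yes", "no"), 5, replace = TRUE)
)

str(dat)     # check the column types before sharing

# Print R code that recreates dat; copy this output into your question
dput(dat)

# Round trip: rebuild the data frame from its text representation
dat_text <- deparse(dat)
rebuilt <- eval(parse(text = dat_text))
all.equal(dat, rebuilt)   # TRUE
```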