How to Read Data from Excel into R - dummies

How to Read Data from Excel into R

By Andrie de Vries, Joris Meys

If you ask users of R what the best way is to import data directly from Microsoft Excel, most of them will probably answer that your best option is to first export from Excel to a CSV file and then use read.csv() to import your data to R.

In fact, this is still the advice in Chapter 8 of the R import and export manual, which says, “The first piece of advice is to avoid doing so if possible!” The reason is that many of the existing methods for importing data from Excel depend on third-party software or libraries that may be difficult to configure, not available on all operating systems, or perhaps have restrictive licensing terms.

However, since February 2011 there exists a new alternative: using the package XLConnect, available from CRAN. What makes XLConnect different is that it uses a Java library to read and write Excel files. This has two advantages:

  • It runs on all operating systems that support Java. XLConnect is written in Java and runs on Window, Linux, and Mac OS.

  • There’s nothing else to load. XLConnect doesn’t require any other libraries or software. If you have Java installed, it should work.

XLConnect also can write Excel files, including changing cell formatting, in both Excel 97–2003 and Excel 2007/10 formats.

To find out more about XLConnect, you can read the excellent package vignette.

By now you’re probably itching to get started with an example. Assume you want to read an Excel spreadsheet in your user directory called Elements.xlsx. First, install and load the package; then create an object with the filename:

> install.packages("XLConnect")
> library("XLConnect")
> excel.file <- file.path("~/Elements.xlsx")

Now you’re ready to read a sheet of this workbook with the readWorksheetFromFile() function. You need to pass it at least two arguments:

  • file: A character string with a path to a valid .xls or .xlsx file

  • sheet: Either an integer indicating the position of the worksheet (for example, sheet=1) or the name of the worksheet (for example, sheet=”Sheet2″)

The following two lines do exactly the same thing — they both import the data in the first worksheet (called Sheet1):

> elements <- readWorksheetFromFile(excel.file, sheet=1)
> elements <- readWorksheetFromFile(excel.file, sheet="Sheet1")