How to Manipulate Files in R

By Andrie de Vries, Joris Meys

Occasionally, you may want to write a script in R that will traverse a given folder and perform actions on all the data in the files or a subset of files in that folder.

To get a list of files in a specific folder, use list.files() or dir(). The two names refer to the same function; both exist for backward-compatibility reasons:

> list.files(file.path("F:", "git", "roxygen2"))
[1] "roxygen2"            "roxygen2.Rcheck"
[3] "roxygen2_2.0.tar.gz" "roxygen2_2.1.tar.gz"
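
The traversal described at the start of this article can be sketched as follows. This is a minimal example under stated assumptions: the scratch folder and its files (demo.dir, a.csv, b.csv, notes.txt) are hypothetical and are created here purely for illustration, and the action performed on each file is simply counting its rows.

```r
# Create a scratch folder with a few example files (hypothetical names).
demo.dir <- file.path(tempdir(), "traverse-demo")
dir.create(demo.dir, showWarnings = FALSE)
write.csv(head(iris, 10), file.path(demo.dir, "a.csv"), row.names = FALSE)
write.csv(head(iris, 20), file.path(demo.dir, "b.csv"), row.names = FALSE)
writeLines("not a data file", file.path(demo.dir, "notes.txt"))

# pattern is a regular expression, so this selects only the .csv files.
# full.names = TRUE returns paths that can be passed straight to read.csv().
csv.files <- list.files(demo.dir, pattern = "\\.csv$", full.names = TRUE)

# Perform an action on each selected file: here, count its rows.
row.counts <- sapply(csv.files, function(f) nrow(read.csv(f)))
names(row.counts) <- basename(csv.files)
row.counts   # a.csv has 10 rows, b.csv has 20

# Remove the scratch folder and everything in it.
unlink(demo.dir, recursive = TRUE)
```

The pattern argument is what restricts the traversal to a subset of files; without it, notes.txt would be passed to read.csv() as well.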

Several other functions are useful for working with files:

Function      Description
list.files    Lists files in a directory.
list.dirs     Lists subdirectories of a directory.
file.exists   Tests whether a specific file exists in a location.
file.create   Creates a file.
file.remove   Deletes files (and, on most Unix platforms, empty directories).
tempfile      Returns a name for a temporary file. If you create a file
              (for example, with file.create() or write.table()) using
              this returned name, R will create a file in a temporary folder.
tempdir       Returns the file path of a temporary folder on your file system.
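
As a rough illustration of how these functions fit together, the sketch below creates and then deletes an empty file in the session's temporary folder. The variable names (scratch, before, created, and so on) are hypothetical, and the pattern and fileext arguments of tempfile() are used only to make the generated name easier to recognize.

```r
# tempfile() only returns a name; nothing is written to disk yet.
scratch <- tempfile(pattern = "demo", fileext = ".txt")

before  <- file.exists(scratch)                          # FALSE
created <- file.create(scratch)                          # TRUE on success
listed  <- basename(scratch) %in% list.files(tempdir())  # TRUE
removed <- file.remove(scratch)                          # TRUE on success
after   <- file.exists(scratch)                          # FALSE again

c(before = before, created = created, listed = listed,
  removed = removed, after = after)
```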

Next, you get to exercise all your knowledge about working with files. In the next example, you first create a temporary file, then save a copy of the iris data frame to this file. To test that the file is on disk, you then read the newly created file to a new variable and inspect this variable. Finally, you delete the temporary file from disk.

Start by using the tempfile() function to return a character string with the name of a file in a temporary folder on your system:

> my.file <- tempfile()
> my.file
[1] "C:\\Users\\Andrie\\AppData\\Local\\Temp\\RtmpGYeLTj\\file14d4366b6095"

Notice that the result is purely a character string, not a file. This file doesn’t yet exist anywhere. Next, you save a copy of the data frame iris to my.file using the write.csv() function. Then use list.files() to see if R created the file:

> write.csv(iris, file=my.file)
> list.files(tempdir())
[1] "file14d4366b6095"

As you can see, R created the file. Now you can use read.csv() to import the data to a new variable called file.iris:

> file.iris <- read.csv(my.file)

Use str() to investigate the structure of file.iris. It is a data.frame of 150 observations and six variables. Six variables, you say? Yes, six, although the original iris has only five columns.

What happened here is that the default value of the row.names argument of write.csv() is row.names=TRUE. (You can confirm this by taking a close look at the Help for ?write.csv.) So, R saved the original row names of iris to the file as an extra, unnamed column, and read.csv() imported that column under the name X:

> str(file.iris)
'data.frame':   150 obs. of  6 variables:
 $ X           : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
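
If the extra X column is unwanted, one option is to suppress the row names when writing the file. The following is a minimal sketch; the names no.rownames.file and clean.iris are hypothetical and used only for illustration.

```r
# Write iris without row names so no extra column ends up in the file.
no.rownames.file <- tempfile(fileext = ".csv")
write.csv(iris, file = no.rownames.file, row.names = FALSE)

# Read it back: the result has the same five columns as iris.
clean.iris <- read.csv(no.rownames.file)
ncol(clean.iris)    # 5
names(clean.iris)

# Delete the temporary file again.
file.remove(no.rownames.file)
```

Another possibility is to keep the file as written and call read.csv() with row.names = 1, which tells R to treat the first column as row names rather than as data.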

To leave your file system in its original state, you can use file.remove() to delete the temporary file:

> file.remove(my.file)
[1] TRUE
> list.files(tempdir())
character(0)

As you can see, the result of list.files() is an empty character vector, character(0), because the file no longer exists in that folder.
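
The whole sequence (create a temporary name, write, read, delete) can also be wrapped in a function so that the cleanup happens automatically. This is only a sketch under stated assumptions: roundtrip.csv() is a hypothetical helper, not part of base R, and it uses on.exit() with unlink() instead of an explicit file.remove() call.

```r
# roundtrip.csv() is a hypothetical helper written for this example.
roundtrip.csv <- function(df) {
  tmp <- tempfile(fileext = ".csv")
  # on.exit() runs when the function exits, even after an error,
  # so the temporary file does not linger on disk.
  on.exit(unlink(tmp), add = TRUE)
  write.csv(df, file = tmp, row.names = FALSE)
  read.csv(tmp)
}

iris.copy <- roundtrip.csv(iris)
dim(iris.copy)   # 150 5
```

Registering the cleanup before writing means the file is removed whether or not the write or read step succeeds.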