Published: July 7, 2015

R For Dummies

Overview

Mastering R has never been easier

Picking up R can be tough, even for seasoned statisticians and data analysts. R For Dummies, 2nd Edition provides a quick and painless way to master all the R you'll ever need. Requiring no prior programming experience and packed with tons of practical examples, step-by-step exercises, and sample code, this friendly and accessible guide shows you how to know your way around lists, data frames, and other R data structures, while learning to interact with other programs, such as Microsoft Excel. You'll learn how to reshape and manipulate data, merge data sets, split and combine data, perform calculations on vectors and arrays, and so much more.

R is an open source statistical environment and programming language that has become very popular in varied fields for the management and analysis of data. R provides a wide array of statistical and graphical techniques, and has become the standard among statisticians for software development and data analysis. R For Dummies, 2nd Edition takes the intimidation out of working with R and arms you with the knowledge and know-how to master the programming language of choice among statisticians and data analysts worldwide.

  • Covers downloading, installing, and configuring R
  • Includes tips for getting data in and out of R
  • Offers advice on fitting regression models and ANOVA
  • Provides helpful hints for working with graphics

R For Dummies, 2nd Edition is an ideal introduction to R for complete beginners, as well as an excellent technical reference for experienced R programmers.

About The Author

Andrie de Vries is a leading R expert and Business Services Director for Revolution Analytics. With over 20 years of experience, he provides consulting and training services in the use of R. Joris Meys is a statistician, R programmer and R lecturer with the faculty of Bio-Engineering at the University of Ghent.

Sample Chapters

CHEAT SHEET

R is more than just a statistical programming language. It’s also a powerful tool for all kinds of data processing and manipulation, used by a community of programmers and users, academics, and practitioners. To get the most out of R, you need to know how to access the R Help files and find help from other sources.

Articles from the book

In addition to R For Dummies, there are many online resources for the aspiring or experienced R programmer. They'll help you find answers or learn valuable tips and tricks. (Or, they're just fun!) Cookbook for R The goal of the Cookbook for R is to provide solutions to common tasks and problems in analyzing data.
The spreadsheet is probably one of the most widely used PC applications — and for good reason: Spreadsheets make it very easy to perform calculations and other operations on tabular data. But spreadsheets pose some risks as well: They’re easy to corrupt and very difficult to debug. The good news is, you can use R to do many of the same things you used to do in spreadsheets.
One of the very attractive features of R is that it contains a large collection of third-party packages (collections of functions in a well-defined format). To get the most out of R, you need to understand where to find additional packages, how to download and install them, and how to use them. Poking around the nooks and crannies of CRAN The Comprehensive R Archive Network (CRAN) is a network of web servers around the world where you can find the R source code, R manuals and documentation, and contributed packages.
Every object you create in R ends up in this environment, which is called the global environment. The global environment is the universe of the R user where everything happens. R gurus will tell you that this “universe” is actually contained in another “universe” and that one in yet another, and so on — but that “outer space” is a hostile environment suited only to daring coders without fear of breaking things.
Vectors have a structure and a type, and R is a bit sensitive about both. Feeding R the wrong type of vector is like trying to make your cat eat dog food: Something will happen, and chances are that it won’t be what you hoped for. So, you’d better know what type of vector you have. Looking at the structure of a vector R gives you an easy way to look at the structure of any object.
Even with good introductory books on R, you'll need to use the R Help files. The R Help files provide detailed information about the use of different functions and their peculiarities. R has excellent built-in help for every function that explains how to use that function. Just about every Help page has some examples that demonstrate how to use that function.
In addition to data, geoms, and stats, the full specification of a ggplot2 in R includes facets and scales. Facets allow you to visualize different subsets of your data in a single plot. Scales include not only the x-axis and y-axis, but also any additional keys that explain your data (for example, when different subgroups have different colors in your plot).
As time goes by, new data may appear and needs to be added to the dataset in R. Just like matrices, data frames can be appended using the rbind() function. Adding a single observation Say that Granny and Geraldine played another game with their team, and you want to add the number of baskets they made. The rbind() function lets you do that easily: > result <- rbind(baskets.
You can create trend lines or regression lines through data. These can be useful when using lattice plots in R. When you tell lattice to calculate a line of best fit, it does so for each panel in the plot. This is straightforward using xyplot(), because it’s as simple as adding a type argument. In particular, you want to specify that the type is both points (type = "p") and regression (type = "r").
A data frame can be extended with new variables in R. You may, for example, get data from another player on Granny’s team. Or you may want to calculate a new variable from the other variables in the dataset, like the total sum of baskets made in each game. Adding a single variable There are three main ways of adding a variable.
Whenever you have a limited number of different values in R, you can get a quick summary of the data by calculating a frequency table. A frequency table is a table that represents the number of occurrences of every unique value in the variable. In R, you use the table() function for that. Creating a table in R You can tabulate, for example, the number of cars with a manual and an automatic gearbox using the following command:
    > amtable <- table(cars$am)
    > amtable
      auto manual
        13     19
This outcome tells you that your data contains 13 cars with an automatic gearbox and 19 with a manual gearbox.
Much like many other objects that you will encounter in R, lists aren’t static objects. You can change components, add components, and remove components from them in a pretty straightforward manner. Changing the value of components Assigning a new value to a component in a list is pretty straightforward. You use either the $ or the [[ ]] to access that component, and simply assign a new value.
In addition to the mean and variation, you also can take a look at the quantiles in R. A quantile, or percentile, tells you how much of your data lies below a certain value. The 50 percent quantile, for example, is the same as the median. Again, R has some convenient functions to help you with looking at the quantiles.
To dive a bit deeper into how you can use vectors in R, let’s consider this All-Star Grannies example. You have two vectors that contain the number of baskets that Granny and her friend Geraldine scored in the six games of this basketball season:
    > baskets.of.Granny <- c(12, 4, 4, 6, 9, 3)
    > baskets.of.Geraldine <- c(5, 3, 2, 2, 12, 9)
The c() function stands for combine.
When you are trying to create tables from a matrix in R, you end up with trial.table. The object trial.table looks exactly the same as the matrix trial, but it really isn’t. The difference becomes clear when you transform these objects to a data frame. Take a look at the outcome of this code: > trial.df <- as.
In the R programming language, a conversion from a matrix to a data frame can’t be used to construct a data frame with different types of values. If you combine both numeric and character data in a matrix, for example, everything will be converted to character. You can construct a data frame from scratch, though, using the data.
You can create a data frame from a matrix in R. Take a look at the number of baskets scored by Granny and her friend Geraldine. If you create a matrix baskets.team with the number of baskets for both ladies, you get this: > baskets.team [,1] [,2] [,3] [,4] [,5] [,6] baskets.of.Granny 12 4 5 6 9 3 baskets.
It shouldn’t come as a surprise that you create a list in R with the list() function. You can use the list() function in two ways: to create an unnamed list or to create a named list. The difference is small; in both cases, think of a list as a big box filled with a set of bags containing all kinds of different stuff.
The first element of a ggplot2 layer is the data. There is only one rule in R for supplying data to ggplot(): Your data must be in the form of a data frame. This is different from base graphics, which allow plotting of data in vectors, matrices, and other structures. You can use the built-in dataset quakes. This dataset is a data frame with information about earthquakes near Fiji.
Base R has a function, reshape(), that works fine for data reshaping. However, the original author of this function had in mind a specific use case for reshaping: so-called longitudinal data. Longitudinal research takes repeated observations of a research subject over a period of time. For this reason, longitudinal data typically has the variables associated with time.
You can extract components from lists in R. Consider two lists. The display of both the unnamed list baskets.list and the named list baskets.nlist show already that the way to access components in a list is a little different. That’s not completely true, though. In the case of a named list, you can access the components using the $, as you do with data frames.
Many tests that you run in R return an htest object. That type of object is basically a list with all the information about the test that has been carried out. All these htest objects contain at least a component statistic with the value of the statistic and a component p.value with the value of the p-value. You can see this easily if you look at the structure of the returned object.
In many cases, you can extract values from a data frame in R by pretending that it’s a matrix. But although data frames may look like matrices, they definitely are not. Unlike matrices and arrays, data frames are not internally stored as vectors but as lists of vectors. Pretending it’s a matrix If you want to extract values from a data frame, you can just pretend it’s a matrix and start from there.
If you’re just getting started with R, you’ve probably used only functions that are available in the basic installation of R. But the real power of R lies in the fact that anyone can write their own functions and share them with other R users in an organized manner. Many knowledgeable people have written convenient functions with R, and often a new statistical method is published together with R code.
A ggplot2 geom tells the plot how you want to display your data in R. For example, you use geom_bar() to make a bar chart. In ggplot2, you can use a variety of predefined geoms to make standard types of plot. A geom defines the layout of a ggplot2 layer. For example, you can use geoms to create bar charts, scatterplots, and line diagrams (as well as a variety of other plots), as you can see below.
In some cases, you don’t have real values to calculate with. In most real-life data sets in R, in fact, at least a few values are missing. Also, some calculations have infinity as a result (such as dividing by zero) or can’t be carried out at all (such as taking the logarithm of a negative value). Luckily, R can deal with all these situations.
After you’ve told ggplot() what data to use in R, the next step is to tell it how your data corresponds to visual elements of your plot. This mapping between data and visual aesthetics is the second element of a ggplot2 layer. The visual elements of a plot, or aesthetics, include lines, points, symbols, colors, position .
Going from a script to a function doesn’t take much effort at all. In R, a function is essentially a piece of code that is executed consecutively and without interruption. In that way, a function doesn’t differ that much from a script run using the source() function. However, a function has two very nice advantages over scripts: Functions can work with variable input, so you use it with different data.
The rbind() function in the R programming language conveniently adds the names of the vectors to the rows of the matrix. You name the values in a vector, and you can do something very similar with rows and columns in a matrix. For that, you have the functions rownames() and colnames(). Guess which one does what?
Many people who start with R get confused by lists in the beginning. There’s really no need for that — a list has only two important parts: the components and the names. And in the case of unnamed lists, you don’t even have to worry about the latter. But if you look at the structure of baskets.list in the following output, you can see why people often shy away from lists.
Each time, you combine a vector with multiple values and one with a single value in a function. R applies the function, using that single value for every value in the vector. But recycling goes far beyond these examples. Any time you give two vectors with unequal lengths to a recycling function, R repeats the shortest vector as often as necessary to carry out the task you asked it to perform.
Like any programming language, R makes it easy to compile lists of sorted and ordered data. To find substrings, you can use the grep() function, which takes two essential arguments: pattern: The pattern you want to find. x: The character vector you want to search. Suppose you want to find all the states that contain the pattern New.
You probably are itching to get started on some real R code. Here, you get to do exactly that. Get ready to get your hands dirty and dive into the programming world! Saying hello to the world Programming books typically start with a very simple program. Often, this first program creates the message “Hello world!
After data, mapping, and geoms, the fourth element of a ggplot2 layer in R describes how the data should be summarized. In ggplot2, you refer to this statistical summary as a stat. One very convenient feature of ggplot2 is its range of functions to summarize your data in the plot. This means that you often don’t have to pre-summarize your data.
R has many functions that allow you to import data from other applications. The following table lists some of the useful text import functions, what they do, and examples of how to use them. Function What It Does Example read.table() Reads any tabular data where the columns are separated (for example by commas or tabs).
One important difference between a matrix and a data frame in R is that data frames always have named observations. Whereas the rownames() function returns NULL if you didn’t specify the row names of a matrix, it will always give a result in the case of a data frame. Check the outcome of the following code: > rownames(employ.
Vectors, lists, and data frames play an important role in representing data in R, so being able to succinctly and correctly specify a subset of your data is important. There are three main operators that you can use to subset your data: $: Extracts a single element by name from a list or data frame. For example, iris$Sepal.
Once you have a really good grip on using date and time, you may want to explore additional functionality available in R and add-on packages by looking at the following: chron: R has the simpler chron class for datetime objects that don’t have a time zone. To investigate this class, first load the chron package with library("chron") and then read the Help file ?
When talking about reshaping data in R, it’s important to recognize data in long and wide formats. These visual metaphors describe two ways of representing the same information. It’s helpful to know these formats when using R. You can recognize data in wide format by the fact that columns generally represent groups.
R is more than just a domain-specific programming language aimed at data analysis. It has some unique features that make it very powerful, the most important one arguably being the notion of vectors. These vectors allow you to perform sometimes complex operations on a set of values in a single command. Performing multiple calculations with vectors R is a vector-based language.