Published:
February 13, 2018

R Projects For Dummies

Overview

Make the most of R’s extensive toolset

R Projects For Dummies offers a unique learn-by-doing approach. You will increase the depth and breadth of your R skillset by completing a wide variety of projects. By using R’s graphics, interactive, and machine learning tools, you’ll learn to apply R’s extensive capabilities in an array of scenarios. The depth of the project experience is unmatched by any other content online or in print. And you just might increase your statistics knowledge along the way, too!

R is a free tool, and it’s the basis of a huge amount of work in data science. It’s taking the place of costly statistical software that sometimes takes a long time to learn. One reason is that you can use just a few R commands to create sophisticated analyses. Another is that easy-to-learn R graphics enable you to make the results of those analyses available to a wide audience.

This book will help you sharpen your skills by applying them in the context of projects with R, including dashboards, image processing, data reduction, mapping, and more.

  • Appropriate for R users at all levels
  • Helps R programmers plan and complete their own projects
  • Focuses on R functions and packages
  • Shows how to carry out complex analyses by just entering a few commands

If you’re brand new to R or just want to brush up on your skills, R Projects For Dummies will help you complete your projects with ease.


About The Author

Joseph Schmuller, PhD, is a veteran of more than 25 years in Information Technology. He is the author of several books, including Statistical Analysis with R For Dummies and four editions of Statistical Analysis with Excel For Dummies. In addition, he has written numerous articles and created online coursework for Lynda.com.

Sample Chapters


CHEAT SHEET

To complete any project using R, you work with functions that live in packages designed for specific areas. This cheat sheet provides some information about these functions. Interacting with users with R functions: Here’s a selection of statistical functions that come with the standard R installation. You’ll find many others in R packages.


Articles from the book

Here, you learn about books and websites that help you learn more about R programming. Without further ado... Interacting with users: If you want to delve deeper into R applications that interact with users, start with this tutorial by shiny guiding force Garrett Grolemund. For a helpful book on the subject, consider Chris Beeley’s Web Application Development with R Using Shiny, 2nd Edition (Packt Publishing, 2016).
SVMs work well when you have to use R to classify individuals on the basis of many features — usually, way more than in the iris data frame. Here, you learn how to create an SVM that identifies the party affiliations of members of the 1984 U.S. House of Representatives. The target variable is whether the congressperson is a Republican or a Democrat, based on their votes on 16 issues of that time.
Discovering exactly how the neurons process inputs and send messages has sometimes been the basis for winning the Nobel Prize. Now, take a look at artificial neural networks to understand how machine learning works in R programming. Overview: An ML neural network consists of simulated neurons, often called units, or nodes, that work with data.
R has a package that uses recursive partitioning to construct decision trees. It’s called rpart, and its function for constructing trees is called rpart(). To install the rpart package, click Install on the Packages tab and type rpart in the Install Packages dialog box. Then, in the dialog box, click the Install button.
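As a minimal sketch of what a call to rpart() can look like, the example below fits a classification tree to the built-in iris data frame. The choice of iris, the object names, and the method = "class" argument are assumptions made for illustration; the teaser above does not say which dataset the article uses.

```r
library(rpart)  # rpart ships with standard R distributions as a recommended package

# Fit a classification tree that predicts Species from the four measurements
iris.tree <- rpart(Species ~ ., data = iris, method = "class")

# Show the splits that recursive partitioning selected
print(iris.tree)

# Predicted class for each flower, cross-tabulated against the true species
predicted <- predict(iris.tree, iris, type = "class")
confusion <- table(actual = iris$Species, predicted = predicted)
print(confusion)
```

The formula Species ~ . reads as "predict Species from all the other columns."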
You’ve probably seen bar plots where each point on the x-axis has more than one bar. The image below shows an example. The bar plot shows the frequency of eye color for four hair colors in 313 female students. The data is from the HairEyeColor data set. This type of plot is called a grouped bar plot. Grouped bar plot of Eye Color and Hair Color in 313 female students.
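One way to draw a plot of this kind is with base R's barplot() function, sketched below. This is an assumption about tooling rather than the article's own code (the article may well use a different graphics package), and the object names are made up for the example.

```r
# Hair x Eye counts for the female students in the built-in HairEyeColor table
females <- HairEyeColor[, , "Female"]

# barplot() draws one group of bars per column, so transpose the table
# to put Hair on the x-axis with one bar per Eye color in each group
eye.by.hair <- t(females)

barplot(eye.by.hair,
        beside = TRUE,                            # side-by-side bars, not stacked
        legend.text = rownames(eye.by.hair),
        args.legend = list(title = "Eye Color"),
        xlab = "Hair Color",
        ylab = "Frequency")
```

Setting beside = TRUE is what turns a stacked bar plot into a grouped one.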
How do you create a forest out of a dataset in R? Well, randomly. Here's what this means. You can create a decision tree from a dataset. It’s possible to use the rattle package to partition a data frame into a training set, a validation set, and a test set. The partitioning takes place as a result of random sampling from the rows in the data frame.
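The random partitioning described above can be sketched in base R without Rattle. The 70/15/15 proportions, the seed, and the use of iris are assumptions chosen for illustration; this is not Rattle's own code.

```r
set.seed(42)  # arbitrary seed so that the random split is repeatable

n <- nrow(iris)
shuffled <- sample(n)               # random permutation of the row numbers

n.train <- round(0.70 * n)          # size of the training set
n.valid <- round(0.15 * n)          # size of the validation set

# Carve the shuffled row numbers into three non-overlapping pieces
train.rows <- shuffled[1:n.train]
valid.rows <- shuffled[(n.train + 1):(n.train + n.valid)]
test.rows  <- shuffled[(n.train + n.valid + 1):n]

training   <- iris[train.rows, ]
validation <- iris[valid.rows, ]
testing    <- iris[test.rows, ]
```

Because every row number appears exactly once in shuffled, each row lands in exactly one of the three sets.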
To introduce k-means clustering for R programming, you start by working with the iris data frame. This is the iris data frame that’s in the base R installation. Fifty flowers in each of three iris species (setosa, versicolor, and virginica) make up the data set. The data frame columns are Sepal.Length, Sepal.Width, Petal.
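A minimal sketch of k-means clustering on this data frame, using base R's kmeans() function, might look like the following. The seed, the nstart value, and the object name are assumptions for the example.

```r
set.seed(1)  # k-means starts from random centers, so fix the seed

# Cluster on the four numeric columns only; Species is left out on purpose
iris.km <- kmeans(iris[, 1:4], centers = 3, nstart = 20)

# Compare the clusters that k-means found with the actual species
comparison <- table(cluster = iris.km$cluster, species = iris$Species)
print(comparison)
```

Cluster numbers are arbitrary labels, so cluster 1 does not necessarily correspond to the first species.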
In ggplot2, Wickham’s implementation of Wilkinson’s grammar is an easy-to-learn structure for R graphics code. A graph starts with the function ggplot(), which takes two arguments. The first argument is the source of the data. The second argument maps the data components of interest into components of the graph.
Many (but not all) of the UCI datasets you will use in R programming are in comma-separated value (CSV) format: The data are in text files with a comma between successive values. A typical line in this kind of file looks like this:
5.1,3.5,1.4,0.2,Iris-setosa
This is the first line from a well-known dataset called iris.
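Here is a small sketch of reading lines in this format with read.csv(). The text is supplied inline rather than read from a downloaded file, the second line is the next row of the same dataset, and the column names are assumptions borrowed from R's built-in iris data frame, because a file like this has no header row.

```r
# Two lines in the headerless CSV layout described above
csv.text <- "5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa"

# header = FALSE because the first line is data, not column names
iris.lines <- read.csv(text = csv.text,
                       header = FALSE,
                       col.names = c("Sepal.Length", "Sepal.Width",
                                     "Petal.Length", "Petal.Width",
                                     "Species"),
                       stringsAsFactors = FALSE)

str(iris.lines)
```

To read an actual file, replace the text argument with the file's path as the first argument.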
Here's a selection of statistical functions that come with the standard R installation. You'll find many others in R packages. R provides the shiny package and the shinydashboard package for developing interactive applications. Here are selected functions from these packages:
Central Tendency and Variability
Function | What it Calculates
mean(x) | Mean of the numbers in vector x.
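As a quick illustration of functions in this category, the sketch below applies mean() and a few other base R functions to a made-up vector. Only mean(x) appears in the excerpt above; median(), var(), and sd() are standard base R functions added here as assumptions about what such a list typically covers.

```r
x <- c(2, 4, 4, 5, 7, 9)   # made-up numbers, for illustration only

x.mean   <- mean(x)     # arithmetic mean of the numbers in x
x.median <- median(x)   # middle value of the sorted numbers
x.var    <- var(x)      # sample variance (divides by n - 1)
x.sd     <- sd(x)       # sample standard deviation, the square root of var(x)

print(c(mean = x.mean, median = x.median, var = x.var, sd = x.sd))
```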
The Log tab shows your interactions with Rattle as R code. Here’s a good example of working with the Rattle log. In the hierarchical clustering analysis, click on Data Plot. You see a plot that looks very much like this. Data plot. To find the code that produced this plot, select the Log tab and scroll down until you find this:
plot(crs$dataset[, c(1:4)], col=cutree(crs$hclust,3))
Copy and paste that line into the RStudio Script panel and then press Ctrl+R to run it.
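The logged line depends on the crs object that Rattle maintains, so it runs only inside a Rattle session. A standalone sketch of the same idea in base R follows. It assumes the dataset is iris with its four numeric columns and uses default settings for dist() and hclust(), which may differ from what Rattle chose.

```r
# Hierarchical clustering on the four numeric iris columns
iris.hclust <- hclust(dist(iris[, 1:4]))

# Cut the tree into three groups; each flower gets a group number 1 to 3
groups <- cutree(iris.hclust, 3)

# Scatterplot matrix of the four columns, colored by cluster membership
plot(iris[, c(1:4)], col = groups)
```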
Here are some packages and functions to help you get started using R to draw maps and to process images.
Packages and Functions for Plotting Maps and for Processing Images
Package | Function | What it does
maps | map_data() | Returns a data frame of latitudes and longitudes
ggmap | geocode() | Returns latitude and
Rattle is a terrific teaching tool for R programming. In this little two-part project, you can use Rattle to help wrap your brain around the complexity parameter (cp) and what it entails. The default value of the cp is .01. To tell you how to calculate cp is beyond the scope of our discussion here. Just think of cp as the “minimum benefit” that a split must add to the tree.
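To see the "minimum benefit" idea outside of Rattle, the sketch below grows trees with the rpart package at several cp values and counts the leaves in each. The iris data, the particular cp values, and the helper function count.leaves() are all assumptions made for illustration.

```r
library(rpart)

# Hypothetical helper: fit a tree at a given cp and count its leaf nodes
count.leaves <- function(cp.value) {
  fit <- rpart(Species ~ ., data = iris, method = "class",
               control = rpart.control(cp = cp.value))
  sum(fit$frame$var == "<leaf>")
}

cp.values <- c(0.01, 0.45, 0.60)   # 0.01 is the default
leaves <- sapply(cp.values, count.leaves)

print(data.frame(cp = cp.values, leaves = leaves))
```

A larger cp demands more benefit from each split, so the tree can only stay the same size or get smaller as cp goes up.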
Rattle provides a GUI to R’s tree-construction and tree-plotting functions. To use this GUI to create a decision tree for iris.uci, begin by opening Rattle:
library(rattle)
rattle()
The information here assumes that you’ve downloaded and cleaned up the iris dataset from the UCI ML Repository and called it iris.uci.
R formulas are useful for multiple reasons. Suppose you’re interested in how the temperature varies with the month. Having lived through many Mays through Septembers in one place, you might guess that the temperature generally increases in this data frame from month to month. Is that the case? This gets into the area of statistical analysis, and at a fairly esoteric level.
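A formula such as Temp ~ Month reads as "Temp depends on Month." The sketch below assumes that the data frame in question is the built-in airquality, which records daily temperatures from May through September; the excerpt does not name the data frame, so treat that choice as an assumption.

```r
# Mean temperature for each month, expressed with a formula
monthly.means <- aggregate(Temp ~ Month, data = airquality, FUN = mean)
print(monthly.means)

# The same formula drives a linear regression of Temp on Month
temp.fit <- lm(Temp ~ Month, data = airquality)
print(coef(temp.fit))
```

The same formula object works in both calls, which is part of what makes formulas so useful: one notation serves summaries, models, and plots.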
A dataset that’s often used to illustrate ML concepts in R programming is the information about passengers on the Titanic’s disastrous voyage in 1912. The target variable is whether the passenger survived. You can use this data to create a decision tree. The data resides in an R package called titanic. If it’s not already on the Packages tab, click Install.
One benefit of Rattle is that it allows you to easily experiment with whatever it helps you create with R. Here’s a little project for you to try. You’ll learn more about neural networks if you can see how the network error rate decreases with the number of iterations through the training set. So the objective is to plot the error rate for the banknote.
If you’re interested in trying out your RFM analysis skills on another set of data, this R project is for you. The CDNOW data set consists of almost 70,000 rows. It’s a record of sales at CDNOW from the beginning of January 1997 through the end of June 1998. Press Ctrl+A to highlight all the data, and press Ctrl+C to copy to the clipboard.
If you’ve been working with images, animated images, and combined stationary images in R, it may be time to take the next step. This project walks you through the next step: Combine an image with an animated image. This image shows the end product: the plot of the iris data set with comedy icons Laurel and Hardy positioned in front of the plot legend.
Try out this R project to see how one variable might affect an outcome. It’s conceivable that weather conditions could influence flight delays. How do you incorporate weather information into the assessment of delay? One nycflights13 data frame called weather provides the weather data for every day and hour at each of the three origin airports.
Give this project a try to test out your R skills. If you’re the outdoorsy type, you probably encounter mushrooms growing in the wild. As you might know, some mushrooms are edible, and others are most definitely not (!). The UCI ML repository has a dataset of mushrooms with lots and lots of instances (8,124 of them) and 22 attributes.
How many data sets are perfectly linearly separable, like set.vers? R programmers know the answer: not many. In fact, here’s vers.virg, the two-thirds of the irises that aren’t setosa:
vers.virg <- subset(iris, Species != "setosa")
This image shows the plot of Petal.Width versus Petal.Length for this data frame. You can clearly see the slight overlap between species, and the resulting nonlinear separability.
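A plot along these lines can be reproduced with a few lines of base R, sketched below. The colors, plotting symbol, and legend placement are assumptions; the article's own figure may have been drawn with a different graphics package.

```r
# The two-thirds of the irises that aren't setosa
vers.virg <- subset(iris, Species != "setosa")

# One color per remaining species
point.colors <- ifelse(vers.virg$Species == "versicolor", "blue", "red")

# Petal.Width (y-axis) versus Petal.Length (x-axis)
plot(Petal.Width ~ Petal.Length, data = vers.virg,
     col = point.colors, pch = 19)
legend("topleft", legend = c("versicolor", "virginica"),
       col = c("blue", "red"), pch = 19)
```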
Machine Learning (ML) is a popular area. R provides a number of ML-related packages and functions. Here are some of them:
Machine Learning Packages and Functions
Package | Function | What it does
rattle | rattle() | Opens the Rattle Graphic User Interface
rpart | rpart() | Creates a decision tree
rpart.plot | prp()
You can use Rattle with iris for your R projects. Let’s say you downloaded the iris data set from the UCI ML Repository, cleaned it up a bit, and explored it. Then you installed Rattle. Now it’s time to put Rattle to work. Getting and (further) exploring the data: The first thing to do is bring the dataset into Rattle.
R has numerous functions and packages that deal with ML. Data science honcho Graham Williams has created Rattle, a graphical user interface (GUI) to many of these functions. You can use Rattle for certain ML projects. Much of what Rattle does depends on a package called RGtk2, which uses R functions to access the GNU Image Manipulation Program (GIMP) toolkit.
Created for statistical analysis, R has a wide array of packages and functions for dealing with large amounts of data. This selection is the tip of the iceberg’s tip:
Packages and Functions for Exploring Databases
Package | Function | What it does
didrooRFM | findRFM() | Performs a Recency, Frequency, Money analysis on a database of retail transactions
vcd | assocstats() | Calculates statistics for tables of categorical data
vcd | assoc() | Creates a graphic that shows deviations from independence in a table of categorical data
tidyverse | glimpse() | Provides a partial view of a data frame with the columns appearing onscreen as rows
plotrix | std.