Basics of R Programming for Predictive Analytics - dummies

By Anasse Bari, Mohamed Chaouchi, Tommy Jung

R is a programming language originally written for statisticians to do statistical analysis, including predictive analytics. It’s open-source software, used extensively in academia to teach such disciplines as statistics, bioinformatics, and economics. From its humble beginnings, it has been extended to do data modeling, data mining, and predictive analysis.

R has a very active community that contributes free code constantly. One benefit of using an open-source tool such as R is that most of the data analysis you’ll want to do has probably been done by someone already. Code samples are posted on many message boards and university websites.

If you’re stuck with some problematic code, post a question on a message board (such as Stack Exchange or Stack Overflow) and you’ll often get an answer quickly.

Because R is free to use, it’s well suited to building a rapid prototype that shows management the benefits of predictive analytics. You don’t have to ask management to buy anything in order to get started right away. Any one of your data scientists, business analysts, statisticians, or software engineers can build the prototype without any further investment in software.

Therefore R can be an inexpensive way to experiment with predictive analytics without having to purchase enterprise software. After you prove that predictive analytics can add (or is adding) value, you should be able to convince management to consider getting a commercial-grade tool for your newly minted data-science team.
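To give a sense of how small such a prototype can be, here is a minimal sketch that uses only base R. The built-in mtcars dataset, the choice of predictors, and the 70/30 split are illustrative assumptions, not something this article prescribes.

```r
# A rapid-prototype sketch using only base R and the built-in mtcars dataset.
# The dataset, predictors, and split ratio are illustrative assumptions.
set.seed(42)                                   # make the random split reproducible

n         <- nrow(mtcars)
train_idx <- sample(n, size = round(0.7 * n))  # about 70% of rows for training
train     <- mtcars[train_idx, ]
test      <- mtcars[-train_idx, ]

# Fit a linear regression that predicts fuel economy from weight and horsepower
model <- lm(mpg ~ wt + hp, data = train)

# Predict on the held-out rows and measure the error
predicted <- predict(model, newdata = test)
rmse      <- sqrt(mean((test$mpg - predicted)^2))

print(summary(model)$coefficients)
cat("Test RMSE:", round(rmse, 2), "\n")
```

A real prototype would swap in your own data and probably a different model, but the shape (split the data, fit, predict, measure) stays the same.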

How to install R

Installing R is an easy process that takes less than thirty minutes. Most of the default settings can be accepted during the installation process. You can install R by downloading the installation program for Windows and other operating systems from the R website.

After you get to the R website, look for the download link to get the file. After you’ve downloaded the file, just double-click it to begin the installation process.
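Once the installation finishes, a quick way to confirm that R works is to start it and run a couple of commands. The following is only a suggested sanity check, not an official installation step.

```r
# Print the version string of the R installation that is running
print(R.version.string)

# A trivial calculation to confirm that the interpreter evaluates expressions
result <- sum(1:10)
print(result)   # 55
```

If both lines print without errors, R is installed and ready to use.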

How to install RStudio

After you’ve finished the R installation process, you may install RStudio. Installing the RStudio IDE is just as easy as installing R. You can download RStudio Desktop from the RStudio website. You’ll want to install the desktop version appropriate for your operating system (for example, RStudio version 0.97.551 for Windows). After you’ve downloaded the file, just double-click it to begin the installation process.

Basics of the R environment

RStudio is a graphical user interface for developing R programs. The default interface (the way it looks when you first start the program) has four window panes. You’ll use all four of them frequently.


  • The top-left window is your script window.

    This is where you can write or paste R code. You can run the code line by line or in chunks by highlighting the lines you want to execute. The script window is also where you can view the values of data frames. When you click a data frame in the workspace pane, it opens a new tab in the script pane with the data frame values.

  • The bottom-left window is your console window.

    This is where you type your R code one line at a time. The output (if there is any) is printed on the next line right after the command finishes execution.

  • The top-right window is your workspace and history window.

    It has two tabs:

    • The History tab stores the history of all the code you’ve executed in the current session.

    • The Workspace tab lists all the variables in memory. Here you can click the variables to see their values and (if you so choose) load datasets interactively.

  • The bottom-right window is where you’ll find four tabs of interest:

    • The Help tab offers documentation such as descriptions of functions.

    • The Packages tab shows all the packages installed and available to load by your program. The checked packages are the ones that have been loaded for your program to use. You can search and install new packages here.

    • The Plots tab is where the output of any plots will appear.

    • The Files tab is your file explorer inside RStudio.
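To see how the panes respond, you could type a few commands into the console. The sketch below is one possible tour; the scores data frame and its values are made up for illustration.

```r
# Create a small data frame; it would show up in the Workspace tab.
# The name and values are hypothetical, for illustration only.
scores <- data.frame(
  student = c("A", "B", "C"),
  hours   = c(2, 5, 8),
  grade   = c(65, 78, 92)
)

# List the objects currently in memory (what the Workspace tab displays)
print(ls())

# Output is printed in the console right after the command runs
print(mean(scores$grade))

# Draw a scatter plot; in RStudio it appears in the Plots tab
plot(scores$hours, scores$grade,
     xlab = "Hours studied", ylab = "Grade")

# Show the attached packages (the checked ones in the Packages tab)
print(search())

# These steps open viewers, so run them only in an interactive session
if (interactive()) {
  View(scores)   # opens the data frame in a new tab in the script pane
  help("mean")   # opens the documentation for mean() in the Help tab
}
```

Every command you run this way is also recorded in the History tab for the current session.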