Choosing an R Distribution with Machine Learning in Mind - dummies

By John Paul Mueller, Luca Massaron

You need to keep your machine learning goals in mind when choosing an R distribution. R is a combination of an environment and a language. It’s a form of the S programming language, which John Chambers originally created at Bell Laboratories to make working with statistics easier. Rick Becker and Allan Wilks eventually added to the S programming language as well. The goal of the R language is to turn ideas into software quickly and easily.

In other words, R is a language designed to help someone who doesn’t have much programming experience create code without a huge learning curve.
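To get a feel for how little code a statistical task can take, consider the following minimal sketch. The data values and the variable names (hours, scores, fit) are made up for illustration and don't come from any real dataset; the functions used (c, mean, sd, lm, and coef) are part of the base R installation.

```r
# Hypothetical sample data: hours studied and exam scores (invented values)
hours  <- c(1, 2, 3, 4, 5)
scores <- c(52, 58, 65, 70, 78)

# Basic summary statistics take one function call each
mean_score <- mean(scores)
sd_score   <- sd(scores)

# Fit a simple linear regression of scores on hours
fit <- lm(scores ~ hours)

# Extract the estimated slope (change in score per extra hour)
slope <- unname(coef(fit)["hours"])

print(mean_score)
print(sd_score)
print(slope)
```

Five short statements take you from raw numbers to a fitted model, which is the kind of idea-to-software speed that R aims for.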

R is a freely downloadable product that can run most S code without modification; in contrast, you have to pay for S. Because R costs nothing and was built with statistics in mind, it's a great choice for machine learning work. You can read more about R in general.

You don’t want to make sweeping generalizations about the languages used for machine learning. Both R and Python are popular languages, but for different reasons. Articles such as “In data science, the R language is swallowing Python” initially seem to say that R is overtaking Python in popularity. The author wisely backs away from this statement by pointing out that R is best used for statistical purposes and Python is a better general-purpose language.

The best developers always have an assortment of programming tools in their tool belts to make performing tasks easier. Languages address developer needs, so you need to use the right language for the job. In the end, every language becomes machine code that a processor understands. Few developers understand machine code any longer because high-level programming languages make development easier.

You can get a basic copy of R from the Comprehensive R Archive Network (CRAN) site. The site provides both source code versions and compiled versions of the R distribution for various platforms. Unless you plan to make your own changes to the basic R support or want to delve into how R works, getting the compiled version is always better. If you use RStudio, as suggested in the next paragraph, you must also download and install a copy of R.
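After you install the compiled version, you may want to confirm that R works before moving on. The following is a minimal sketch to run at the R prompt. The helper name has_pkg is hypothetical, and the caret package in the commented-out line is only an example of a machine learning package you might choose; R.version.string, requireNamespace, and install.packages are standard parts of R.

```r
# Show which version of R is installed
print(R.version.string)

# Hypothetical helper: TRUE when a package is installed and its namespace
# can be loaded; it does not attach the package or download anything
has_pkg <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# The stats package ships with R, so this check should succeed
stats_ok <- has_pkg("stats")
print(stats_ok)

# Example only: install a machine learning package from CRAN if it is missing
# (left commented out because it needs a network connection)
# if (!has_pkg("caret")) install.packages("caret", repos = "https://cloud.r-project.org")
```

If the version string prints and the stats check returns TRUE, the basic installation is in place, and RStudio should be able to find it.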

You can use the Desktop version of RStudio to make the task of working with R even easier. This product is a free download, and you can get it in Linux (Debian/Ubuntu, RedHat/CentOS, and SUSE Linux), Mac, and Windows versions.

You can try other R distributions if you find that you don’t like RStudio. The most common alternative distributions are StatET, Red-R (also available at Decisionstats.com), and Rattle. All of them are good products, but RStudio appears to have the strongest following and is the simplest product to use. You can read discussions about the various choices.