How to Explain the Predictive Analytical Results of R Regression
Once you create an R regression model for predictive analytics, you want to be able to explain the results of the analysis. To see some useful information about the model, type in the following code:
> summary(model)
The output provides information that you can explore if you want to tweak your model further. For now, we’ll leave the model as it is. Here are the last two lines of the output:
Multiple R-squared:  0.8741,    Adjusted R-squared:  0.8633
F-statistic: 80.82 on 22 and 256 DF,  p-value: < 2.2e-16
A couple of data points stand out here:

The Multiple R-squared value tells you how well the regression line fits the data (goodness of fit). A value of 1 means that it’s a perfect fit. So an R-squared value of 0.874 is good; it says that 87.4 percent of the variability in mpg is explained by the model.

The p-value tells you whether the predictor variables have a statistically significant effect on the response variable. A p-value of less than (typically) 0.05 means that you can reject the null hypothesis that the predictor variables collectively have no effect on the response variable (mpg). The p-value of 2.2e-16 (that is, 2.2 × 10^-16, a decimal point followed by 15 zeroes and then 22) is much smaller than 0.05, so the predictors have an effect on the response.
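Both numbers can also be read straight out of the fitted model, which is handy if you want to report them programmatically. Here is a minimal sketch; it fits a stand-in model on R’s built-in mtcars data rather than the chapter’s model, but the extraction works the same way:

```r
# Stand-in for the model built earlier; mtcars ships with R.
fit <- lm(mpg ~ ., data = mtcars)
s   <- summary(fit)

# Goodness of fit: multiple and adjusted R-squared.
s$r.squared
s$adj.r.squared

# Overall p-value, recomputed from the stored F-statistic.
f <- s$fstatistic
pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
```

Note that summary() prints < 2.2e-16 whenever the p-value falls below the smallest increment R can represent near 1, so the recomputed value may be even smaller than what the printed output shows.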
With the model created, you can make predictions against it with the test data you partitioned from the full dataset. To use this model to predict the mpg for each row in the test set, you issue the following command:
> predictions <- predict(model, testSet,
    interval="predict", level=.95)
This is the code and output of the first six predictions:
> head(predictions)
        fit       lwr      upr
2  16.48993 10.530223 22.44964
4  18.16543 12.204615 24.12625
5  18.39992 12.402524 24.39732
6  12.09295  6.023341 18.16257
7  11.37966  5.186428 17.57289
8  11.66368  5.527497 17.79985
The output is a matrix that shows the predicted values in the fit column and the prediction interval in the lwr and upr columns — with a confidence level of 95 percent. The higher the confidence level, the wider the range, and vice versa.
The predicted value is in the middle of the range, so changing the confidence level doesn’t change the predicted value. The unlabeled first column holds the row numbers from the full dataset.
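You can check both behaviors with a quick experiment: predict the same rows at two confidence levels and compare the columns. This sketch uses a small stand-in model on R’s built-in mtcars data instead of the chapter’s model and testSet:

```r
# A small stand-in model on R's built-in mtcars data.
fit     <- lm(mpg ~ wt + hp, data = mtcars)
newdata <- mtcars[1:3, ]

p95 <- predict(fit, newdata, interval = "prediction", level = 0.95)
p99 <- predict(fit, newdata, interval = "prediction", level = 0.99)

# The predicted values are identical at both levels...
all(p95[, "fit"] == p99[, "fit"])

# ...and sit in the middle of their intervals...
all.equal((p95[, "lwr"] + p95[, "upr"]) / 2, p95[, "fit"])

# ...but the 99 percent interval is wider.
all(p99[, "upr"] - p99[, "lwr"] > p95[, "upr"] - p95[, "lwr"])
```

All three expressions return TRUE: only the interval width responds to the confidence level.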
To see the actual and predicted values side by side so you can easily compare them, you can type in the following lines of code:
> comparison <- cbind(testSet$mpg, predictions[,1])
> colnames(comparison) <- c("actual", "predicted")
The first line creates a two-column matrix with the actual and predicted values. The second line changes the column names to actual and predicted. Type in the following line of code to see the first six rows of comparison:
> head(comparison)
  actual predicted
2     15  16.48993
4     16  18.16543
5     17  18.39992
6     15  12.09295
7     14  11.37966
8     14  11.66368
We also want to see a summary of the two columns to compare their means. This is the code and output of the summary:
> summary(comparison)
     actual        predicted
 Min.   :10.00   Min.   : 8.849
 1st Qu.:16.00   1st Qu.:17.070
 Median :21.50   Median :22.912
 Mean   :22.79   Mean   :23.048
 3rd Qu.:28.00   3rd Qu.:29.519
 Max.   :44.30   Max.   :37.643
Next you use the mean absolute percent error (MAPE) to measure the accuracy of the regression model. The formula for mean absolute percent error is

(Σ(|Y − Y′| / Y) / N) × 100

where Y is the actual score, Y′ is the predicted score, and N is the number of predicted scores. After plugging the values into the formula, you get an error of only 10.94 percent. Here is the code and the output from the R console:
> mape <- (sum(abs(comparison[,1] - comparison[,2]) /
    abs(comparison[,1])) / nrow(comparison)) * 100
> mape
[1] 10.93689
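If you plan to reuse the calculation, the same arithmetic fits naturally into a small helper function. This is a sketch (the function name mape and the sample vectors are made up for illustration):

```r
# MAPE from vectors of actual and predicted values:
# mean of |actual - predicted| / |actual|, expressed as a percentage.
mape <- function(actual, predicted) {
  mean(abs(actual - predicted) / abs(actual)) * 100
}

# The first three rows of the comparison shown earlier.
mape(c(15, 16, 17), c(16.48993, 18.16543, 18.39992))
```

mean() already divides the summed errors by the number of scores, so it replaces the explicit sum(...)/nrow(...) from the console version.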
The following code enables you to view the results and errors in a table view:
> mapeTable <- cbind(comparison,
    abs(comparison[,1] - comparison[,2]) / comparison[,1] * 100)
> colnames(mapeTable)[3] <- "absolute percent error"
> head(mapeTable)
  actual predicted absolute percent error
2     15  16.48993               9.932889
4     16  18.16543              13.533952
5     17  18.39992               8.234840
6     15  12.09295              19.380309
7     14  11.37966              18.716708
8     14  11.66368              16.688031
Here’s the code that enables you to see the percent error again:
> sum(mapeTable[,3])/nrow(comparison)
[1] 10.93689