Looking at the Rattle Log for R Programming - dummies

By Joseph Schmuller

The Log tab shows your interactions with Rattle as R code. Here’s a good example of working with the Rattle log.

In the hierarchical clustering analysis, click on Data Plot. You see a plot that looks very much like this.

Data plot.

To find the code that produced this plot, select the Log tab and scroll down until you find this:

plot(crs$dataset[, c(1:4)], col=cutree(crs$hclust,3))

Copy and paste that line into the RStudio Script panel and then press Ctrl+R to run it.

On the Plots tab, you see the same scatterplot matrix, but without the title. The plotting characters aren’t filled, and their border colors (black, red, and green) are the colors of the clusters to which Rattle has assigned them.
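If you'd like to try the same thing outside Rattle, here's a minimal sketch. It assumes R's built-in iris data in place of crs$dataset, and Euclidean distances with Ward linkage in place of whatever settings you chose on Rattle's Cluster tab, so the picture may not match yours exactly. The names measurements, hc, and clusters are made up for illustration.

```r
# Stand-in for crs$dataset[, c(1:4)]: the four numeric columns of built-in iris
measurements <- iris[, 1:4]

# Stand-in for crs$hclust (assumed settings: Euclidean distance, Ward linkage)
hc <- hclust(dist(measurements), method = "ward.D2")

# cutree() returns one integer (1, 2, or 3) per flower
clusters <- cutree(hc, k = 3)

# Integers passed to col index the current palette, which is why the
# borders come out in the palette's first three colors
print(palette()[1:3])

# Same call as the line from the log, with the stand-ins swapped in
plot(measurements, col = clusters)
```

Because col receives cluster numbers rather than color names, changing the number of clusters in cutree() changes how many border colors show up.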

To make the matrix look more like the image above, change crs$dataset[, c(1:4)] to crs$dataset[, c(1:5)]. This change adds a fifth row and a fifth column to the matrix.

Add the argument lower.panel=NULL to eliminate everything below the main diagonal. Then add plot character arguments so that the code is

plot(crs$dataset[, c(1:5)], col = cutree(crs$hclust, 3), lower.panel = NULL,
     pch = 21, cex = 2, bg = c("black", "grey", "white")[iris.uci$species])

Now the border color of each character corresponds to its assigned cluster, and its fill color corresponds to its species. If you run this code, you see that in the scatterplots, some of the plot characters have red borders and are filled with gray and some red-border characters are filled with white. In the fifth column, all points in the rightmost group should have green borders, but some have red borders. What does all this tell us? That the clustering isn’t perfect! That is, the three clusters do not correspond exactly with the three species.
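To put numbers on that mismatch, you can cross-tabulate cluster against species. The sketch below is an illustration, not something from the Rattle log: it assumes the built-in iris data and a Ward-linkage clustering on Euclidean distances as stand-ins for crs$dataset and crs$hclust, and the names hc, clusters, and agreement are hypothetical.

```r
# Assumed stand-ins for Rattle's objects: built-in iris,
# Euclidean distance, Ward linkage
hc <- hclust(dist(iris[, 1:4]), method = "ward.D2")
clusters <- cutree(hc, k = 3)

# Rows are clusters, columns are species. A perfect clustering would
# leave exactly one nonzero cell in each row and each column.
agreement <- table(cluster = clusters, species = iris$Species)
print(agreement)
```

Any species column whose count is split across two rows marks flowers that landed in the "wrong" cluster, which is what the mismatched border and fill colors show in the plot.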

Poking around in the Rattle log was a pretty good idea!

The Rattle Evaluation tab has procedures for evaluating your ML creations.