R Project for ML Concepts: Titanic
A dataset that’s often used to illustrate ML concepts in R programming is the information about passengers on the Titanic’s disastrous voyage in 1912. The target variable is whether the passenger survived. You can use this data to create a decision tree.
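If you'd like to see the decision-tree idea in code before working through Rattle, here's a minimal sketch. Two assumptions are worth flagging. First, it uses base R's built-in Titanic table: aggregated counts by Class, Sex, Age, and Survived for 2,201 people, crew included. That is a different dataset from the titanic package's passenger records. Second, it calls rpart(), the recursive-partitioning function Rattle relies on for its trees. The names people, tree, and pred are illustrative.

```r
library(rpart)  # recursive partitioning; ships with R as a recommended package

# Base R's Titanic is a 4-way table of counts (Class x Sex x Age x Survived).
# as.data.frame() turns it into one row per cell, with a Freq count column.
counts <- as.data.frame(Titanic)

# Expand to one row per person by repeating each cell Freq times
# (cells with Freq = 0 drop out)
people <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                 c("Class", "Sex", "Age", "Survived")]
rownames(people) <- NULL

# Survived is a factor, so method = "class" grows a classification tree
tree <- rpart(Survived ~ Class + Sex + Age, data = people, method = "class")
print(tree)

# Predicted class for each person, cross-tabulated against the actual outcome
pred <- predict(tree, people, type = "class")
table(predicted = pred, actual = people$Survived)
```

print(tree) lists each split and the share of survivors in each node. The table compares the tree's predictions with what actually happened. Because it is computed on the same rows the tree was grown on, it is optimistic. That is why the rest of this project keeps a separate test set.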
The data resides in an R package called titanic. If it’s not already on the Packages tab, click Install. In the Install Packages dialog box, type titanic and click the Install button. After the package downloads, find it on the Packages tab and select its check box.
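If you prefer typing to clicking, the same install-and-load steps look like this in the console. This is a sketch. The CRAN mirror URL is one reasonable choice among many, and the install step runs only if the package is missing.

```r
# Install titanic from CRAN only if it isn't already installed
if (!requireNamespace("titanic", quietly = TRUE)) {
  install.packages("titanic", repos = "https://cloud.r-project.org")
}

# Attach the package (the same effect as ticking its check box on the Packages tab)
library(titanic)

# Take a first look at the training data frame's columns and types
str(titanic_train)
```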
In the titanic package, you’ll find titanic_train and titanic_test. Don’t be tempted to use one as the training set and the other as the test set for this particular application. The titanic_test set doesn’t include the Survived variable, so it’s not usable for testing a decision tree the way I lay out the process here.
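You can confirm the missing variable directly. This short check assumes the titanic package is installed. It lists the columns that titanic_train has and titanic_test lacks.

```r
library(titanic)

# Columns present in the training data but absent from the test data
missing_cols <- setdiff(names(titanic_train), names(titanic_test))
missing_cols

# Without the target variable, titanic_test can't be used to score predictions
"Survived" %in% names(titanic_test)
```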
Instead, create the data frame like this:
titanic.df <- titanic_train
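Because titanic_test can’t serve as a test set, the testing data has to come out of titanic.df itself. Rattle can partition the data for you on its Data tab. If you’d rather see the idea in code, here’s a sketch of a reproducible random split. The 70/30 proportion, the seed, and the names train.df and test.df are illustrative choices, not settings taken from this project.

```r
library(titanic)

titanic.df <- titanic_train

# Fix the random seed so the same rows land in each set every time
set.seed(42)

# Randomly choose about 70% of the row indices for training
n <- nrow(titanic.df)
train_rows <- sample(seq_len(n), size = round(0.7 * n))

# Training rows go to train.df; every remaining row goes to test.df
train.df <- titanic.df[train_rows, ]
test.df  <- titanic.df[-train_rows, ]

# Both pieces keep Survived, so predictions on test.df can be checked
c(train = nrow(train.df), test = nrow(test.df))
```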
Next, use Rattle’s Data tab to read in the dataset. This image shows what the Data tab looks like after a few modifications.
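Before you set the radio buttons, it can help to check each variable’s type and its number of distinct values in code. This sketch does that for titanic.df. The cutoff of 20 distinct values is a hypothetical threshold chosen for illustration; it is not a setting in Rattle.

```r
library(titanic)

titanic.df <- titanic_train

# One row per variable: its storage class and how many distinct values it has
col_summary <- data.frame(
  variable  = names(titanic.df),
  type      = vapply(titanic.df, function(col) class(col)[1], character(1)),
  n_unique  = vapply(titanic.df, function(col) length(unique(col)), integer(1)),
  row.names = NULL
)
col_summary

# Text (categoric) variables with many distinct values are Ignore candidates.
# The cutoff of 20 is a hypothetical threshold, not one Rattle applies.
is_text   <- col_summary$type %in% c("character", "factor")
high_card <- col_summary$variable[is_text & col_summary$n_unique > 20]
high_card
```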
What are those modifications? First, a rule of thumb: If a variable is categoric, has a lot of unique values, and isn’t already classified as an Ident (identifier), click its Ignore radio button. Also, when Rattle first encounters this dataset, it treats Embarked as the target variable. Use the radio buttons to change
Categoric and to change