How to Load Data into an SVM Supervised Learning Model

By Anasse Bari, Mohamed Chaouchi, Tommy Jung

For predictive analytics, you need to load the data for your algorithms to use. Loading the Iris dataset in scikit-learn takes only a couple of lines of code because the library already provides a function that loads the dataset. The table below shows one sample observation from each of the three classes.

Sepal Length   Sepal Width   Petal Length   Petal Width   Target Class/Label
5.1            3.5           1.4            0.2           Setosa (0)
7.0            3.2           4.7            1.4           Versicolor (1)
6.3            3.3           6.0            2.5           Virginica (2)

  1. Open a new Python interactive shell session.

    Use a new Python session so there isn’t anything left over in memory and you have a clean slate to work with.

  2. Enter the following code in the prompt and observe the output:

    >>> from sklearn.datasets import load_iris
    >>> iris = load_iris()

    After running those two statements, you should not see any messages from the interpreter. The variable iris should contain all the data from the iris.csv file.
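As a quick check that the load worked, the sketch below prints the rows that correspond to the three sample observations in the table above. The row indices 0, 50, and 100 are an assumption based on the usual ordering of the dataset bundled with scikit-learn (observations grouped by class, 50 per class); they are not stated in the original text.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Assumed indices: the first observation of each class in the bundled dataset.
sample_rows = [0, 50, 100]

for row in sample_rows:
    measurements = iris.data[row]             # four measurements for this flower
    label = iris.target[row]                  # integer class label (0, 1, or 2)
    class_name = iris.target_names[label]     # class name for that label
    print(measurements, class_name, label)
```

If the printed measurements match the table, the variable iris holds the expected data.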

Before you create a predictive model, it’s important to understand a little about the new variable iris and what you can do with it. Knowing what the variable holds makes the code easier to follow and the process much simpler to grasp. You can inspect the value of iris by typing it in the interpreter.

>>> iris

The output is all the content from the iris.csv file, along with some other information about the dataset that the load_iris function loaded into the variable. The variable is a dictionary-like object (a scikit-learn Bunch) with four main properties, which are listed below.

Property Name   Description
data            Contains all the measurements of the observations.
feature_names   Contains the names of the features (attribute names).
target          Contains all the targets (labels) of the observations.
target_names    Contains the names of the classes.
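To see what each property holds, a short sketch like the following prints the size or contents of the four properties. It assumes scikit-learn is installed and uses the dataset bundled with it; the comments describe what that dataset is expected to contain.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# data: one row per observation, one column per feature
print(iris.data.shape)        # (150, 4)

# feature_names: one name per column of data
print(iris.feature_names)

# target: one integer label (0, 1, or 2) per observation
print(iris.target.shape)      # (150,)

# target_names: the class name that each integer label stands for
print(iris.target_names)
```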

You can print the values in the interpreter by typing the variable name, then a dot, then the property name. For example, typing iris.data displays the data property of iris.

This is a standard way of accessing properties of an object in many programming languages.
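For instance, the dot notation can be tried on the data property as sketched below. The dictionary-style lookup in the second half is an addition that the original text does not mention; it works because the object returned by load_iris also behaves like a dictionary.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Dot notation: variable name, dot, property name.
first_rows = iris.data[:3]
print(first_rows)

# Dictionary-style access reaches the same property (an addition to the text).
same_rows = iris["data"][:3]
print((first_rows == same_rows).all())
```

Either form can be used; the dot notation is the one this article relies on.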

To create an instance of the SVM classifier, type the following code in the interpreter:

>>> from sklearn.svm import LinearSVC
>>> svmClassifier = LinearSVC(random_state=111)

The first line of code imports the LinearSVC class from the sklearn.svm module into the session. The linear Support Vector Classifier (SVC) is an implementation of SVM for linear classification and has multi-class support. The Iris dataset is somewhat linearly separable and has three classes, so LinearSVC is a reasonable first choice to experiment with and see how it performs.

The second line creates an instance of the classifier and assigns it to the variable svmClassifier. This is an important variable to remember. The random_state parameter seeds the random number generator that the classifier uses, which allows you to reproduce these examples and get the same results. If you leave out the random_state parameter, your results may differ from the ones shown here.
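The effect of random_state can be illustrated with a minimal sketch, under a few assumptions. It trains two classifiers on the full Iris data, which goes one step beyond this section, and it raises max_iter (a LinearSVC parameter not used in the original) only to reduce the chance of convergence warnings. Two instances created with the same seed are expected to learn the same weights.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

iris = load_iris()

# Two separate classifiers created with the same seed.
# max_iter=10000 is an assumption of this sketch, not part of the original code.
first = LinearSVC(random_state=111, max_iter=10000)
second = LinearSVC(random_state=111, max_iter=10000)

# Train both on the same measurements and labels.
first.fit(iris.data, iris.target)
second.fit(iris.data, iris.target)

# Compare the learned weights: one row per class, one column per feature.
same_weights = np.allclose(first.coef_, second.coef_)
print(first.coef_.shape)
print(same_weights)
```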