Data Science: How to Set Up a Support Vector Machine Predictive Model in Python
Data scientists consider Support Vector Machines (SVM) to be among the most complex and powerful machine-learning techniques in their toolbox, so you usually find this topic covered only in advanced manuals. However, you shouldn’t shy away from this great learning algorithm: the Scikit-learn library offers a wide, accessible range of supervised SVM classes for both regression and classification.
Although SVM is complex, it’s a great tool. After you find the most suitable SVM version for your problem, you have to apply it to your data and work a little to optimize some of the many parameters available and improve your results. Setting up a working SVM predictive model involves these general steps:
Choose the SVM class you’ll use.
Train your model with the data.
Check your validation error and make it your baseline.
Try different values for the SVM parameters.
Check whether your validation error improves.
Train your model again using the data with the best parameters.
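The steps above can be sketched roughly as follows. The digits dataset and the candidate C values are stand-ins for your own data and parameter grid, not a recommendation:

```python
# A minimal sketch of the workflow above, using scikit-learn's digits
# dataset as a stand-in for your own data.
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Steps 1-3: choose a class, train it, and record a baseline
# cross-validated score.
baseline = cross_val_score(SVC(), X_train, y_train, cv=5).mean()

# Steps 4-5: try different parameter values and keep any improvement.
best_score, best_C = baseline, 1.0
for C in (0.1, 1.0, 10.0):
    score = cross_val_score(SVC(C=C), X_train, y_train, cv=5).mean()
    if score > best_score:
        best_score, best_C = score, C

# Step 6: retrain on the training data using the best parameters found.
model = SVC(C=best_C).fit(X_train, y_train)
print(round(model.score(X_test, y_test), 3))
```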
As far as choosing the right SVM class goes, you have to think about your problem. For example, you could choose between classification (guessing a class) and regression (guessing a number). When working with classification, you must also consider whether you need to classify just two groups (binary classification) or more than two (multiclass classification).
Another important aspect to consider is the quantity of data you have to process. After listing all your requirements, a quick glance at the following table will help you narrow your choices.
|Class|Characteristic usage|Key parameters|
|---|---|---|
|sklearn.svm.SVC|Binary and multiclass classification when the number of examples is less than 10,000|C, kernel, degree, gamma|
|sklearn.svm.NuSVC|Similar to SVC|nu, kernel, degree, gamma|
|sklearn.svm.LinearSVC|Binary and multiclass classification when the number of examples is more than 10,000; sparse data|penalty, loss, C|
|sklearn.svm.SVR|Regression problems|C, kernel, degree, gamma, epsilon|
|sklearn.svm.NuSVR|Similar to SVR|nu, C, kernel, degree, gamma|
|sklearn.svm.OneClassSVM|Outlier detection|nu, kernel, degree, gamma|
The first step is to check the number of examples in your data. When you have more than 10,000 examples, you can still get acceptable SVM performance without overly slow and cumbersome computations, but only for classification problems, by using sklearn.svm.LinearSVC. If you instead need to solve a regression problem, or LinearSVC isn’t fast enough, you need a stochastic solution for SVM.
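As a sketch of both options, the snippet below fits LinearSVC on a synthetic 20,000-example dataset, and then fits SGDClassifier with a hinge loss, which is one common stochastic stand-in for a linear SVM (the dataset and parameters here are illustrative assumptions, not tuned values):

```python
# LinearSVC for larger classification datasets, and SGDClassifier with
# hinge loss as a stochastic alternative when even LinearSVC is too slow.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=20000, n_features=20, random_state=0)

# LinearSVC: a fast, linear-only SVM implementation.
linear_svc = LinearSVC(C=1.0).fit(X, y)

# Stochastic alternative: the hinge loss makes SGDClassifier behave
# like a linear SVM trained one example (or mini-batch) at a time.
sgd_svm = SGDClassifier(loss="hinge", random_state=0).fit(X, y)
```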
The Scikit-learn SVM module wraps two powerful libraries written in C, libsvm and liblinear. When fitting a model, there is a flow of data between Python and the two external libraries. A cache smooths the data exchange operations. However, if the cache is too small and you have too many data points, the cache becomes a bottleneck!
If you have enough memory, it’s a good idea to set a cache size greater than the default 200MB (1000MB, if possible) using the SVM class’ cache_size parameter. Smaller numbers of examples require only that you decide between classification and regression.
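Enlarging the cache takes a single keyword argument, as in this minimal sketch:

```python
from sklearn.svm import SVC

# Ask for a 1000 MB kernel cache instead of the default 200 MB;
# only worthwhile if your machine has the memory to spare.
model = SVC(cache_size=1000)
```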
In each case, you have two alternative algorithms. For classification, for example, you can use sklearn.svm.SVC or sklearn.svm.NuSVC. The Nu version differs only in the parameters it takes and in using a slightly different algorithm; it produces basically the same results, so you normally choose the non-Nu version.
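You can verify the similarity yourself with a quick comparison. This sketch uses the Iris dataset as sample data; roughly speaking, C regularizes directly, while nu (a value between 0 and 1) bounds the fraction of margin errors and support vectors:

```python
# Compare the two alternative classification algorithms on the same data.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC, NuSVC

X, y = load_iris(return_X_y=True)

# SVC takes C; NuSVC takes nu instead. Both use the same RBF kernel
# by default and usually land on very similar results.
svc_score = cross_val_score(SVC(C=1.0), X, y, cv=5).mean()
nusvc_score = cross_val_score(NuSVC(nu=0.5), X, y, cv=5).mean()
print(round(svc_score, 3), round(nusvc_score, 3))
```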
After deciding which algorithm to use, you find that you have a number of parameters to choose from, and the C parameter is always among them. The C parameter indicates how closely the algorithm has to adapt to the training points.
When C is small, the SVM adapts less to the individual points and tends to take an average direction, using just a few of the available points and variables. Larger C values force the learning process to follow more of the available training points and to involve more variables.
The right C is usually a middle value, and you can find it after a bit of experimentation. If your C is too large, you risk overfitting, a situation in which your SVM adapts too much to your data and cannot properly handle new problems. If your C is too small, your predictions will be rough and imprecise; you’ll experience a situation called underfitting, in which your model is too simple for the problem you want to solve.
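That bit of experimentation often looks like a simple sweep over C values. The synthetic dataset and the candidate values below are illustrative assumptions:

```python
# Sweep C from very small (underfitting risk) to very large
# (overfitting risk) and compare cross-validated scores.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

scores = {C: cross_val_score(SVC(C=C), X, y, cv=5).mean()
          for C in (0.001, 0.1, 1.0, 100.0, 10000.0)}

# The best C usually sits somewhere between the two extremes.
for C, score in scores.items():
    print(C, round(score, 3))
```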
After deciding on the C value to use, the next important block of parameters to fix consists of kernel, degree, and gamma. All three are interconnected, and their values depend on the kernel specification (for instance, the linear kernel doesn’t require degree or gamma, so you can use any value for them). The kernel specification determines whether your SVM model uses a line or a curve to guess the class or the point measure.
Linear models are simpler and tend to guess well on new data, but sometimes underperform when variables in the data relate to each other in complex ways. Because you can’t know in advance whether a linear model works for your problem, it’s good practice to start with a linear kernel, fix its C value, and use that model and its performance as a baseline for testing nonlinear solutions afterward.
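One way to set up that baseline comparison is sketched below, using the breast cancer dataset and C=1.0 purely as placeholders; standardizing the features first is a common precaution with SVMs, not something the comparison strictly requires:

```python
# Fix a linear-kernel baseline, then test a nonlinear RBF challenger.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Baseline: linear kernel with a chosen C value.
linear_model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
linear_score = cross_val_score(linear_model, X, y, cv=5).mean()

# Challenger: nonlinear RBF kernel with the same C.
rbf_model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
rbf_score = cross_val_score(rbf_model, X, y, cv=5).mean()

print(round(linear_score, 3), round(rbf_score, 3))
```

If the nonlinear score doesn’t clearly beat the linear baseline, the simpler linear model is usually the safer choice.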