
Choosing the Right Algorithm for Machine Learning

By John Paul Mueller, Luca Massaron

Part of Machine Learning For Dummies Cheat Sheet

Machine learning involves the use of many different algorithms. The following rundown gives you a quick summary of what each algorithm is best at, along with its pros and cons.
Random Forest

Best at: Almost any machine learning problem; bioinformatics

Pros: Can work in parallel; seldom overfits; automatically handles missing values; no need to transform any variable; no need to tweak parameters; can be used by almost anyone with excellent results

Cons: Difficult to interpret; weaker on regression when estimating values at the extremities of the distribution of response values; biased in multiclass problems toward more frequent classes
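
Here's a minimal scikit-learn sketch of a random forest at work; the toy dataset and the parameter values are illustrative assumptions, not tuned recommendations:

```python
# A minimal random forest sketch; dataset and hyperparameters are assumptions.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
# n_jobs=-1 grows the trees in parallel on all available CPU cores
model = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
print(cross_val_score(model, X, y, cv=5).mean())
```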

Gradient Boosting

Best at: Almost any machine learning problem; search engines (solving the problem of learning to rank)

Pros: Can approximate most nonlinear functions; best-in-class predictor; automatically handles missing values; no need to transform any variable

Cons: Can overfit if run for too many iterations; sensitive to noisy data and outliers; doesn't work well without parameter tuning
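
Here's a minimal sketch using scikit-learn's HistGradientBoostingClassifier (one of several gradient boosting implementations; XGBoost and LightGBM are popular alternatives). The hyperparameter values are assumptions; early stopping addresses the overfitting risk noted above:

```python
# A minimal gradient boosting sketch; hyperparameters are assumed values.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# early_stopping halts boosting when validation scores stop improving,
# guarding against overfitting from too many iterations
model = HistGradientBoostingClassifier(max_iter=500, learning_rate=0.1,
                                       early_stopping=True, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```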

Linear Regression

Best at: Baseline predictions; econometric predictions; modelling marketing responses

Pros: Simple to understand and explain; seldom overfits; using L1 and L2 regularization is effective in feature selection; fast to train; easy to train on big data thanks to its stochastic version

Cons: You have to work hard to make it fit nonlinear functions; can suffer from outliers
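
Here's a minimal sketch of a regularized linear regression baseline. ElasticNet combines the L1 and L2 penalties mentioned above (SGDRegressor would be the stochastic variant for big data); the alpha and l1_ratio values are assumptions:

```python
# Regularized linear regression on synthetic data; values are assumptions.
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=42)
# l1_ratio mixes L1 (sparsity/feature selection) with L2 (shrinkage)
model = ElasticNet(alpha=1.0, l1_ratio=0.5)
print(cross_val_score(model, X, y, cv=5).mean())
```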

Support Vector Machines

Best at: Character recognition; image recognition; text classification

Pros: Automatic nonlinear feature creation; can approximate complex nonlinear functions; works with only a portion of the examples (the support vectors)

Cons: Difficult to interpret when applying nonlinear kernels; scales poorly with many examples (beyond about 10,000 examples, training starts taking too long)
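
Here's a minimal sketch of an RBF-kernel support vector classifier on the digits dataset (a character-recognition task). Scaling the inputs matters because SVMs are distance based; the C and gamma settings are assumptions:

```python
# RBF-kernel SVM for character recognition; C and gamma are assumed values.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# the RBF kernel creates nonlinear features implicitly (the kernel trick)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```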

K-nearest Neighbors

Best at: Computer vision; multilabel tagging; recommender systems; spell-checking problems

Pros: Fast, lazy training; can naturally handle extreme multiclass problems (like tagging text)

Cons: Slow and cumbersome in the predicting phase; can fail to predict correctly due to the curse of dimensionality
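
Here's a minimal k-nearest neighbors sketch; the choice of k=5 is an untuned assumption:

```python
# K-nearest neighbors; n_neighbors=5 is an assumed, untuned value.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)          # "lazy" training: just stores the data
print(model.score(X_test, y_test))   # the real (slow) work happens here
```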

AdaBoost

Best at: Face detection

Pros: Automatically handles missing values; no need to transform any variable; doesn't overfit easily; few parameters to tweak; can leverage many different weak learners

Cons: Sensitive to noisy data and outliers; never the best-in-class predictor
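
Here's a minimal AdaBoost sketch. By default, scikit-learn boosts decision stumps (one-level trees); the number of estimators is an assumption, and any classifier that supports sample weights could be plugged in as the weak learner instead:

```python
# AdaBoost over decision stumps; n_estimators is an assumed value.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
# the default weak learner is a one-level decision tree (a "stump")
model = AdaBoostClassifier(n_estimators=200, random_state=42)
print(cross_val_score(model, X, y, cv=5).mean())
```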

Naive Bayes

Best at: Face recognition; sentiment analysis; spam detection; text classification

Pros: Easy and fast to implement; doesn't require too much memory; can be used for online learning; easy to understand; takes into account prior knowledge

Cons: Strong and unrealistic feature independence assumptions; fails at estimating rare occurrences; suffers from irrelevant features
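
Here's a minimal naive Bayes sketch for spam detection; the two training documents and their labels are made-up placeholders:

```python
# Multinomial naive Bayes on toy text; documents and labels are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["win a free prize now", "meeting agenda for tomorrow"]
labels = ["spam", "ham"]
# word counts feed class-conditional word probabilities plus class priors
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(docs, labels)
print(model.predict(["claim your free prize"]))
```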

Neural Networks

Best at: Image recognition; language recognition and translation; speech recognition; vision recognition

Pros: Can approximate any nonlinear function; robust to outliers

Cons: Very difficult to set up; difficult to tune because of the many parameters, and you also have to decide the architecture of the network; difficult to interpret; easy to overfit
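
Here's a minimal sketch of a small feed-forward network using scikit-learn's MLPClassifier. The two-hidden-layer architecture and the other settings are arbitrary assumptions, and picking them well is exactly the tuning difficulty noted above:

```python
# Small multilayer perceptron; architecture and settings are assumptions.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32),  # you must pick the architecture
                  max_iter=500, early_stopping=True, random_state=42),
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```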

Logistic Regression

Best at: Ordering results by probability; modelling marketing responses

Pros: Simple to understand and explain; seldom overfits; using L1 and L2 regularization is effective in feature selection; the best algorithm for predicting probabilities of an event; fast to train; easy to train on big data thanks to its stochastic version

Cons: You have to work hard to make it fit nonlinear functions; can suffer from outliers
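
Here's a minimal logistic regression sketch showing the per-class probability estimates that make it useful for ordering results; the dataset and the default L2 penalty are assumptions:

```python
# Logistic regression returning class probabilities; defaults are assumed.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = make_pipeline(StandardScaler(), LogisticRegression())  # L2 by default
model.fit(X_train, y_train)
print(model.predict_proba(X_test[:3]))  # probability of each class per row
```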

SVD

Best at: Recommender systems

Pros: Can restructure data in a meaningful way

Cons: Difficult to understand why data has been restructured in a certain way
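
Here's a minimal sketch of truncated SVD compressing a toy user-by-item ratings matrix into latent factors, the kind of restructuring behind many recommender systems; the ratings and the number of components are made-up assumptions:

```python
# Truncated SVD on a toy ratings matrix; all values are made up.
import numpy as np
from sklearn.decomposition import TruncatedSVD

ratings = np.array([[5, 4, 0, 1],
                    [4, 5, 1, 0],
                    [0, 1, 5, 4],
                    [1, 0, 4, 5]], dtype=float)
svd = TruncatedSVD(n_components=2, random_state=42)
user_factors = svd.fit_transform(ratings)  # each user as 2 latent "tastes"
print(user_factors)
```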

PCA

Best at: Removing collinearity; reducing the dimensions of the dataset

Pros: Can reduce data dimensionality

Cons: Implies strong linear assumptions (components are weighted summations of the features)
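
Here's a minimal PCA sketch; keeping enough components to explain 95 percent of the variance is an assumed threshold:

```python
# PCA for dimensionality reduction; the 0.95 variance threshold is assumed.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # PCA is scale sensitive
pca = PCA(n_components=0.95)  # components are linear combinations of features
X_reduced = pca.fit_transform(X_scaled)
print(X.shape, "->", X_reduced.shape)
```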

K-means

Best at: Segmentation

Pros: Fast in finding clusters; can detect outliers in multiple dimensions

Cons: Suffers from multicollinearity; assumes spherical clusters, so it can't detect groups of other shapes; unstable solutions that depend on initialization
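
Here's a minimal k-means sketch; n_clusters=3 is an assumption, and multiple restarts (n_init) mitigate the initialization instability noted above:

```python
# K-means segmentation on synthetic blobs; n_clusters is an assumed value.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
# n_init reruns the algorithm from several random initializations
model = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = model.fit_predict(X)
print(labels[:10])
```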