Algorithm | Best at | Pros | Cons |
Random Forest | Apt at almost any machine learning problem; Bioinformatics | Can work in parallel; Seldom overfits; Automatically handles missing values; No need to transform any variable; No need to tweak parameters; Can be used by almost anyone with excellent results | Difficult to interpret; Weaker on regression when estimating values at the extremities of the distribution of response values; Biased in multiclass problems toward more frequent classes |
Gradient Boosting | Apt at almost any machine learning problem; Search engines (solving the problem of learning to rank) | Can approximate most nonlinear functions; Best-in-class predictor; Automatically handles missing values; No need to transform any variable | Can overfit if run for too many iterations; Sensitive to noisy data and outliers; Doesn’t work well without parameter tuning |
Linear regression | Baseline predictions; Econometric predictions; Modelling marketing responses | Simple to understand and explain; Seldom overfits; Using L1 & L2 regularization is effective in feature selection; Fast to train; Easy to train on big data thanks to its stochastic version | You have to work hard to make it fit nonlinear functions; Can suffer from outliers |
Support Vector Machines | Character recognition; Image recognition; Text classification | Automatic nonlinear feature creation; Can approximate complex nonlinear functions; The fitted model depends only on a portion of the examples (the support vectors) | Difficult to interpret when applying nonlinear kernels; Scales poorly with the number of examples (training starts taking too long beyond roughly 10,000 examples) |
K-nearest Neighbors | Computer vision; Multilabel tagging; Recommender systems; Spell checking problems | Fast, lazy training; Can naturally handle extreme multiclass problems (like tagging text) | Slow and cumbersome in the predicting phase; Can fail to predict correctly due to the curse of dimensionality |
AdaBoost | Face detection | Automatically handles missing values; No need to transform any variable; Doesn’t overfit easily; Few parameters to tweak; Can leverage many different weak learners | Sensitive to noisy data and outliers; Rarely delivers best-in-class predictions |
Naive Bayes | Face recognition; Sentiment analysis; Spam detection; Text classification | Easy and fast to implement; Doesn’t require much memory; Can be used for online learning; Easy to understand; Takes into account prior knowledge | Strong and unrealistic feature independence assumptions; Fails to estimate rare occurrences well; Suffers from irrelevant features |
Neural Networks | Image recognition; Language recognition and translation; Speech recognition; Vision recognition | Can approximate any nonlinear function; Robust to outliers | Very difficult to set up; Difficult to tune because of the many parameters and the need to choose the network architecture; Difficult to interpret; Easy to overfit |
Logistic regression | Ordering results by probability; Modelling marketing responses | Simple to understand and explain; Seldom overfits; Using L1 & L2 regularization is effective in feature selection; Well suited to predicting the probability of an event; Fast to train; Easy to train on big data thanks to its stochastic version | You have to work hard to make it fit nonlinear functions; Can suffer from outliers |
SVD | Recommender systems | Can restructure data in a meaningful way | Difficult to understand why data has been restructured in a certain way |
PCA | Removing collinearity; Reducing dimensions of the dataset | Can reduce data dimensionality | Implies strong linear assumptions (components are weighted sums of features) |
K-means | Segmentation | Fast in finding clusters; Can detect outliers in multiple dimensions | Suffers from multicollinearity; Assumes spherical clusters and can’t detect groups of other shapes; Unstable solutions that depend on initialization |
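
To make one row of the table concrete, here is a minimal sketch of why K-nearest Neighbors is listed as having "fast, lazy training" but a slow predicting phase. It uses only the Python standard library. The class name `TinyKNN`, its methods, and the toy data are hypothetical, chosen for illustration; this is not any library's API or a production implementation.

```python
import math
from collections import Counter


class TinyKNN:
    """Illustrative k-nearest-neighbors classifier (hypothetical, not a library API)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        # "Lazy" training: nothing is learned here, the examples are only stored.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict_one(self, query):
        # All the work happens at prediction time: the query is compared
        # against every stored example, which is why prediction is slow.
        distances = [
            (math.dist(query, point), label)
            for point, label in zip(self.points, self.labels)
        ]
        distances.sort(key=lambda pair: pair[0])
        nearest_labels = [label for _, label in distances[: self.k]]
        # Majority vote among the k closest examples.
        return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: two well-separated groups in two dimensions (made up for this sketch).
X_train = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y_train = ["a", "a", "a", "b", "b", "b"]

model = TinyKNN(k=3).fit(X_train, y_train)
print(model.predict_one((1.1, 1.0)))  # prints a
print(model.predict_one((5.1, 5.0)))  # prints b
```

Because `fit` only stores the data, its cost is negligible, while each call to `predict_one` scans every stored example. The same distance computation also hints at the curse of dimensionality noted in the table: as the number of features grows, distances between points tend to become less informative.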