# Choosing the Right Algorithm for Machine Learning

Machine learning draws on many different algorithms, and no single one wins on every problem. The following table summarizes the strengths and weaknesses of the most common ones; two short code sketches after the table show how you might try them out in practice.

| Algorithm | Best at | Pros | Cons |
| --- | --- | --- | --- |
| Random Forest | Almost any machine learning problem; bioinformatics | Can work in parallel; seldom overfits; automatically handles missing values; no need to transform any variable; no need to tweak parameters; can be used by almost anyone with excellent results | Difficult to interpret; weaker on regression when estimating values at the extremities of the distribution of response values; biased in multiclass problems toward the more frequent classes |
| Gradient Boosting | Almost any machine learning problem; search engines (solving the problem of learning to rank) | Can approximate most nonlinear functions; best-in-class predictor; automatically handles missing values; no need to transform any variable | Can overfit if run for too many iterations; sensitive to noisy data and outliers; doesn't work well without parameter tuning |
| Linear regression | Baseline predictions; econometric predictions; modelling marketing responses | Simple to understand and explain; seldom overfits; L1 and L2 regularization is effective for feature selection; fast to train; easy to train on big data thanks to its stochastic version | You have to work hard to make it fit nonlinear functions; can suffer from outliers |
| Support Vector Machines | Character recognition; image recognition; text classification | Automatic nonlinear feature creation; can approximate complex nonlinear functions; works with only a portion of the examples (the support vectors) | Difficult to interpret when applying nonlinear kernels; suffers with too many examples: beyond about 10,000 examples, training starts taking too long |
| K-nearest Neighbors | Computer vision; multilabel tagging; recommender systems; spell-checking problems | Fast, lazy training; can naturally handle extreme multiclass problems (like tagging text) | Slow and cumbersome in the predicting phase; can fail to predict correctly because of the curse of dimensionality |
| AdaBoost | Face detection | Automatically handles missing values; no need to transform any variable; doesn't overfit easily; few parameters to tweak; can leverage many different weak learners | Sensitive to noisy data and outliers; never the best-in-class predictor |
| Naive Bayes | Face recognition; sentiment analysis; spam detection; text classification | Easy and fast to implement; doesn't require much memory; can be used for online learning; easy to understand; takes prior knowledge into account | Strong and unrealistic feature-independence assumptions; fails at estimating rare occurrences; suffers from irrelevant features |
| Neural Networks | Image recognition; language recognition and translation; speech recognition; vision recognition | Can approximate any nonlinear function; robust to outliers | Very difficult to set up; difficult to tune because of the many parameters, and you also have to decide the architecture of the network; difficult to interpret; easy to overfit |
| Logistic regression | Ordering results by probability; modelling marketing responses | Simple to understand and explain; seldom overfits; L1 and L2 regularization is effective for feature selection; the best algorithm for predicting probabilities of an event; fast to train; easy to train on big data thanks to its stochastic version | You have to work hard to make it fit nonlinear functions; can suffer from outliers |
| SVD | Recommender systems | Can restructure data in a meaningful way | Difficult to understand why data has been restructured in a certain way |
| PCA | Removing collinearity; reducing the dimensions of the dataset | Can reduce data dimensionality | Implies strong linear assumptions (components are weighted sums of features) |
| K-means | Segmentation | Fast in finding clusters; can detect outliers in multiple dimensions | Suffers from multicollinearity; clusters are spherical, so it can't detect groups of other shapes; solutions are unstable and depend on initialization |