# Scikit-Learn Method Summary

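Scikit-learn is a focal point for data science work with Python, so it pays to know which methods you need most. The following table provides a brief overview of the most important classes and functions used for data analysis. Names in the Syntax column are relative to the `sklearn` package; for example, `model_selection.KFold` is imported with `from sklearn.model_selection import KFold`.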

| Syntax | Usage | Description |
| --- | --- | --- |
| `model_selection.cross_val_score` | Cross-validation phase | Evaluate an estimator by cross-validation, returning one score per fold |
| `model_selection.KFold` | Cross-validation phase | Divide the dataset into k folds for cross-validation |
| `model_selection.StratifiedKFold` | Cross-validation phase | K-fold splitting that preserves the distribution of the classes you predict in each fold |
| `model_selection.train_test_split` | Cross-validation phase | Split your data into training and test sets |
| `decomposition.PCA` | Dimensionality reduction | Principal component analysis (PCA) |
| `decomposition.PCA(svd_solver="randomized")` | Dimensionality reduction | PCA using randomized SVD (replaces `decomposition.RandomizedPCA`, which was removed in scikit-learn 0.20) |
| `feature_extraction.FeatureHasher` | Preparing your data | The hashing trick, which lets you accommodate a large number of features in your dataset |
| `feature_extraction.text.CountVectorizer` | Preparing your data | Convert text documents into a matrix of token counts |
| `feature_extraction.text.HashingVectorizer` | Preparing your data | Convert text directly using the hashing trick |
| `feature_extraction.text.TfidfVectorizer` | Preparing your data | Convert text documents into a matrix of TF-IDF features |
| `feature_selection.RFECV` | Feature selection | Automatic feature selection by recursive feature elimination with cross-validation |
| `model_selection.GridSearchCV` | Optimization | Exhaustive search over a parameter grid for the settings that maximize an estimator's cross-validated score |
| `linear_model.LinearRegression` | Prediction | Linear regression (ordinary least squares) |
| `linear_model.LogisticRegression` | Prediction | Logistic regression (a linear classifier) |
| `metrics.accuracy_score` | Solution evaluation | Accuracy classification score |
| `metrics.f1_score` | Solution evaluation | Compute the F1 score, balancing precision and recall |
| `metrics.mean_absolute_error` | Solution evaluation | Mean absolute error regression loss |
| `metrics.mean_squared_error` | Solution evaluation | Mean squared error regression loss |
| `metrics.roc_auc_score` | Solution evaluation | Compute the area under the ROC curve (AUC) from prediction scores |
| `naive_bayes.MultinomialNB` | Prediction | Multinomial Naïve Bayes |
| `neighbors.KNeighborsClassifier` | Prediction | k-nearest neighbors classification |
| `preprocessing.Binarizer` | Preparing your data | Create binary variables (feature values set to 0 or 1 according to a threshold) |
| `impute.SimpleImputer` | Preparing your data | Missing-value imputation (replaces `preprocessing.Imputer`, which was removed in scikit-learn 0.22) |
| `preprocessing.MinMaxScaler` | Preparing your data | Scale variables so they are bound by a minimum and maximum value |
| `preprocessing.OneHotEncoder` | Preparing your data | Transform categorical features into binary (one-hot) ones |
| `preprocessing.StandardScaler` | Preparing your data | Variable standardization by removing the mean and scaling to unit variance |
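
To see how the cross-validation, optimization, prediction, and evaluation entries fit together, here is a minimal sketch of one possible classification workflow. The Iris dataset, the `make_pipeline` helper, the parameter grid, and every variable name are assumptions chosen for illustration; none of them come from the table.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import (
    GridSearchCV,
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data only: the bundled Iris dataset (150 samples, 3 classes).
X, y = load_iris(return_X_y=True)

# Hold out a test set; stratify so class proportions match in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardize the features, then fit a logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Five stratified folds give five accuracy estimates on the training data.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")

# Exhaustively try each value of C (an assumed, arbitrary grid).
param_grid = {"logisticregression__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(model, param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluate the best model found on the untouched test set.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_f1 = f1_score(y_test, y_pred, average="macro")

print("CV accuracy per fold:", cv_scores)
print("Best parameters:", search.best_params_)
print("Test accuracy:", test_accuracy, "macro F1:", test_f1)
```

Wrapping the scaler and the classifier in a single pipeline means the scaler is refit on each training fold, so no information from the validation fold leaks into preprocessing.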
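
The text-preparation entries can be compared side by side on a toy corpus. The sketch below is illustrative only: the four documents, their labels, and the chosen `n_features` values are made up for the example.

```python
from sklearn.feature_extraction import FeatureHasher
from sklearn.feature_extraction.text import (
    CountVectorizer,
    HashingVectorizer,
    TfidfVectorizer,
)
from sklearn.naive_bayes import MultinomialNB

# Made-up corpus: two documents about animals (0), two about finance (1).
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stocks fell as markets closed",
    "investors sold stocks and bonds",
]
labels = [0, 0, 1, 1]

# Token counts: one column per distinct token in the learned vocabulary.
counts = CountVectorizer().fit_transform(docs)

# TF-IDF: same vocabulary, but counts are reweighted by document rarity.
tfidf_vectorizer = TfidfVectorizer()
tfidf = tfidf_vectorizer.fit_transform(docs)

# Hashing trick: no vocabulary is stored; the column count is fixed up front.
hashed = HashingVectorizer(n_features=2**8).fit_transform(docs)

# FeatureHasher applies the same trick to dictionaries of feature -> value.
hasher = FeatureHasher(n_features=16, input_type="dict")
hashed_dicts = hasher.transform([{"cat": 1, "dog": 2}, {"stocks": 3}])

# Train Multinomial Naive Bayes on the TF-IDF matrix and classify new text.
clf = MultinomialNB().fit(tfidf, labels)
new_doc = tfidf_vectorizer.transform(["the cat chased the dog"])
predicted = clf.predict(new_doc)

print("Count matrix shape:", counts.shape)
print("Hashed matrix shape:", hashed.shape)
print("Predicted label:", predicted[0])
```

The hashing variants trade interpretability for memory: because no vocabulary is kept, you cannot map a column back to a token, but the feature space never grows with the corpus.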
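
The data-preparation and dimensionality-reduction entries can be tried on a small array. This is a sketch under stated assumptions: the numbers and the color column are invented, `impute.SimpleImputer` stands in for `preprocessing.Imputer` (removed in scikit-learn 0.22), and `PCA(svd_solver="randomized")` stands in for `decomposition.RandomizedPCA` (removed in scikit-learn 0.20).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import (
    Binarizer,
    MinMaxScaler,
    OneHotEncoder,
    StandardScaler,
)

# Invented numeric data with one missing value in the second column.
X = np.array([
    [1.0, 200.0, 3.0],
    [2.0, np.nan, 1.0],
    [3.0, 400.0, 2.0],
    [4.0, 600.0, 5.0],
])

# Replace the missing value with the mean of its column.
X_imputed = SimpleImputer(strategy="mean").fit_transform(X)

# Rescale each column to the range [0, 1].
X_minmax = MinMaxScaler().fit_transform(X_imputed)

# Standardize each column to zero mean and unit variance.
X_std = StandardScaler().fit_transform(X_imputed)

# Binarize the first column: values above 2.5 become 1, the rest 0.
X_binary = Binarizer(threshold=2.5).fit_transform(X_imputed[:, [0]])

# One-hot encode an invented categorical column (categories sort A to Z).
colors = np.array([["red"], ["green"], ["red"], ["blue"]])
encoder = OneHotEncoder(handle_unknown="ignore")
colors_onehot = encoder.fit_transform(colors).toarray()

# Reduce the standardized data to two components using randomized SVD.
pca = PCA(n_components=2, svd_solver="randomized", random_state=0)
X_reduced = pca.fit_transform(X_std)

print("Imputed value:", X_imputed[1, 1])
print("One-hot categories:", encoder.categories_[0])
print("Reduced shape:", X_reduced.shape)
```

In practice you would fit each transformer on training data only and call `transform` on new data, rather than calling `fit_transform` on everything as this short example does.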
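
For regression, `KFold`, `LinearRegression`, and the two error metrics combine along these lines. The synthetic data, its coefficients, and the noise level are assumptions made up so that the example is self-contained.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data: y = 3*x0 - 2*x1 + 1 plus Gaussian noise (all invented).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 + rng.normal(0, 0.5, size=100)

reg = LinearRegression()

# Plain k-fold is the usual choice for regression (no classes to stratify).
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Scorers follow "higher is better", so MSE is reported negated.
neg_mse = cross_val_score(reg, X, y, cv=cv, scoring="neg_mean_squared_error")
cv_mse = -neg_mse.mean()

# Fit on all the data and compute the in-sample errors directly.
reg.fit(X, y)
y_pred = reg.predict(X)
mae = mean_absolute_error(y, y_pred)
mse = mean_squared_error(y, y_pred)

print("Cross-validated MSE:", cv_mse)
print("Coefficients:", reg.coef_, "intercept:", reg.intercept_)
print("In-sample MAE:", mae, "MSE:", mse)
```

The cross-validated error is the better guide to performance on new data; the in-sample figures are shown only to demonstrate the metric functions.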
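
Feature selection with `RFECV` can feed a downstream classifier whose ranking quality is then measured with `roc_auc_score`. In this sketch the data comes from `make_classification`, and the choice of logistic regression as the selector's estimator, the number of neighbors, and all variable names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary problem: 10 features, only 3 of them informative.
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Recursively drop the weakest feature; cross-validation picks how many to keep.
selector = RFECV(
    LogisticRegression(max_iter=1000), step=1, cv=5, scoring="roc_auc"
)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Classify with k-nearest neighbors using only the selected features.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train_sel, y_train)

# AUC needs scores, not hard labels: use the probability of the positive class.
scores = knn.predict_proba(X_test_sel)[:, 1]
auc = roc_auc_score(y_test, scores)

print("Features kept:", selector.n_features_)
print("Selected mask:", selector.support_)
print("Test AUC:", auc)
```

`RFECV` requires an estimator that exposes `coef_` or `feature_importances_`, which is why a linear model drives the selection here even though k-nearest neighbors makes the final predictions.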