10 Machine Learning Packages to Master

Updated: 2016-10-06 19:29:15

From the book

Machine Learning For Dummies

Great machine learning packages such as caret (R) and NumPy (Python) are good, versatile starting points for your machine learning journey. Still, it’s important to have more than a few tools in your toolbox, which is where the suggestions found here come into play.

Cloudera Oryx

Cloudera Oryx is a machine learning project for Apache Hadoop that provides you with a basis for performing machine learning tasks. It emphasizes the use of live data streaming. This product helps you add security, governance, and management functionality that’s missing from Hadoop so that you can create enterprise-level applications with greater ease.

The functionality provided by Oryx builds on Apache Kafka and Apache Spark. Common tasks for this product are real-time spam filters and recommendation engines.

CUDA-Convnet

GPUs enable you to perform machine learning tasks significantly faster. You can add Accelerate to Anaconda to provide basic GPU support for that environment. Caffe is a separate product that you can use to process images using Python or MATLAB.

If you need to perform serious image processing, a GPU is all but essential. The CUDA-Convnet library provides specific support for NVIDIA’s CUDA-capable GPUs, which means that it can provide faster processing at the cost of platform flexibility (you must have a CUDA-capable GPU in your system). For the most part, this library sees use in neural-network applications.

ConvNetJS

As described for CUDA-Convnet, being able to recognize objects in images is an important machine learning task, but getting the job done without a good library can prove difficult or impossible. While CUDA-Convnet provides support for heavy-duty desktop applications, ConvNetJS provides image-processing support for JavaScript applications. The important feature of this library is that it works asynchronously.

When you make a call, the application continues to work. An asynchronous response lets the application know when tasks, such as training, complete so that the user doesn’t feel as if the browser has frozen (become unresponsive in some way). Given that these tasks can take a long time to complete, the asynchronous call support is essential.
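
The asynchronous pattern described above can be sketched in a few lines. This is a Python illustration of the idea only, not ConvNetJS code (which is JavaScript); the `train` and `run_app` names and the three-epoch loop are hypothetical stand-ins.

```python
import asyncio


async def train(epochs, log):
    """Stand-in for a long training run that yields control after each epoch."""
    for epoch in range(epochs):
        # The real work for one epoch would go here.
        log.append(f"epoch {epoch} finished")
        await asyncio.sleep(0)  # hand control back to the event loop
    return "training complete"


async def run_app():
    log = []
    ui_ticks = 0
    # Starting the task returns immediately; training has not finished yet.
    task = asyncio.create_task(train(3, log))
    while not task.done():
        ui_ticks += 1  # the "application" keeps doing its own work
        await asyncio.sleep(0)
    return task.result(), log, ui_ticks


status, log, ui_ticks = asyncio.run(run_app())
print(status, log, ui_ticks)
```

The caller learns that training is over by checking the task, so its own loop never blocks while the epochs run.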

e1071

This R library, e1071, developed by the TU Wien E1071 group on probability theory, provides support for support vector machines (SVMs). Behind its R command interface runs LIBSVM, an external C++ library (with a C API to interface with other languages) developed at National Taiwan University. The LIBSVM site offers more on SVM classification and regression, together with plenty of datasets, tutorials, and even a practical guide for getting more from SVMs.

In addition, you get support functions for latent class analysis, short-time Fourier transform, fuzzy clustering, shortest-path computation, bagged clustering, and Naïve Bayes classifiers.
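
To give a feel for what an SVM does, here is a minimal sketch of a linear SVM trained by subgradient descent on the hinge loss. It is written in plain Python for illustration and is not how e1071 or LIBSVM work internally; the toy data, the function names, and the settings (`lam`, `learning_rate`, `epochs`) are all assumptions.

```python
def train_linear_svm(points, labels, lam=0.01, learning_rate=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + mean hinge loss by subgradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = [lam * w[0], lam * w[1]]  # gradient of the regulariser
        grad_b = 0.0
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            if margin < 1:  # point is inside the margin or misclassified
                grad_w[0] -= y * x1 / n
                grad_w[1] -= y * x2 / n
                grad_b -= y / n
        w = [w[0] - learning_rate * grad_w[0], w[1] - learning_rate * grad_w[1]]
        b -= learning_rate * grad_b
    return w, b


def predict(w, b, point):
    """Classify a point by the side of the separating line it falls on."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1


# Hypothetical toy data: two well-separated clusters labelled +1 and -1.
points = [(2, 2), (3, 3), (2, 3), (3, 1.5),
          (-2, -2), (-3, -1), (-1, -3), (-2.5, -2.5)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print([predict(w, b, p) for p in points])
```

Only the points that violate the margin contribute to each update, which is the behavior that gives support vectors their name.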

gbm

The gradient boosting machines (GBM) algorithm uses gradient descent optimization to determine the right weights for learning in the ensemble. The resulting performance increase is impressive, making GBM one of the most powerful predictive tools that you can learn to use in machine learning. The gbm package adds GBM support to R.

This package also includes regression methods for least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, and Learning to Rank measures (LambdaMART).

The package also provides convenient functions to cross-validate and to tune the number of trees, a crucial hyperparameter of the algorithm, without overfitting.
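
The boosting idea can be sketched with one-split trees (stumps) and least-squares loss. This is a simplified Python illustration under assumed toy data, not the gbm package’s implementation; `fit_stump`, `boost`, and the chosen `learning_rate` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold that best fits the residuals with two means."""
    best = None
    for threshold in sorted(set(xs))[1:]:  # keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps, history = [], []
    for _ in range(n_trees):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for x, p in zip(xs, predictions)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return base, stumps, history


# Hypothetical toy data with three rough plateaus.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8, 6.2, 5.9, 6.1, 6.0]

base, stumps, history = boost(xs, ys)
print(round(history[0], 4), round(history[-1], 4))
```

Training error never rises as trees are added, so it cannot tell you when to stop; that is why the number of trees needs to be chosen by cross-validation.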

Gensim

Gensim is a Python library that can perform natural language processing (NLP) and unsupervised learning on textual data. It offers a wide range of algorithms to choose from, including TF-IDF, random projections, latent Dirichlet allocation, latent semantic analysis, and two semantic algorithms, word2vec and doc2vec.
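
To make the first algorithm in that list concrete, here is a sketch of TF-IDF weighting in plain Python. It uses one common variant of the formula (term frequency times the natural log of inverse document frequency) and is not Gensim’s API or necessarily its default weighting; the three sample documents are made up.

```python
import math
from collections import Counter

# Hypothetical mini-corpus.
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]


def tf_idf(docs):
    """Return one {term: weight} dictionary per document."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


weights = tf_idf(documents)
for term, weight in sorted(weights[0].items()):
    print(f"{term}: {weight:.3f}")
```

A word that appears in every document, such as "the" here, gets a weight of zero, while a word unique to one document gets the highest weight.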

Word2vec is based on neural networks (shallow, not deep learning, networks) and it allows meaningful transformations of words into vectors of coordinates that you can operate on in a semantic way. For instance, starting from the vector representing Paris, subtracting the vector for France, and then adding the vector for Italy results in a vector close to the one for Rome, demonstrating how you can use mathematics and the right Word2vec model to perform semantic operations on text.

In addition, Gensim provides multiprocessing and out-of-core capabilities, allowing you to speed up the processing of algorithms and handle textual data larger than available RAM.

glmnet

Regularization is an effective, fast, and easy solution to use when you have many features and want to reduce the variance of the estimates due to multicollinearity between your predictors. One form of regularization is Lasso, which is one of the forms of support you get from glmnet (with the other being elastic-net). This package fits linear, logistic and multinomial, Poisson, and Cox regression models.

You can also use glmnet to perform prediction, plotting, and K-fold cross-validation. Professor Rob Tibshirani, the creator of L1 (also known as Lasso) regularization, also helped develop this package.
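
The effect of the Lasso penalty can be sketched with a small coordinate descent routine in plain Python. It minimizes squared error divided by 2n plus the penalty times the sum of absolute coefficients, with no intercept and no standardization, so it is a simplified illustration rather than glmnet’s implementation; the data and function names are assumptions.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty, snapping small values to zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(rows, targets, penalty, sweeps=100):
    """Update one coefficient at a time while holding the others fixed."""
    n = len(rows)
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0    # correlation of feature j with the partial residual
            scale = 0.0  # sum of squares of feature j
            for row, y in zip(rows, targets):
                partial = y - sum(row[k] * beta[k] for k in range(p) if k != j)
                rho += row[j] * partial
                scale += row[j] ** 2
            beta[j] = soft_threshold(rho / n, penalty) / (scale / n)
    return beta


# Hypothetical centred data: targets = 3 * first feature + 0.2 * second feature.
rows = [(-2, 1), (-1, -2), (0, 2), (1, -2), (2, 1)]
targets = [-5.8, -3.4, 0.4, 2.6, 6.2]

print(lasso_coordinate_descent(rows, targets, penalty=0.0))
print(lasso_coordinate_descent(rows, targets, penalty=1.0))
```

With no penalty the routine recovers coefficients of about 3 and 0.2. With a penalty of 1 the first coefficient shrinks to about 2.5 and the weak second coefficient becomes exactly zero, which is the feature-selection behavior Lasso is known for.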

randomForest

You can improve a decision tree by replicating it many times and averaging results to get a more general solution. The R open source package for performing this task is randomForest. You can use it to perform classification and regression tasks based on a forest of trees using random inputs. The Python version of this package appears as RandomForestClassifier and RandomForestRegressor, both of which are found in Scikit-learn.
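
The replicate-and-average idea can be sketched as follows. This is a deliberately reduced Python illustration: each "tree" is a single split (a stump) trained on a bootstrap sample using one randomly chosen feature, whereas real random forests grow deeper trees and sample features at every split. The data, names, and settings are assumptions, and this is not the randomForest or Scikit-learn code.

```python
import random


def fit_stump(rows, targets, feature):
    """Best single split on one feature; falls back to the mean if no split exists."""
    mean = sum(targets) / len(targets)
    best = (float("inf"), None, mean, mean)
    for threshold in sorted({row[feature] for row in rows})[1:]:
        left = [t for row, t in zip(rows, targets) if row[feature] < threshold]
        right = [t for row, t in zip(rows, targets) if row[feature] >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return feature, best[1], best[2], best[3]


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    if threshold is None:
        return left_mean
    return left_mean if row[feature] < threshold else right_mean


def fit_forest(rows, targets, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw rows with replacement.
        indices = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))  # random input for this tree
        forest.append(fit_stump([rows[i] for i in indices],
                                [targets[i] for i in indices], feature))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return sum(predict_stump(stump, row) for stump in forest) / len(forest)


# Hypothetical toy data: the target rises with both features.
rows = [(1, 10), (2, 12), (3, 15), (4, 20), (5, 22), (6, 30), (7, 31), (8, 40)]
targets = [1.0, 1.5, 2.1, 3.0, 3.2, 4.8, 5.0, 6.5]

forest = fit_forest(rows, targets)
low = predict_forest(forest, rows[0])
high = predict_forest(forest, rows[-1])
print(low, high)
```

Each stump on its own is a crude model, but averaging many of them, each trained on a different resample, gives a smoother and more stable prediction.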

SciPy

The SciPy stack contains a host of other libraries that you can also download separately. These libraries provide support for mathematics, science, and engineering. When you obtain SciPy, you get a set of libraries designed to work together to create applications of various sorts. These libraries are

NumPy
SciPy
matplotlib
IPython
SymPy
pandas

The SciPy library itself focuses on numerical routines, such as routines for numerical integration and optimization. SciPy is a general-purpose library that provides functionality for multiple problem domains. It also provides support for domain-specific libraries, such as Scikit-learn, Scikit-image, and statsmodels. The SciPy site contains many lectures and tutorials on SciPy’s functions.
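
To show the kind of numerical routine meant here, the following plain-Python sketch implements two textbook methods: Simpson’s rule for integration and golden-section search for one-dimensional minimization. These are hand-written illustrations, not SciPy code; in practice you would reach for SciPy’s tested routines (for example, `scipy.integrate.quad` and `scipy.optimize.minimize_scalar`) instead.

```python
import math


def simpson(f, a, b, n=100):
    """Approximate the integral of f over [a, b] with composite Simpson's rule."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def golden_section_minimize(f, lo, hi, tol=1e-6):
    """Locate the minimum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) < f(d):
            hi = d  # the minimum lies in [lo, d]
        else:
            lo = c  # the minimum lies in [c, hi]
    return (lo + hi) / 2


area = simpson(math.sin, 0.0, math.pi)  # exact value is 2
x_min = golden_section_minimize(lambda x: (x - 2) ** 2 + 1, 0.0, 5.0)  # exact minimum at x = 2
print(area, x_min)
```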

XGBoost

Other types of gradient boosting machines exist that are based on a slightly different set of optimization approaches and cost functions. The XGBoost package enables you to apply GBM to any problem, thanks to its wide choice of objective functions and evaluation metrics. It operates with a variety of languages, including Python, R, Java, and C++.

Although GBM is a sequential algorithm (and thus slower than others that can take advantage of modern multicore computers), XGBoost uses multithreading to search in parallel for the best splits among the features. This multithreading helps XGBoost turn in excellent performance compared to other GBM implementations, both in R and Python. Because of all that it contains, the full package name is eXtreme Gradient Boosting (or XGBoost for short).
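
The structure of a parallel split search can be sketched as follows: each feature is scanned for its best threshold in a separate task, and the results are then compared. This Python illustration uses a thread pool and made-up data; it is not XGBoost’s code, and because of Python’s global interpreter lock it shows the shape of the computation rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def squared_error(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_for_feature(task):
    """Scan every threshold of one feature and return (error, feature, threshold)."""
    feature, rows, targets = task
    best = (float("inf"), feature, None)
    for threshold in sorted({row[feature] for row in rows})[1:]:
        left = [t for row, t in zip(rows, targets) if row[feature] < threshold]
        right = [t for row, t in zip(rows, targets) if row[feature] >= threshold]
        error = squared_error(left) + squared_error(right)
        if error < best[0]:
            best = (error, feature, threshold)
    return best


def best_split(rows, targets, workers=4):
    """Evaluate all features concurrently, then keep the lowest-error split."""
    tasks = [(feature, rows, targets) for feature in range(len(rows[0]))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        candidates = list(pool.map(best_split_for_feature, tasks))
    return min(candidates, key=lambda candidate: candidate[0])


# Hypothetical toy data: only the first feature separates the targets cleanly.
rows = [(1, 5, 0), (2, 3, 1), (3, 8, 0), (4, 1, 1), (5, 9, 0), (6, 2, 1)]
targets = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]

best = best_split(rows, targets)
print(best)
```

The trees are still built one after another; only the search within each tree is spread across workers, which is how a sequential algorithm can still benefit from multiple cores.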

This article can be found in the category:

Machine Learning
