The Basics of Deep Learning Framework Usage and Low-End Framework Options

Data Science Essentials For Dummies

A deep learning framework is an abstraction that provides generic functionality, which your application code modifies to serve its own purposes. Unlike a library, which runs within your application, a framework runs your application within it.

You can’t modify basic deep learning framework functionality, which means that you have a stable environment in which to work, but most frameworks offer some level of extensibility. Frameworks are generally specific to a particular need: Just as web frameworks exist to create online applications, deep learning frameworks exist to build, train, and run neural networks.

When thinking about a deep learning framework, what you’re really considering is how the framework manages the frozen spots (the fixed parts that you can’t change) and the hot spots (the extension points where you add your own code) used by the application. In most cases, a deep learning framework provides frozen spots and hot spots in these areas:

Hardware access (such as using a GPU with ease)
Standard neural network layer access
Deep learning primitive access
Computational graph management
Model training
Model deployment
Model testing
Graph building and presentation
Inference (forward propagation)
Automatic differentiation (backpropagation)
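To make the idea of frozen spots and hot spots more concrete, here is a minimal sketch in plain Python. It doesn’t come from any real framework: The Trainer and LinearModel classes, the fit, predict, and gradient methods, and the sample data are all hypothetical names invented for this illustration. The fixed training loop plays the role of a frozen spot, and the two methods you override play the role of hot spots.

```python
class Trainer:
    """Toy 'framework' (hypothetical): it owns the training loop."""

    def __init__(self, learning_rate=0.05, epochs=200):
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.weight = 0.0

    def fit(self, samples):
        # Frozen spot: the loop is fixed, and it calls *your* code.
        for _ in range(self.epochs):
            for x, y in samples:
                self.weight -= self.learning_rate * self.gradient(x, y)
        return self.weight

    # Hot spots: the application must fill these in.
    def predict(self, x):
        raise NotImplementedError

    def gradient(self, x, y):
        raise NotImplementedError


class LinearModel(Trainer):
    """Application code: a one-weight model, y is roughly weight * x."""

    def predict(self, x):
        return self.weight * x

    def gradient(self, x, y):
        # Derivative of the squared error (weight * x - y) ** 2
        # with respect to weight, written by hand.
        return 2.0 * (self.predict(x) - y) * x


# Assumed sample data that follows y = 2 * x exactly.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
model = LinearModel()
learned_weight = model.fit(samples)
print(round(learned_weight, 3))
```

Notice that the application never calls gradient itself. The framework’s loop decides when that happens, which is the sense in which your application runs within the framework.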

A good deep learning framework also exhibits specific characteristics that you may not find in other framework types. These characteristics help create an environment in which the deep learning framework enables you to create intelligent applications that learn and process data quickly. Here are some of the characteristics to consider when looking at a deep learning framework:

Optimizes for performance rather than resource usage or some other consideration
Performs tasks using parallel operations to reduce the time spent creating a model and associated neural network
Computes gradients automatically
Makes coding easy because many of the people using deep learning frameworks aren’t developers, but rather subject matter experts
Interacts well with standard libraries used for plotting, machine learning, and statistics
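Computing gradients automatically is the characteristic in this list that benefits most from a small example. The following sketch is a toy reverse-mode automatic differentiation routine in plain Python, assuming only scalar values and two operations (addition and multiplication). The Value class and its backward method are hypothetical names for this illustration, not the API of any framework discussed here.

```python
class Value:
    """A scalar that remembers how it was computed (toy example)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # inputs that produced this value
        self._local_grads = local_grads  # d(self)/d(parent) for each input

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # Order the nodes so that each one comes after its inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk backward, applying the chain rule at every node.
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


x = Value(3.0)
y = Value(4.0)
z = x * y + x      # you write only the forward computation
z.backward()       # the gradients are filled in for you
print(z.data, x.grad, y.grad)
```

You describe only the forward calculation, and the gradients of z with respect to x and y come from bookkeeping rather than from derivatives you work out by hand. Real frameworks do the same thing for tensors with millions of elements.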

Deep learning frameworks address other issues, such as providing good community support for specific problem domains, and the focus on specific issues determines the viability of a particular framework for a particular purpose. As with many forms of software development aid, you need to choose the deep learning framework you use carefully.

Working with low-end deep learning frameworks

Low-end deep learning frameworks often come with a built-in trade-off. You must balance cost against usage complexity, and both of them against the need to support large applications in challenging environments.

The trade-offs you’re willing to endure will generally reflect what you can use to complete your project. With this caveat in mind, the following information discusses a number of low-end frameworks that are incredibly useful and work well with small to medium-size projects, but that come with trade-offs for you to consider as well.

Chainer

Chainer is a library written purely in Python that relies on the NumPy and CuPy libraries. Preferred Networks leads the development of this library, but IBM, Intel, Microsoft, and NVIDIA also play a role. The main point with this library is that it helps you use the CUDA capabilities of your GPU by adding only a few lines of code. In other words, this library gives you a simple way to greatly enhance the speed of your code when working with huge datasets.

Many deep learning libraries today, such as Theano and TensorFlow, use a static deep learning approach called define-and-run, in which you first define the math operations and then perform training based on those operations.

Unlike Theano and TensorFlow, Chainer uses a define-by-run approach, which relies on a dynamic deep learning approach in which the code defines math operations as the training occurs. Here are the two main advantages to this approach:

Intuitive and flexible approach: A define-by-run approach can rely on a language’s native capabilities rather than require you to create special operations to perform analysis.
Debugging: Because the define-by-run approach defines the operations during training, you can rely on the language’s built-in debugging features to locate the source of errors in a dataset or the application code.
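The difference between the two approaches is easier to see side by side. The following plain Python sketch is an illustration only, not Chainer, Theano, or TensorFlow code. Every name in it (Node, placeholder, run, Traced, and halve_until_small) is hypothetical. The first half describes a calculation as a graph and feeds it data later. The second half records operations while ordinary Python code runs, so a native while loop decides how many operations end up in the record.

```python
# Define-and-run (static): describe the graph first, feed data later.
class Node:
    def __init__(self, op, inputs=(), name=None):
        self.op = op
        self.inputs = inputs
        self.name = name


def placeholder(name):
    return Node("input", name=name)


def add(a, b):
    return Node("add", (a, b))


def mul(a, b):
    return Node("mul", (a, b))


def run(node, feed):
    """Evaluate a finished graph using the values in feed."""
    if node.op == "input":
        return feed[node.name]
    left, right = (run(child, feed) for child in node.inputs)
    return left * right if node.op == "mul" else left + right


x = placeholder("x")
graph = add(mul(x, x), x)               # nothing is computed yet
static_result = run(graph, {"x": 3.0})  # x * x + x, evaluated for x = 3


# Define-by-run (dynamic): operations are recorded as they execute.
class Traced:
    def __init__(self, value, trace=None):
        self.value = value
        self.trace = [] if trace is None else trace

    def __mul__(self, factor):
        # The record grows at the moment the operation actually runs.
        return Traced(self.value * factor, self.trace + [("mul", factor)])


def halve_until_small(x, limit=1.0):
    # A native Python loop: the number of operations depends on the data.
    while x.value > limit:
        x = x * 0.5
    return x


big = halve_until_small(Traced(8.0))
small = halve_until_small(Traced(3.0))
print(static_result, len(big.trace), len(small.trace))
```

In the dynamic half, the two inputs produce records of different lengths, and you could set a breakpoint inside the while loop with a standard Python debugger. In the static half, the graph has the same shape no matter what data you feed it.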

TensorFlow 2.0 can also use define-by-run through its eager execution mode, which is enabled by default.

PyTorch

PyTorch is the successor to Torch, which was written in the Lua language. One of the core PyTorch libraries (the autograd library) started as a fork of Chainer. Facebook initially developed PyTorch, but many other organizations use it today, including Twitter, Salesforce, and the University of Oxford. Here are the features that make PyTorch special:

Extremely user friendly
Efficient memory usage
Relatively fast
Commonly used for research

Some people like PyTorch because its code is as easy to read as Keras code, yet it doesn’t take away the scientist’s ability to build complicated neural networks. In addition, PyTorch supports dynamic computational graphs directly, which makes it more flexible than TensorFlow used without the TensorFlow Fold add-on.

MXNet

The biggest reason to use MXNet is speed. It might be hard to figure out whether MXNet or CNTK is faster, but both products are quite fast and are often used as a contrast to the slowness that some people experience when working with TensorFlow. (This white paper provides some details on benchmarking of deep learning code.)
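If you want to compare speed for yourself, a framework-neutral timing harness is a reasonable starting point. The following sketch uses only the Python standard library. The benchmark function and the two placeholder workloads are hypothetical stand-ins for this illustration. In practice, you would replace each placeholder with one training step written in the framework you want to measure.

```python
import statistics
import time


def benchmark(step, repeats=5, warmup=1):
    """Return the median wall-clock time of step() in seconds."""
    for _ in range(warmup):
        step()  # warm-up runs are not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        step()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Placeholder workloads (assumptions): swap in one real training
# step per framework when you run your own comparison.
def placeholder_step_a():
    return sum(i * i for i in range(20_000))


def placeholder_step_b():
    total = 0
    for i in range(20_000):
        total += i * i
    return total


candidates = {
    "framework_a": placeholder_step_a,
    "framework_b": placeholder_step_b,
}
results = {name: benchmark(step) for name, step in candidates.items()}
fastest = min(results, key=results.get)
print(fastest, results)
```

The median is used here because a single slow run (caused by, say, another process on the machine) would distort an average. Timings you collect this way apply only to your hardware, your model, and your data.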

MXNet is an Apache product that supports a host of languages including Python, Julia, C++, R, and JavaScript. Numerous large organizations use it, including Microsoft, Intel, and Amazon Web Services. Here are the aspects that make MXNet special:

Features advanced GPU support
Can be run on any device
Provides a high-performance imperative API
Offers easy model serving
Provides high scalability

It may sound like the perfect product for your needs, but MXNet does come with at least one serious failing: It lacks the level of community support that TensorFlow offers. In addition, most researchers don’t look at MXNet favorably because it can become complex, and a researcher isn’t dealing with a stable model in most cases.

Microsoft Cognitive Toolkit/CNTK

Speed is also one of the reasons to use the Microsoft Cognitive Toolkit (CNTK). Microsoft uses CNTK for big datasets, and it means really big ones. As a product, it supports the Python, C++, C#, and Java programming languages.

Consequently, if you’re a researcher who relies on R, this isn’t the product for you. Microsoft has used this product in Skype, Xbox, and Cortana. This product’s special features are

Great performance
High scalability
Highly optimized components
Apache Spark support
Azure Cloud support

As with MXNet, CNTK has a distinct problem in its lack of adequate community support. In addition, it tends not to provide much in the way of third-party support, either, so if the package doesn’t contain the features you need, you might not get them at all.

Fully evaluate your needs before selecting your deep learning framework.

About This Article

About the book author:

John Paul Mueller is a freelance author and technical editor. He has writing in his blood, having produced 100 books and more than 600 articles to date. The topics range from networking to home security and from database management to heads-down programming. John has provided technical services to both Data Based Advisor and Coast Compute magazines.

Luca Massaron is a data scientist specialized in organizing and interpreting big data and transforming it into smart data by means of the simplest and most effective data mining and machine learning techniques. Because of his job as a quantitative marketing consultant and marketing researcher, he has been involved in quantitative data since 2000 with different clients and in various industries, and is one of the top 10 Kaggle data scientists.