Choosing a Python Distribution with Machine Learning in Mind

By John Paul Mueller, Luca Massaron

You can obtain a generic copy of Python and add all the required machine learning libraries to it yourself. The process can be difficult, however, because you must install every required library in a version that works with the others. In addition, you need to configure your setup so that the libraries are accessible when you need them.
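
If you do decide to build your own setup, a short script can report which libraries are present and whether they meet a minimum version. The following is a minimal sketch, not an official tool. The package names and minimum versions in REQUIRED are placeholder assumptions, the helper names parse_version and check_requirements are made up for this example, and the script assumes Python 3.8 or later, where the standard importlib.metadata module is available.

```python
from importlib import metadata

# Hypothetical requirements: these names and minimum versions are
# illustrative assumptions, not a list taken from the article.
REQUIRED = {"numpy": "1.0", "scipy": "1.0"}


def parse_version(text, width=3):
    """Turn '1.21.3' into (1, 21, 3); stop at the first non-numeric part."""
    parts = []
    for piece in text.split(".")[:width]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad with zeros so that '1' and '1.0' compare as equal.
    parts += [0] * (width - len(parts))
    return tuple(parts)


def check_requirements(required):
    """Map each package name to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if parse_version(installed) >= parse_version(minimum):
            report[name] = "ok"
        else:
            report[name] = "too old"
    return report


if __name__ == "__main__":
    for name, status in check_requirements(REQUIRED).items():
        print(name, status)
```

The version comparison here is deliberately simple and looks only at the leading numeric parts, so treat the report as a quick sanity check rather than a full dependency resolver.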

Fortunately, you don’t have to do this work yourself, because a number of prepackaged Python machine learning products are available for you to use. These products provide everything needed to get started with machine learning projects.

Windows 10 presents some serious installation issues when working with Python. You can read about these issues on John Mueller’s blog. Given that so many readers have sent feedback saying that Windows 10 doesn’t provide a good environment, it’s hard to recommend Windows 10 as a Python platform. If you’re working with Windows 10, simply be aware that your road to a Python installation will be a rocky one.

Getting Continuum Analytics Anaconda

The basic Anaconda package is a free download. Simply click Download Anaconda to obtain access to the free product. You do need to provide an email address to get a copy of Anaconda. After you provide your email address, you go to another page, where you can choose your platform and the installer for that platform. Anaconda supports the following platforms:

  • Windows 32-bit and 64-bit (the installer may offer you only the 64-bit or 32-bit version, depending on which version of Windows it detects)
  • Linux 32-bit and 64-bit
  • Mac OS X 64-bit
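
Because the installers differ by operating system and by 32-bit versus 64-bit builds, it can help to confirm which build you ended up with after installation. The following sketch uses only the standard library. The function name describe_platform is made up for this example, and the bit width it reports is that of the running Python interpreter, which can differ from the operating system's (for example, a 32-bit Python on 64-bit Windows).

```python
import platform
import struct


def describe_platform():
    """Return (OS name, hardware architecture, interpreter bit width)."""
    # The size of a C pointer tells you whether this Python build
    # is 32-bit or 64-bit.
    bits = struct.calcsize("P") * 8
    return platform.system(), platform.machine(), bits


if __name__ == "__main__":
    system, machine, bits = describe_platform()
    print("Operating system:", system)
    print("Hardware architecture:", machine)
    print("Python build: {}-bit".format(bits))
```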

The default download version installs Python 2.7. You can also choose to install Python 3.5 by clicking one of the links in the Python 3.5 part of the page. Both Windows and Mac OS X provide graphical installers. On Linux, you run the installer script with the bash utility.
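
After installation, you can confirm which Python series you actually received by asking the interpreter itself. This is a minimal sketch that is meant to run unchanged on both Python 2.7 and Python 3.x. The function name python_series is made up for this example.

```python
import sys


def python_series(version_info=sys.version_info):
    """Return the 'major.minor' series string, such as '2.7' or '3.5'."""
    return "{}.{}".format(version_info[0], version_info[1])


print("Running Python " + python_series())
```

Passing a tuple such as (3, 5, 0) to python_series lets you try the formatting without changing interpreters.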

You can also obtain Anaconda with older versions of Python by clicking the installer archive link near the bottom of the page. You should use an older version of Python only when you have a pressing need to do so.

The Miniconda installer can potentially save time by limiting the number of features you install. However, trying to figure out precisely which packages you do need is an error-prone and time-consuming process. In general, you want to perform a full installation to ensure that you have everything needed for your projects. Even a full install doesn’t require much time or effort to download and install on most systems.

When you look on the site, you see that many other add-on products are available. These products can help you create robust applications. For example, when you add Accelerate to the mix, you obtain the capability to perform multicore and GPU-enabled operations. The Anaconda site provides details on using these add-ons.

Getting Enthought Canopy Express

Enthought Canopy Express is a free product for producing both technical and scientific applications using Python. Click Download Free on the main page to see a listing of the versions that you can download. Only Canopy Express is free; the full Canopy product comes at a cost. Canopy Express supports the following platforms:

  • Windows 32-bit and 64-bit
  • Linux 32-bit and 64-bit
  • Mac OS X 32-bit and 64-bit

Choose the platform and version you want to download. When you click Download Canopy Express, you see an optional form for providing information about yourself. The download starts automatically, even if you don’t provide personal information to the company.

One of the advantages of Canopy Express is that Enthought is heavily involved in providing support for both students and teachers. People also can take classes, including online classes, that teach the use of Canopy Express in various ways.

Enthought also offers live classroom training designed specifically for the data scientist. The data science classes won’t teach you the details of working through machine learning problems, but they do help you understand how to work with big data, which is a part of working through machine learning problems. In short, knowing data science gives you a boost in using Python for machine learning, but it doesn’t completely remove the learning curve.

Getting pythonxy

The pythonxy Integrated Development Environment (IDE) is a community project hosted on Google. It’s a Windows-only product, so you can’t easily use it for cross-platform needs. (In fact, it supports only Windows Vista, Windows 7, and Windows 8.) However, it does come with a full set of libraries.

Because pythonxy uses the GNU General Public License (GPL) v3, you have no add-ons, training, or other paid features to worry about. No one will come calling at your door hoping to sell you something. In addition, you have access to all the source code for pythonxy, so you can make modifications if you want.

Getting WinPython

The name tells you that WinPython is a Windows-only product. This product is actually a spin-off of pythonxy and isn’t meant to replace it. Quite the contrary: WinPython is simply a more flexible way to work with pythonxy. Read more about the motivation for creating WinPython.

The bottom line for this product is that you gain flexibility at the cost of friendliness and a little platform integration. However, for developers who need to maintain multiple versions of an IDE, WinPython may make a significant difference.