Visualizing with Knime and RapidMiner for Machine Learning

By John Paul Mueller, Luca Massaron

Humans have a terrible time visualizing abstract data, and machine learning output can be extremely abstract. A graphical output tool lets you see how the data actually appears. Knime and RapidMiner excel at this task by helping you produce high-quality graphics easily. Both products also stand out from others because they handle many kinds of data mining tasks.

The pharmaceutical industry relies heavily on Knime for both machine learning and data mining tasks. Knime organizes this work as data flows (pipelines), and its GUI makes it relatively easy to learn.

In fact, Knime is built on Eclipse, one of the most popular development platforms available today, which also supports a large number of programming languages, such as Java, C/C++, JavaScript, and PHP (among many others available through plug-ins). Knime also integrates well with both Weka and LIBSVM, so ease of use doesn’t come at the cost of functionality.

RapidMiner caters more to the needs of business, which uses it for machine learning, data mining, text mining, predictive analytics, and business analytics. In contrast to many other products, RapidMiner relies on a client/server model, in which the server is available as a cloud-based Software-as-a-Service (SaaS) option. This means that a business can test the environment without making a huge initial investment in either software or hardware. RapidMiner works with both R and Python. Companies such as eBay, Intel, PepsiCo, and Kraft Foods currently use RapidMiner for various needs.

A distinguishing characteristic of both of these products is that they rely on the Extract, Transform, Load (ETL) model. In this model, the process first extracts all the data needed from various sources, then transforms that data into a common format, and finally loads the transformed data into a database for analysis.
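
To make the three ETL stages concrete, here is a minimal sketch in plain Python. It does not use Knime or RapidMiner and is not how either product implements ETL; it only illustrates the extract, transform, and load steps described above. The sample data, the field names (product, price_usd, item, price_cents), the sales table, and the function names are all hypothetical and invented for illustration.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample sources, invented for illustration only.
# One source is CSV with prices in dollars, the other JSON with prices in cents.
CSV_SOURCE = "product,price_usd\nWidget,19.99\nGadget,5.00\n"
JSON_SOURCE = '[{"item": "Sprocket", "price_cents": 1250}]'


def extract():
    """Extract: read raw records from each source in its native shape."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Transform: map both sources onto one schema (lowercase name, price in USD)."""
    common = [(r["product"].lower(), float(r["price_usd"])) for r in csv_rows]
    common += [(r["item"].lower(), r["price_cents"] / 100) for r in json_rows]
    return common


def load(rows):
    """Load: write the unified rows into a database table for analysis."""
    conn = sqlite3.connect(":memory:")  # in-memory database, assumed for the sketch
    conn.execute("CREATE TABLE sales (product TEXT, price_usd REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def run_etl():
    """Chain the three stages in order: extract, then transform, then load."""
    return load(transform(*extract()))


if __name__ == "__main__":
    conn = run_etl()
    for row in conn.execute("SELECT product, price_usd FROM sales ORDER BY product"):
        print(row)
```

Once the data sits in a single table with a common format, any analysis or visualization step can query it without caring which source each row came from, which is the point of transforming before loading.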