Machine Learning in Academia with Weka

By John Paul Mueller, Luca Massaron

Weka (also available at SourceForge.net) is a collection of machine learning algorithms written in Java and developed at the University of Waikato, New Zealand. The main purpose of Weka is to perform data-mining tasks, and initially, schools used it as a learning tool. The tool is now also included as part of the Pentaho business intelligence suite. You can use it for

  • Association rules
  • Attribute selection
  • Clustering
  • Data preprocessing
  • Data classification
  • Data visualization
  • Regression analysis
  • Workflow analysis

Weka works especially well in schools for two reasons: the Java code runs on nearly any platform, and you can download Weka for free. You can apply Weka algorithms directly to a dataset or call Weka from your own Java code, which makes the environment extremely flexible. The one downside of Weka is that it tends not to work well on very large datasets.

To use Weka, you must also install an appropriate version of Java on your system. You can use Weka with any DBMS that Java or a third-party Java add-on product supports through Java Database Connectivity (JDBC), so you have a wide selection of data sources from which to choose.
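
Because database access depends on a JDBC driver being available to Java, it can help to confirm that a driver is registered before you point Weka at a database. The following is a minimal sketch that uses only the standard java.sql package, not Weka itself. The class name JdbcDriverCheck, its two helper methods, and the URL jdbc:exampledb://localhost/demo are hypothetical and exist only for illustration; substitute the URL format documented for your own DBMS.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class (not part of Weka): reports which JDBC drivers
// the running JVM can see and whether one of them accepts a given URL.
class JdbcDriverCheck {

    // Returns true if a registered JDBC driver claims the given URL.
    // DriverManager.getDriver throws SQLException when no driver matches.
    static boolean hasDriverFor(String jdbcUrl) {
        try {
            DriverManager.getDriver(jdbcUrl);
            return true;
        } catch (SQLException e) {
            return false;
        }
    }

    // Lists the class names of all JDBC drivers currently registered.
    static List<String> registeredDrivers() {
        List<String> names = new ArrayList<>();
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            names.add(driver.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumed placeholder URL; pass a real JDBC URL as the first argument.
        String url = args.length > 0 ? args[0] : "jdbc:exampledb://localhost/demo";
        System.out.println("Registered drivers: " + registeredDrivers());
        System.out.println("Driver found for " + url + ": " + hasDriverFor(url));
    }
}
```

If the check reports that no driver was found, the usual remedy is to add the database vendor's JDBC driver JAR to the Java classpath.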