Apache Bigtop and Hadoop

By Dirk deRoos

To help you get started with Hadoop, here’s how to quickly download and set it up on your own laptop. Your cluster will run in pseudo-distributed mode, meaning that all the Hadoop daemons run on a single machine, inside a virtual machine, so you won’t need special hardware.

A virtual machine (VM) is a simulated computer that you can run on a real computer. For example, you can run a program on your laptop that “plays” a VM, which opens a window that looks like it’s running another computer. In effect, a pretend computer is running inside your real computer.

You’ll be downloading a VM, and while running it, you’ll install Hadoop.

Apache Bigtop is a great alternative if you want to assemble your own Hadoop components. Bigtop gathers the core Hadoop components for you and ensures that your configuration works. Apache Bigtop is a 100 percent open source distribution.

The primary goal of Bigtop (itself an Apache project, just like Hadoop) is to build a community around the packaging, deployment, and integration of projects in the Apache Hadoop ecosystem. The focus is on the system as a whole rather than on individual projects.

Using Bigtop, you can easily install and deploy Hadoop components without having to track them down in a specific distribution and match them with a specific Hadoop version. As new versions of Hadoop components are released, they sometimes do not work with the newest releases of other projects. If you assemble the stack on your own, you have to do significant testing to confirm that the versions you picked work together.
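To make the version-matching problem concrete, here is a minimal sketch of the kind of check you would otherwise do by hand. It is not how Bigtop works internally. The table, the function name, and every version number below are made up for illustration and do not describe real releases.

```python
# Hypothetical compatibility table: maps (component, version) to the set of
# Hadoop versions that component release was tested with.
# All version numbers here are invented for illustration only.
TESTED_WITH = {
    ("hive", "1.0"): {"2.0", "2.1"},
    ("hive", "1.1"): {"2.1"},
    ("pig", "0.9"): {"2.0"},
    ("pig", "1.0"): {"2.0", "2.1"},
    ("hbase", "1.0"): {"2.1"},
}


def compatible_hadoop_versions(selection):
    """Return the Hadoop versions that every selected component was tested with.

    `selection` maps a component name to the version you want to install.
    An empty result means no single Hadoop version suits the whole selection.
    """
    common = None
    for component, version in selection.items():
        # Unknown component releases count as tested with nothing.
        tested = set(TESTED_WITH.get((component, version), set()))
        common = tested if common is None else common & tested
    return common or set()


if __name__ == "__main__":
    works = {"hive": "1.1", "pig": "1.0", "hbase": "1.0"}
    clashes = {"hive": "1.1", "pig": "0.9"}
    print(sorted(compatible_hadoop_versions(works)))    # prints ['2.1']
    print(sorted(compatible_hadoop_versions(clashes)))  # prints []
```

In the second selection, the two components share no tested Hadoop version, which is exactly the situation a pretested distribution is meant to spare you from.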

With Bigtop (or a commercial Hadoop release), you can trust that Hadoop experts have done this testing for you. To give you an idea of how expansive Bigtop has become, see the following list of all the components included in Bigtop:

  • Apache Crunch

  • Apache Flume

  • Apache Giraph

  • Apache HBase

  • Apache HCatalog

  • Apache Hive

  • Apache Mahout

  • Apache Oozie

  • Apache Pig

  • Apache Solr

  • Apache Sqoop

  • Apache Whirr

  • Apache ZooKeeper

  • Cloudera Hue

  • LinkedIn DataFu

This collection of Hadoop ecosystem projects is about as expansive as it gets: both major and minor projects are included. Apache Bigtop is continuously evolving, so the component list changes from release to release.