Apache Drill

By Dirk deRoos

Apache Drill is a candidate project in the Apache Incubator. Apache Drill isn’t especially sickly, though: all Apache Software Foundation (ASF) candidate technologies begin as incubator projects before becoming official ASF technologies. You can read more about both the Apache Incubator and Drill on their respective project pages.

Drill is inspired by Google’s Dremel technology, and its stated performance goal is to enable SQL queries against a petabyte or more of data distributed across 10,000-plus servers. The figure illustrates the architecture of Apache Drill.


You can see that the key to the Drill architecture is the Drillbit server deployed on each data node. Each Drillbit includes a query parser, compiler, optimizer, and runtime. One Drillbit, however, is nominated as the master by the ZooKeeper servers. The master oversees the execution of the queries and looks after the task of pulling together the interim result sets into a single set of output.
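
To make the coordinator idea concrete, here is a minimal sketch in Python of the scatter-gather pattern that paragraph describes. It is not Drill code and uses no Drill API: the partition data and the names `run_fragment` and `coordinate` are hypothetical, and threads stand in for the per-node servers. Each worker computes a partial result on its own rows, and a single coordinator merges the interim results into one output.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for the rows stored on one data node.
PARTITIONS = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 1), ("north", 4), ("east", 2)],
]

def run_fragment(rows):
    """Worker step: compute a partial SUM(amount) GROUP BY region on local rows."""
    partial = Counter()
    for region, amount in rows:
        partial[region] += amount
    return partial

def coordinate(partitions):
    """Coordinator step: fan the work out, then merge the interim result sets."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        interim = list(pool.map(run_fragment, partitions))
    merged = Counter()
    for partial in interim:
        merged.update(partial)  # Counter.update adds the partial sums together
    return dict(merged)

result = coordinate(PARTITIONS)
print(result)  # {'east': 19, 'west': 6, 'north': 7}
```

The point of the design is that only the small partial results travel to the coordinator, not the raw rows held on each node.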

Like Dremel, Drill can coexist with, and complement, MapReduce. Unlike Apache Hive, however, Drill doesn’t use MapReduce to fulfill queries. Instead, members of the Drill community have developed their own execution engines, called Drillbits.

This community aims to provide low-latency queries for applications such as real-time business intelligence dashboards, fraud detection, and other time-sensitive use cases. Drill supports nested data formats such as Avro, JSON, and Google Protocol Buffers. Because a single nested record can carry its related child records inside it, these formats allow for very large denormalized tables.
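
As a rough illustration of what nested, denormalized data looks like, the following Python sketch parses one JSON record that embeds its own orders and line items, then flattens it into rows that can be aggregated. The record layout and the `flatten_items` helper are hypothetical and chosen only for this example. Nothing here reflects how Drill itself reads nested data.

```python
import json

# Hypothetical nested record: one customer carrying repeated, nested order data.
RAW = """
{"customer": "acme",
 "orders": [
   {"id": 1, "items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}]},
   {"id": 2, "items": [{"sku": "a", "qty": 5}]}
 ]}
"""

def flatten_items(record):
    """Yield one flat row per nested item, repeating the parent fields."""
    for order in record["orders"]:
        for item in order["items"]:
            yield {
                "customer": record["customer"],
                "order_id": order["id"],
                "sku": item["sku"],
                "qty": item["qty"],
            }

record = json.loads(RAW)
rows = list(flatten_items(record))

# Roughly: SELECT SUM(qty) WHERE sku = 'a', evaluated over the flattened rows.
total_qty_a = sum(row["qty"] for row in rows if row["sku"] == "a")
print(len(rows), total_qty_a)  # 3 7
```

In a normalized design, customers, orders, and items would sit in three tables joined at query time. Here one record holds all three levels, which is why nested formats suit wide, denormalized tables.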

The Drill development team is also working on providing extensive SQL support by targeting SQL:2003 compliance. Finally, note that the Drill team is providing HBase support so that users will be able to query HBase tables with SQL.