Big Data: Management

When Does HBase Make Sense for You?

So, when should you consider using HBase? Though the answer to this question isn’t necessarily straightforward for everyone, for starters you clearly must have a big data requirement and sufficient hardware [more…]

Transitioning from an RDBMS model to HBase

If you’re facing the design phase for your application and you believe that HBase would be a good fit, then designing your row keys and schema to fit the HBase data model and architecture is the right [more…]

Hardware Requirements for HBase

HBase is a powerful and flexible technology, but accompanying this flexibility is the requirement for proper configuration and tuning. It’s time for some general guidelines for configuring HBase clusters [more…]

HBase Tuning Prerequisites

Any serious HBase installation requires some standard setup on your cluster and on your individual nodes. A few examples are provided here. First take a look at monitoring and management. [more…]

Hadoop and Hive

To make a long story short, Hive provides Hadoop with a bridge to the RDBMS world and provides an SQL dialect known as Hive Query Language (HiveQL), which can be used to perform SQL-like tasks. That’s [more…]

The Architecture of Apache Hive

As you examine the elements of Apache Hive shown, you can see at the bottom that Hive sits on top of the Hadoop Distributed File System (HDFS) and MapReduce systems. [more…]

How to Get Started with Apache Hive

There’s no better way to see what’s what than to install the Hive software and give it a test run. As with other technologies in the Hadoop ecosystem, it doesn’t take long to get started. [more…]

The Hive CLI Client

The first Hive client is the Hive command-line interface (CLI). To master the finer points of the Hive CLI client, it might help to review the (somewhat busy-looking) Hive architecture. [more…]

The Web Browser as Hive Client

Using the Hive CLI requires only one command to start the Hive shell, but when you want to access Hive using a web browser, you first need to start the HWI Server and then point your browser to the port [more…]

SQuirreL as Hive Client with the JDBC Driver

SQuirreL SQL is an open source tool that acts as a Hive client. You can download this universal SQL client from the SourceForge website. It provides a user interface to Hive and simplifies the tasks of [more…]

Defining Table Record Formats in Hive

The Java technology that Hive uses to process records and map them to column data types in Hive tables is called SerDe, which is short for SerializerDe [more…]

Hive INSERT Command Examples

One Hive DML command to explore is the INSERT command. You basically have three INSERT variants; two of them are shown in the following listing. To demonstrate this new DML command, you will create a new [more…]

How to Use Hive’s Create Table As Select (CTAS)

The Hive DML example shown here illustrates a powerful Hive technique known as Create Table As Select, or CTAS. Its constructs allow you to quickly derive Hive tables from other tables as you [more…]

Joining Tables with Hive

You probably know already that experts in relational database modeling and design typically spend a lot of their time designing normalized databases, or [more…]

Improving Your Hive Queries with Indexes

Creating an index is common practice with relational databases when you want to speed access to a column or set of columns in your database. Without an index, the database system has to read all rows in [more…]

Windowing in HiveQL

The concept of windowing, introduced in the SQL:2003 standard, allows the SQL programmer to create a frame from the data against which aggregate and other window functions can operate. HiveQL now supports [more…]

Key HiveQL Features

The vibrant and active Apache Hive community continually adds to an already extensive feature set, which makes exhaustive coverage even more difficult. The following list summarizes some key HiveQL features [more…]

The Principles of Sqoop Design

When it comes to Sqoop, a picture is often worth a thousand words, so check out the figure, which gives you a bird’s-eye view of the Sqoop architecture. [more…]

Sqoop Connectors and Drivers

Sqoop connectors generally go hand in hand with a JDBC driver. Sqoop does not bundle the JDBC drivers because they are usually proprietary and licensed by the RDBMS or DW vendor. So there are three possible [more…]

Importing Data with Sqoop

Ready to dive into importing data with Sqoop? Start by taking a look at the figure, which illustrates the steps in a typical Sqoop import operation from an RDBMS or a data warehouse system. Nothing too [more…]

Importing Data into HDFS with Sqoop

Imagine a relational database used by a fictional service company that has been taking (you guessed it) Apache Hadoop service calls and now wants to move some of its data onto Hadoop to run Hive queries [more…]

Importing Data into Hive with Sqoop

Here, you import all of the Service Order Database directly from MySQL into Hive and run a HiveQL query against the newly imported database on Apache Hadoop. The following listing shows you how it’s done [more…]

Importing Data into HBase with Sqoop

Sqoop can be used to transform a relational database schema into an HBase schema. Of course, the main goal here is to demonstrate how Sqoop can import data from an RDBMS or data warehouse directly into [more…]

Sqoop Exports Using the Update and Update Insert Approach

With insert mode, records exported by Sqoop are appended to the end of the target table. Sqoop also provides an update mode that you can use by providing the [more…]

Sqoop 2.0 Preview

With all the success surrounding Sqoop 1.x upon its graduation from the Apache incubator, Sqoop has momentum! So, as you might expect, Sqoop 2.0 is in the works with exciting new features on the way. You [more…]
