Big Data: Management

How to Launch a MapReduce Application in Hadoop 1

To see how the JobTracker and TaskTracker work together to carry out a MapReduce action, take a look at the execution of a MapReduce application. The figure shows the interactions, and the following step [more…]

The YARN Architecture in Hadoop

YARN, for those just arriving at this particular party, stands for Yet Another Resource Negotiator, a tool that enables other data processing frameworks to run on Hadoop. The glory of YARN is that it presents [more…]

YARN’s Resource Manager

The core component of YARN (Yet Another Resource Negotiator) is the Resource Manager, which governs all the data processing resources in the Hadoop cluster. Simply put, the Resource Manager is a dedicated [more…]

YARN’s Node Manager in Hadoop

Each slave node in Yet Another Resource Negotiator (YARN) has a Node Manager daemon, which acts as a slave for the Resource Manager. As with the TaskTracker, each slave node has a service that ties it [more…]

YARN’s Application Master in Hadoop

Unlike other YARN (Yet Another Resource Negotiator) components, no component in Hadoop 1 maps directly to the Application Master. In essence, this is work that the JobTracker did for every application, [more…]

How to Launch a YARN-Based Application

To show how the various YARN (Yet Another Resource Negotiator) components work together, you can walk through the execution of an application. For the sake of argument, it can be a MapReduce application [more…]

Real-Time and Streaming Applications in Hadoop

The process flow of YARN looks an awful lot like a framework for batch execution. You might wonder, “What happened to this idea of flexibility for different modes of applications?” Well, the only application [more…]

The Pig Architecture in Hadoop

“Simple” often means “elegant” when it comes to those architectural drawings for that new Silicon Valley mansion you have planned for when the money starts rolling in after you implement Hadoop. The same [more…]

The Pig Latin Application Flow in Hadoop

At its core, Pig Latin is a dataflow language, where you define a data stream and a series of transformations that are applied to the data as it flows through your application. This is in contrast to a [more…]

Pig Latin in Hadoop’s Pig Programs

Pig Latin is the language for Pig programs. Pig translates the Pig Latin script into MapReduce jobs that can be executed within a Hadoop cluster. When coming up with Pig Latin, the development team followed [more…]

Hadoop’s Pig Data Types and Syntax

Pig’s data types make up the data model for how Pig thinks of the structure of the data it is processing. With Pig, the data model gets defined when the data is loaded. Any data you load into Pig from [more…]

Pig Script Interfaces in Hadoop

The Pig programming language is designed to handle any kind of data tossed its way — structured, semi-structured, unstructured data, you name it. Pig programs can be packaged in three different ways: [more…]

Scripting with Pig Latin in Hadoop

Hadoop is a rich and quickly evolving ecosystem with a growing set of new applications. Rather than try to keep up with all the requirements for new capabilities, Pig is designed to be extensible via [more…]

The Limitations of Sampling in Hadoop

Statistical analytics is far from being a new kid on the block, and it is certainly old news that it depends on processing large amounts of data to gain new insight. However, the amount of data that’s [more…]

Factors That Increase the Scale of Statistical Analysis in Hadoop

The reason people sample their data before running statistical analysis in Hadoop is that this kind of analysis often requires significant computing resources. This isn’t just about data volumes: there [more…]

Running Statistical Models in Hadoop’s MapReduce

Converting statistical models to run in parallel is a challenging task. In the traditional paradigm for parallel programming, memory access is regulated through the use of [more…]

Machine Learning with Mahout in Hadoop

Machine learning refers to a branch of artificial intelligence techniques that provides tools enabling computers to improve their analysis based on previous events. These computer systems leverage historical [more…]

R on Hadoop and the R Language

The machine learning discipline has a rich and extensive catalogue of techniques. Mahout brings a range of statistical tools and algorithms to the table, but it only captures a fraction of those techniques [more…]

Hadoop Integration with R

In the beginning, big data and R were not natural friends. R programming requires that all objects be loaded into the main memory of a single machine. The limitations of this architecture are quickly realized [more…]

How to Get Apache Oozie Set Up in Hadoop

Apache Oozie is included in every major Hadoop distribution, including Apache Bigtop. In your Hadoop cluster, install the Oozie server on an edge node, where you would also run other client applications [more…]

Configuring Oozie Workflows

As a workflow engine, Oozie enables you to run a set of Hadoop applications in a specified sequence known as a workflow. You can configure Oozie workflows in one of three ways, depending on your particular [more…]

Running Oozie Workflows in Hadoop

Before you run an Oozie workflow, all its components need to exist within a specified directory structure. Specifically, the workflow itself should have its own, dedicated directory, where workflow. [more…]

Developing Oozie Workflows in Hadoop

Oozie workflows are, at their core, directed graphs, where you can define actions (Hadoop applications) and data flow, but with no looping — meaning you can’t define a structure where you’d run a specific [more…]

The Reduce Phase of Hadoop’s MapReduce Application Flow

The Reduce phase processes the keys and their individual lists of values so that what’s normally returned to the client application is a set of key/value pairs. Here’s the blow-by-blow so far: A large [more…]

Local and Distributed Modes of Running Pig Scripts in Hadoop

Before you can run your first Pig script in Hadoop, you need to have a handle on how Pig programs can be packaged with the Pig server.

Pig has two modes for running scripts: [more…]

