Big Data: Management

Managing Big Data with Hadoop: HDFS and MapReduce

Hadoop, an open-source software framework, uses HDFS (the Hadoop Distributed File System) and MapReduce to analyze big data on clusters of commodity hardware—that is, in a distributed computing environment [more…]

Hadoop Distributed File System (HDFS) for Big Data Projects

The Hadoop Distributed File System is a versatile, resilient, clustered approach to managing files in a big data environment. HDFS is not the final destination for files. Rather, it is a data service that [more…]

Hadoop MapReduce for Big Data

To fully understand the capabilities of Hadoop MapReduce, it’s important to differentiate between MapReduce (the algorithm) and an implementation of MapReduce [more…]

Manage Big Data Resources and Applications with Hadoop YARN

Job scheduling and tracking for big data are integral parts of Hadoop MapReduce and can be used to manage resources and applications. The early versions of Hadoop supported a rudimentary job and task tracking [more…]

Store Big Data with HBase

HBase is a distributed, nonrelational (columnar) database that utilizes HDFS as its persistence store for big data projects. It is modeled after Google Bigtable and is capable of hosting very large tables [more…]

Analysis and Extraction Techniques for Big Data

In general, text analytics solutions for big data use a combination of statistical and Natural Language Processing (NLP) techniques to extract information from unstructured data. NLP is a broad and complex [more…]

Different Approaches to Big Data Analysis

In many cases, big data analysis will be represented to the end user through reports and visualizations. Because the raw data can be incomprehensibly varied, you will have to rely on analysis tools and [more…]

Build a Big Data Foundation with the Hadoop Ecosystem

As core components, Hadoop MapReduce and HDFS are constantly being improved and provide starting points for big data, but you need something more. Trying to tackle big data challenges without a toolbox [more…]

Hadoop Pig and Pig Latin for Big Data

The power and flexibility of Hadoop for big data are immediately visible to software developers primarily because the Hadoop ecosystem was built by developers, for developers. However, not everyone is [more…]

Hadoop Sqoop for Big Data

Sqoop (SQL-to-Hadoop) is a big data tool that offers the capability to extract data from non-Hadoop data stores, transform the data into a form usable by Hadoop, and then load the data into HDFS. This [more…]

Hadoop ZooKeeper for Big Data

Hadoop’s greatest technique for addressing big data challenges is its capability to divide and conquer with ZooKeeper. After the problem has been divided, the conquering relies on the capability to employ [more…]

Hadoop Administration Commands

Any Hadoop administrator worth his salt must master a comprehensive set of commands for cluster administration. The following table summarizes the most important commands. Know them, and you will advance [more…]

The Hadoop dfsadmin Command Options

The dfsadmin tools are a specific set of tools designed to help you root out information about your Hadoop Distributed File System (HDFS). As an added bonus, you can use them to perform some administration [more…]

Hadoop Distributed File System Shell Commands

The Hadoop shell is a family of commands that you can run from your operating system’s command line. The shell has two sets of commands: one for file manipulation [more…]

Hadoop For Dummies Cheat Sheet

Like many buzzwords, what people mean when they say “big data” is not always clear. At its core, big data is a way of describing data problems that are unsolvable using traditional tools, because of the [more…]

10 Emerging Hadoop Technologies to Keep Your Eye On

With Hadoop hitting mainstream IT with a vengeance, open source projects related to Hadoop are popping up everywhere. Here are the top ten most interesting emerging Hadoop projects for you to keep your [more…]

Graph Processing in Hadoop

One of the more exciting emerging NoSQL technologies involves the storage and processing of graph data. You might think that this statement is old news because computer scientists have been developing [more…]

Securing Your Data in Hadoop

As Hadoop enters the IT mainstream and starts getting used in a major way in production environments, the same security concerns that apply to IT systems such as databases will be applicable to Hadoop [more…]

The Origin and Design of Hadoop

So what exactly is this thing with the funny name — Hadoop? At its core, Hadoop is a framework for storing data on large clusters of commodity hardware — everyday computer hardware that is affordable and [more…]

Distributed Processing with Hadoop MapReduce

Hadoop MapReduce involves the processing of a sequence of operations on distributed data sets. The data consists of key-value pairs, and the computations have only two phases: a map phase and a reduce [more…]
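
As a rough illustration of the key-value model and the two phases described in this teaser, here is a minimal single-process sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) and the word-count task are assumptions chosen for illustration only, and the grouping step stands in for the work a real framework would do between the map and reduce phases.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit one (word, 1) key-value pair for every word in every line."""
    for line in records:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group values by key (hypothetical stand-in for the framework's shuffle)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Run map, then group by key, then reduce, all in one process."""
    return reduce_phase(shuffle(map_phase(records)))


counts = word_count(["big data big clusters", "data on clusters"])
print(counts)
```

In a real cluster the map and reduce functions would run in parallel on many nodes over distributed data; this sketch only shows the shape of the data flow.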

The Apache Hadoop Ecosystem

Hadoop is more than MapReduce and HDFS (Hadoop Distributed File System): It’s also a family of related projects (an ecosystem, really) for distributed computing and large-scale data processing. Most [more…]

Comparing Hadoop Distributions

You’ll find that the Hadoop ecosystem has many component parts, all of which exist as their own Apache projects. Because Hadoop has grown considerably, and faces some significant further changes, different [more…]

The Keys to Successfully Adopting Hadoop

In any serious Hadoop project, you should start by teaming IT with business leaders from VPs on down to help solve your business’s pain points — those problems [more…]

Log Data Analysis with Hadoop

Log analysis is a common use case for an inaugural Hadoop project. Indeed, the earliest uses of Hadoop were for the large-scale analysis of clickstream [more…]

Data Warehouse Modernization with Hadoop

Data warehouses are now under stress, trying to cope with increased demands on their finite resources. Hadoop can provide significant relief in this data warehouse situation. [more…]
