Big Data Management with Hadoop

Managing Big Data with Hadoop: HDFS and MapReduce

Hadoop, an open-source software framework, uses HDFS (the Hadoop Distributed File System) and MapReduce to analyze big data on clusters of commodity hardware—that is, in a distributed computing environment [more…]

Hadoop Distributed File System (HDFS) for Big Data Projects

The Hadoop Distributed File System is a versatile, resilient, clustered approach to managing files in a big data environment. HDFS is not the final destination for files. Rather, it is a data service that [more…]

Hadoop MapReduce for Big Data

To fully understand the capabilities of Hadoop MapReduce, it’s important to differentiate between MapReduce (the algorithm) and an implementation of MapReduce [more…]

Manage Big Data Resources and Applications with Hadoop YARN

Job scheduling and tracking for big data are integral parts of Hadoop MapReduce and can be used to manage resources and applications. The early versions of Hadoop supported a rudimentary job and task tracking [more…]

Build a Big Data Foundation with the Hadoop Ecosystem

As core components, Hadoop MapReduce and HDFS are constantly being improved and provide starting points for big data, but you need something more. Trying to tackle big data challenges without a toolbox [more…]

Hadoop Pig and Pig Latin for Big Data

The power and flexibility of Hadoop for big data are immediately visible to software developers primarily because the Hadoop ecosystem was built by developers, for developers. However, not everyone is [more…]

Hadoop Sqoop for Big Data

Sqoop (SQL-to-Hadoop) is a big data tool that offers the capability to extract data from non-Hadoop data stores, transform the data into a form usable by Hadoop, and then load the data into HDFS. This [more…]

Hadoop ZooKeeper for Big Data

Hadoop’s greatest technique for addressing big data challenges is its capability to divide and conquer with ZooKeeper. After the problem has been divided, the conquering relies on the capability to employ [more…]

Hadoop Administration Commands

Any Hadoop administrator worth their salt must master a comprehensive set of commands for cluster administration. The following table summarizes the most important commands. Know them, and you will advance [more…]

The Hadoop dfsadmin Command Options

The dfsadmin tools are a specific set of tools designed to help you root out information about your Hadoop Distributed File System (HDFS). As an added bonus, you can use them to perform some administration [more…]

Hadoop Distributed File System Shell Commands

The Hadoop shell is a family of commands that you can run from your operating system’s command line. The shell has two sets of commands: one for file manipulation [more…]

Hadoop For Dummies Cheat Sheet

As with many buzzwords, what people mean when they say “big data” is not always clear. At its core, big data is a way of describing data problems that are unsolvable using traditional tools because of the [more…]

10 Emerging Hadoop Technologies to Keep Your Eye On

With Hadoop hitting mainstream IT with a vengeance, open source projects related to Hadoop are popping up everywhere. Here are the top ten most interesting emerging Hadoop projects for you to keep your [more…]

Graph Processing In Hadoop

One of the more exciting emerging NoSQL technologies involves the storage and processing of graph data. You might think that this statement is old news because computer scientists have been developing [more…]

Securing Your Data in Hadoop

As Hadoop enters the IT mainstream and starts getting used in a major way in production environments, the same security concerns that apply to IT systems such as databases will be applicable to Hadoop [more…]

The Origin and Design of Hadoop

So what exactly is this thing with the funny name — Hadoop? At its core, Hadoop is a framework for storing data on large clusters of commodity hardware — everyday computer hardware that is affordable and [more…]

Distributed Processing with Hadoop MapReduce

Hadoop MapReduce involves the processing of a sequence of operations on distributed data sets. The data consists of key-value pairs, and the computations have only two phases: a map phase and a reduce [more…]
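The two-phase, key-value model described in this teaser can be illustrated with a minimal word-count sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (`map_phase`, `group_by_key`, `reduce_phase`, `word_count`) are hypothetical, and the in-memory grouping step is an assumption standing in for the work a framework would do between the map and reduce phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit one (word, 1) key-value pair per word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def group_by_key(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Collect all values that share a key (assumed stand-in for the
    grouping a framework performs between map and reduce)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the map phase, group by key, then run the reduce phase."""
    return reduce_phase(group_by_key(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data on Hadoop", "Hadoop stores big files"]
    print(word_count(sample))
```

On a real cluster the map and reduce functions would run in parallel on different nodes over partitions of the data; this sketch runs everything in one process only to show the shape of the computation.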

The Apache Hadoop Ecosystem

Hadoop is more than MapReduce and HDFS (Hadoop Distributed File System): It’s also a family of related projects (an ecosystem, really) for distributed computing and large-scale data processing. Most [more…]

Comparing Hadoop Distributions

You’ll find that the Hadoop ecosystem has many component parts, all of which exist as their own Apache projects. Because Hadoop has grown considerably, and faces some significant further changes, different [more…]

The Keys to Successfully Adopting Hadoop

In any serious Hadoop project, you should start by teaming IT with business leaders from VPs on down to help solve your business’s pain points — those problems [more…]

Log Data Analysis with Hadoop

Log analysis is a common use case for an inaugural Hadoop project. Indeed, the earliest uses of Hadoop were for the large-scale analysis of clickstream [more…]

Data Warehouse Modernization with Hadoop

Data warehouses are now under stress, trying to cope with increased demands on their finite resources. Hadoop can provide significant relief in this data warehouse situation. [more…]

Fraud Detection with Hadoop

The sheer volume of transactions makes fraud harder to spot. Ironically, this same challenge can help create better predictive fraud models, an area where Hadoop shines [more…]

Risk Modeling with Hadoop

Risk modeling is another major use case that’s energized by Hadoop. You’ll find that it closely matches the use case of fraud detection in that it’s a model-based discipline. The more data you have and [more…]

Social Sentiment Analysis with Hadoop

Social sentiment analysis is easily the most overhyped of the Hadoop uses, which should be no surprise, given that the world is constantly connected and its population increasingly expressive. This use case [more…]
