Managing Big Data with Hadoop: HDFS and MapReduce

Part of the Big Data For Dummies Cheat Sheet

Hadoop, an open-source software framework, uses HDFS (the Hadoop Distributed File System) and MapReduce to analyze big data on clusters of commodity hardware—that is, in a distributed computing environment.

The Hadoop Distributed File System (HDFS) was developed to let companies manage huge volumes of data in a simple, pragmatic way. Hadoop decomposes big problems into smaller elements so that analysis can be done quickly and cost-effectively. HDFS is a versatile, resilient, clustered approach to managing files in a big data environment.

HDFS is not the final destination for files. Rather, it is a data "service" that offers a unique set of capabilities needed when data volumes and velocity are high.
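To make the file-service idea concrete, here is a minimal sketch of storing and listing files through Hadoop's Java FileSystem API. The cluster address (hdfs://namenode:8020) and the file paths are placeholders invented for illustration; in a real deployment the fs.defaultFS setting would come from the cluster's core-site.xml rather than being set in code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder cluster address; normally read from core-site.xml.
    conf.set("fs.defaultFS", "hdfs://namenode:8020");
    FileSystem fs = FileSystem.get(conf);

    // Copy a local file into the distributed file system, where it is
    // split into blocks and replicated across the cluster's nodes.
    fs.copyFromLocalFile(new Path("/tmp/sales.csv"), new Path("/data/sales.csv"));

    // List what is stored under /data.
    for (FileStatus status : fs.listStatus(new Path("/data"))) {
      System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
    }
    fs.close();
  }
}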

MapReduce is a software framework that enables developers to write programs that can process massive amounts of unstructured data in parallel across a distributed group of processors. MapReduce was designed by Google as a way of efficiently executing a set of functions against a large amount of data in batch mode.

The "map" component distributes the programming problem or tasks across a large number of systems and handles the placement of the tasks in a way that balances the load and manages recovery from failures. After the distributed computation is completed, another function called "reduce" aggregates all the elements back together to provide a result. An example of MapReduce usage would be to determine how many pages of a book are written in each of 50 different languages.
