Big Data For Dummies

Organizing data services and tools, which form layer 3 of the big data stack, capture, validate, and assemble various big data elements into contextually relevant collections. Because big data is massive, techniques have evolved to process the data efficiently and seamlessly. MapReduce is one heavily used technique. Many of these organizing data services are MapReduce engines, designed specifically to optimize the organization of big data streams.
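To make the MapReduce idea a little more concrete, here is a minimal single-machine sketch in Python. It is not how a real MapReduce engine is implemented; it only illustrates the map, shuffle, and reduce phases with a word count. The function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) are assumptions chosen for this illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def map_reduce(records):
    """Run the three phases in sequence on an in-memory list of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["big data is big", "data streams"]))
```

In a real engine, the map and reduce calls would run in parallel across many machines, and the shuffle would move data over the network. Here everything runs in one process so the flow of data is easy to follow.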

Organizing data services are, in reality, an ecosystem of tools and technologies that can be used to gather and assemble data in preparation for further processing. As such, the tools need to provide integration, translation, normalization, and scale. Technologies in this layer include the following:

  • A distributed file system: Necessary to accommodate the decomposition of data streams and to provide scale and storage capacity

  • Serialization services: Necessary for persistent data storage and multilanguage remote procedure calls (RPCs)

  • Coordination services: Necessary for building distributed applications (locking and so on)

  • Extract, transform, and load (ETL) tools: Necessary for the loading and conversion of structured and unstructured data into Hadoop

  • Workflow services: Necessary for scheduling jobs and providing a structure for synchronizing process elements across layers
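As a rough illustration of how some of these pieces fit together, the following Python sketch chains a tiny extract, transform, and load sequence, serializes the result as JSON lines, and uses a dependency graph to decide the order of the steps. It is a toy under stated assumptions, not a model of any particular product: the step names, the `name,value` input format, and the `run_workflow` helper are all hypothetical, and the standard-library `graphlib` module (Python 3.9 or later) stands in for a real workflow service.

```python
import json
from graphlib import TopologicalSorter


def extract(raw_lines):
    """Extract: split non-blank "name,value" lines into two fields."""
    return [line.split(",", 1) for line in raw_lines if line.strip()]


def transform(rows):
    """Transform: normalize names and convert values to numbers."""
    return [
        {"name": name.strip().lower(), "value": float(value)}
        for name, value in rows
    ]


def load(records):
    """Load: serialize each record as one JSON line (a stand-in for storage)."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


# Hypothetical job definitions: each step and the steps it depends on.
STEPS = {"extract": extract, "transform": transform, "load": load}
DEPENDS_ON = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}


def run_workflow(raw_lines):
    """Run the steps in dependency order, passing each output to the next.

    Passing one value down the chain only works because this example
    graph is linear; a real workflow service handles branching graphs.
    """
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    data = raw_lines
    for step in order:
        data = STEPS[step](data)
    return order, data


if __name__ == "__main__":
    order, output = run_workflow([" Temp ,21.5", "", "HUMIDITY, 40"])
    print(order)
    print(output)
```

The point of the sketch is the division of labor: the transform step does the normalization, the load step does the serialization, and the dependency graph plays the role of the workflow service by fixing the order in which jobs run.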

About the book authors:

Judith Hurwitz is an expert in cloud computing, information management, and business strategy. Alan Nugent has extensive experience in cloud-based big data solutions. Dr. Fern Halper specializes in big data and analytics. Marcia Kaufman specializes in cloud infrastructure, information management, and analytics.
