
Layer 3 of the Big Data Stack: Organizing Data Services and Tools

Organizing data services and tools, layer 3 of the big data stack, capture, validate, and assemble various big data elements into contextually relevant collections. Because big data is massive, techniques have evolved to process it efficiently and seamlessly. MapReduce is one heavily used technique, and many organizing data services are MapReduce engines specifically designed to optimize the organization of big data streams.
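To make the MapReduce idea concrete, here is a minimal single-machine sketch in Python that counts words. It only illustrates the map, shuffle, and reduce steps; it is not how a distributed MapReduce engine is implemented, and the function names (map_phase, shuffle, reduce_phase, map_reduce) are hypothetical names chosen for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Two small "records" stand in for chunks of a much larger data stream.
counts = map_reduce(["big data is big", "data streams"])
print(counts)
```

In a real engine the map and reduce steps run in parallel on many machines, and the shuffle moves data between them; the sequence of steps is the same.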

Organizing data services are, in reality, an ecosystem of tools and technologies that can be used to gather and assemble data in preparation for further processing. As such, the tools need to provide integration, translation, normalization, and scale. Technologies in this layer include the following:

  • A distributed file system: Necessary to accommodate the decomposition of data streams and to provide scale and storage capacity

  • Serialization services: Necessary for persistent data storage and multilanguage remote procedure calls (RPCs)

  • Coordination services: Necessary for building distributed applications (locking and so on)

  • Extract, transform, and load (ETL) tools: Necessary for the loading and conversion of structured and unstructured data into Hadoop

  • Workflow services: Necessary for scheduling jobs and providing a structure for synchronizing process elements across layers
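As a rough illustration of how translation, normalization, and serialization fit together, the following Python sketch chains three small steps in the style of an ETL job. The record layout ("name,value" text lines), the sample data, and the function names are assumptions made for this example; a real deployment would rely on dedicated ETL, serialization, and workflow tools rather than hand-written functions.

```python
import json


def extract(raw_lines):
    """Extract: parse 'name,value' text lines, skipping blank ones."""
    for line in raw_lines:
        line = line.strip()
        if line:
            name, value = line.split(",", 1)
            yield {"name": name, "value": value}


def transform(records):
    """Transform: normalize names to lowercase and values to floats."""
    for record in records:
        yield {
            "name": record["name"].strip().lower(),
            "value": float(record["value"]),
        }


def load(records):
    """Load: serialize each record to a JSON string ready for storage."""
    return [json.dumps(record, sort_keys=True) for record in records]


# Hypothetical raw input with inconsistent casing, spacing, and a blank line.
raw = ["Temp, 21.5", "", "HUMIDITY,40"]
serialized = load(transform(extract(raw)))
for item in serialized:
    print(item)
```

Each step consumes the output of the one before it, which is the kind of ordering a workflow service would schedule and monitor across many jobs.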
