Performance and Big Data

Judith S. Hurwitz

Alan Nugent

Fern Halper

Marcia Kaufman

Updated

2016-03-26 15:03:45

From the book

Big Data For Dummies

Just having a faster computer isn’t enough to ensure the right level of performance to handle big data. You need to be able to distribute components of your big data service across a series of nodes. In distributed computing, a node is an element contained within a cluster of systems or within a rack.

A node typically includes CPU, memory, and some kind of disk. However, a node can also be a blade that has only CPU and memory and relies on nearby storage within a rack.

Within a big data environment, these nodes are typically clustered together to provide scale. For example, you might start out with a single big data analysis and then continue to add more data sources. To handle the growth, an organization adds more nodes to the cluster so that it can scale out as requirements grow.
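To make the idea of scaling out concrete, here is a minimal sketch in Python of how records might be spread across the nodes of a cluster. Everything in it is an assumption made for illustration: the function names `assign_node` and `load_per_node`, the record keys, and the use of a simple hash-and-modulo rule are hypothetical and do not describe any particular big data product.

```python
import zlib
from collections import Counter


def assign_node(key: str, node_count: int) -> int:
    """Pick a node index for a record key using a stable hash (illustrative rule)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def load_per_node(keys, node_count):
    """Count how many records each node would hold under assign_node."""
    counts = Counter(assign_node(key, node_count) for key in keys)
    return [counts.get(index, 0) for index in range(node_count)]


# Hypothetical workload: 10,000 record keys.
keys = [f"record-{i}" for i in range(10_000)]

# The same workload on a 4-node cluster and on an 8-node cluster.
small_cluster = load_per_node(keys, 4)
large_cluster = load_per_node(keys, 8)

print("busiest node with 4 nodes:", max(small_cluster))
print("busiest node with 8 nodes:", max(large_cluster))
```

The total number of records stays the same, but the busiest node in the larger cluster holds no more records than the busiest node in the smaller one. That is the basic effect an organization is after when it adds nodes.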

However, it isn’t enough to simply expand the number of nodes in the cluster. Rather, it is important to be able to send part of the big data analysis to different physical environments. Where you send these tasks and how you manage them makes the difference between success and failure.

In some complex situations, you may want to execute many different algorithms in parallel, even within the same cluster, to achieve the speed of analysis required. Why would you execute different big data algorithms in parallel within the same rack? The closer together the distributed functions are, the faster they can execute.
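As a rough illustration of running different algorithms in parallel over the same data, the following Python sketch uses the standard library's `concurrent.futures.ThreadPoolExecutor`. Threads in a single process stand in for the nodes of a rack here, which is a simplifying assumption. The three analysis functions and the sample readings are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def average(values):
    """First algorithm: arithmetic mean of the readings."""
    return mean(values)


def value_range(values):
    """Second algorithm: spread between the largest and smallest reading."""
    return max(values) - min(values)


def count_above(values, threshold=50):
    """Third algorithm: how many readings exceed a threshold."""
    return sum(1 for value in values if value > threshold)


# Hypothetical data set shared by all three algorithms.
readings = list(range(1, 101))

algorithms = {
    "average": average,
    "range": value_range,
    "count_above_50": count_above,
}

# Submit every algorithm at once, then collect each result by name.
with ThreadPoolExecutor(max_workers=len(algorithms)) as pool:
    futures = {name: pool.submit(fn, readings) for name, fn in algorithms.items()}
    results = {name: future.result() for name, future in futures.items()}

print(results)
```

In a real cluster the workers would be separate machines, and the time spent moving data between them is the reason proximity matters.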

Although it is possible to distribute big data analysis across networks to take advantage of available capacity, you must base this type of distribution on your performance requirements. In some situations, the speed of processing takes a back seat. In others, getting results fast is the requirement, and you want to make sure that the networking functions are in close proximity to each other.

In general, the big data environment has to be optimized for the type of analytics task. Therefore, scalability is the linchpin of making big data operate successfully. Although it is theoretically possible to run a big data environment on a single large system, it is not practical.

To understand the needs for scalability in big data, one only has to look at cloud scalability and understand both the requirements and the approach. Like cloud computing, big data requires the inclusion of fast networks and inexpensive clusters of hardware that can be combined in racks to increase performance. These clusters are supported by software automation that enables dynamic scaling and load balancing.

The design and implementation of MapReduce are excellent examples of how distributed computing can make big data operationally viable and affordable. In essence, companies are at one of the unique turning points in computing where technology concepts come together at the right time to solve the right problems. Combining distributed computing, improved hardware systems, and practical solutions such as MapReduce and Hadoop is changing data management in profound ways.
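The MapReduce pattern mentioned above can be sketched in a few lines of Python. This is a single-process illustration of the map, shuffle, and reduce steps applied to a word count. It is not the Hadoop API, and the function names and sample documents are assumptions made for the example. In a real deployment, the map and reduce steps would run on many nodes at once.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = (pair for document in documents for pair in map_phase(document))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: each string stands in for a chunk of data on one node.
documents = [
    "big data needs scale",
    "scale out the cluster",
    "big cluster",
]

counts = word_count(documents)
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread across as many nodes as are available, which is what makes the approach a good fit for inexpensive clusters.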

About This Article

About the book author:

Judith Hurwitz is an expert in cloud computing, information management, and business strategy.

Alan Nugent has extensive experience in cloud-based big data solutions.

Dr. Fern Halper specializes in big data and analytics.

Marcia Kaufman specializes in cloud infrastructure, information management, and analytics.

This article can be found in the category:

Big Data
