Basics of Big Data Infrastructure
Part of the Big Data For Dummies Cheat Sheet
Big data is all about high velocity, large volumes, and wide data variety, so the physical infrastructure can make or break the implementation. Most big data implementations need to be highly available, so the networks, servers, and physical storage must be both resilient and redundant.
Resiliency and redundancy are interrelated. An infrastructure, or a system, is resilient to failures or changes when sufficient redundant resources are in place, ready to take over. Redundancy helps to eliminate single points of failure in your infrastructure. For example, if only one network connection exists between your business and the Internet, you have no network redundancy, and the infrastructure is not resilient with respect to a network outage.
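To make the idea of a single point of failure concrete, here is a minimal Python sketch. It is an illustration only: the inventory, the role names, and both helper functions are hypothetical, and the availability calculation assumes that redundant components fail independently, which real deployments only approximate.

```python
from math import prod


def single_points_of_failure(inventory):
    """Return the roles backed by fewer than two instances.

    `inventory` maps a role name to the number of independent
    instances deployed for that role (a hypothetical data model).
    """
    return sorted(role for role, count in inventory.items() if count < 2)


def combined_availability(availabilities):
    """Availability of a redundant group in which any one member suffices.

    Assumes failures are independent: the group is down only when
    every member is down at the same time.
    """
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical inventory: one Internet uplink, everything else doubled.
inventory = {"internet_uplink": 1, "core_switch": 2, "storage_array": 2}

# The lone uplink is reported as the single point of failure.
print(single_points_of_failure(inventory))

# Two links that are each up 99% of the time are, under the
# independence assumption, jointly up about 99.99% of the time.
print(combined_availability([0.99, 0.99]))
```

In this sketch the single Internet uplink is the only role flagged, which mirrors the network example above: adding a second, independent connection removes it from the list and raises the combined availability.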
In large data centers with business continuity requirements, most of the redundancy is already in place and can be leveraged to create a big data environment. In new implementations, the designers are responsible for mapping the deployment to the needs of the business, based on cost and performance.