Big Data Engineering

Integrate Big Data with the Traditional Data Warehouse

While the worlds of big data and the traditional data warehouse will intersect, they are unlikely to merge anytime soon. Think of a data warehouse as a system of record for business intelligence, much [more…]

Basics of Big Data Infrastructure

Big data is all about high velocity, large volumes, and wide data variety, so the physical infrastructure will literally "make or break" the implementation. Most big data implementations need to be highly [more…]

Structured Data in a Big Data Environment

The term structured data generally refers to data that has a defined length and format. Examples of structured data include numbers, dates, and groups of words and numbers called [more…]

Unstructured Data in a Big Data Environment

Unstructured data is data that does not follow a specified format. If 20 percent of the data available to enterprises is structured data, the other 80 percent is unstructured. Unstructured [more…]

Layer 0 of the Big Data Stack: Redundant Physical Infrastructure

At the lowest level of the big data stack is the physical infrastructure. Your company might already have a data center or have made investments in physical infrastructure, so you’re going to want to find [more…]

Layer 1 of the Big Data Stack: Security Infrastructure

Security and privacy requirements, layer 1 of the big data stack, are similar to the requirements for conventional data environments. The security requirements have to be closely aligned to specific business [more…]

Layer 2 of the Big Data Stack: Operational Databases

At the core of any big data environment, and layer 2 of the big data stack, are the database engines containing the collections of data elements relevant to your business. These engines need to be fast [more…]

Layer 3 of the Big Data Stack: Organizing Data Services and Tools

Organizing data services and tools, layer 3 of the big data stack, capture, validate, and assemble various big data elements into contextually relevant collections. Because big data is massive, techniques [more…]

Layer 4 of the Big Data Stack: Analytical Data Warehouses

The data warehouse, layer 4 of the big data stack, and its companion the data mart, have long been the primary techniques that organizations use to optimize data to help decision makers. Typically, data [more…]

Manage Virtualization for Big Data

Virtualization separates resources and services from the underlying physical delivery environment, enabling you to create many virtual systems within a single physical system. One of the primary reasons [more…]

Big Data Cloud Deployment Models

Two key cloud models are important in the discussion of big data — public clouds and private clouds. Cloud computing is a method of providing a set of shared computing resources that include applications [more…]

Why the Cloud Is Imperative for Big Data

Numerous combinations of deployment and delivery models exist for big data in the cloud. For example, you can utilize a public cloud IaaS or a private cloud IaaS. So, what does this mean for big data and [more…]

Big Data Cloud Providers

Cloud providers come in all shapes and sizes and offer many different products for big data. Some are household names while others are recently emerging. Some of the cloud providers that offer IaaS services [more…]

Warnings for Big Data Cloud Users

Warning! Cloud-based services can provide an economical solution to your big data needs, but the cloud has its issues. It’s important to do your homework before moving your big data there. Here are some [more…]

Nonrelational Databases in a Big Data Environment

Nonrelational databases do not rely on the table/key model endemic to RDBMSs (relational database management systems). In short, specialty data in the big data world requires specialty persistence and [more…]

Key-Value Pair Databases in a Big Data Environment

By far, the simplest of the NoSQL (not-only-SQL) databases in a big data environment are those employing the key-value pair (KVP) model. KVP databases do not require a schema [more…]
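
To make the schema-free idea concrete, here is a minimal sketch of a key-value store in Python. It is an illustration only: the `KeyValueStore` class, its `put` and `get` methods, and the sample keys are hypothetical and do not correspond to any particular KVP product.

```python
# Hypothetical in-memory key-value store; not the API of any real KVP database.
class KeyValueStore:
    def __init__(self):
        # A plain dict is enough: keys map to arbitrary values, no schema needed.
        self._data = {}

    def put(self, key, value):
        """Store a value of any shape under the given key."""
        self._data[key] = value

    def get(self, key, default=None):
        """Return the stored value, or the default if the key is absent."""
        return self._data.get(key, default)


store = KeyValueStore()
# Values do not have to share a structure: one is a string, one a nested record.
store.put("user:1:color", "blue")
store.put("user:2:profile", {"name": "Ana", "langs": ["en", "pt"]})

print(store.get("user:1:color"))
print(store.get("user:2:profile")["langs"])
```

Because the store never inspects the values, any structure has to be agreed on by the applications that read and write the keys.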

Document Databases in a Big Data Environment

You find two kinds of document databases for big data projects. One is often described as a repository for full document-style content. The other is a database for storing document components for permanent [more…]

Columnar Databases in a Big Data Environment

Columnar databases can be very helpful in your big data project. Relational databases are row oriented, as the data in each row of a table is stored together. In a columnar, or column-oriented database [more…]
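
The difference between row-oriented and column-oriented layouts can be sketched in a few lines of Python. The sample records and the `to_columns` helper below are assumptions made for illustration; a real columnar engine adds compression, indexing, and on-disk formats that this sketch ignores.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "east", "sales": 120},
    {"id": 2, "region": "west", "sales": 90},
    {"id": 3, "region": "east", "sales": 75},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists (hypothetical helper)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of one field are stored together.
columns = to_columns(rows)

# An aggregate over one field touches only that column's values.
total_sales = sum(columns["sales"])
print(columns["sales"], total_sales)
```

In the row layout, the same aggregate would have to walk every record and skip over the fields it does not need.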

Graph Databases in a Big Data Environment

The fundamental structure for graph databases in big data is called “node-relationship.” This structure is most useful when you must deal with highly interconnected data. Nodes and relationships support [more…]

Spatial Databases in a Big Data Environment

Spatial databases can be an important tool in your big data project. Spatial data itself is standardized through the efforts of the Open Geospatial Consortium [more…]

How to Use MapReduce for Big Data

MapReduce is a software framework that is ideal for big data because it enables developers to write programs that can process massive amounts of unstructured data in parallel across a distributed group [more…]
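
The map, shuffle, and reduce steps behind the model can be sketched in a single Python process. This is a toy word count, not Hadoop's API: the function names and sample documents are hypothetical, and a real framework would run the map and reduce calls in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


documents = ["big data is big", "data moves fast"]

# Each document could be mapped on a different node; here they run sequentially.
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

Each map call depends only on its own document and each reduce call only on its own key, which is what allows the work to be spread across a distributed group of machines.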

How to Optimize MapReduce Tasks

Aside from optimizing the actual application code with MapReduce for big data projects, you can use some optimization techniques to improve the reliability and performance. They fall into three categories [more…]

Mine Big Data with Hive

Hive is a batch-oriented, data-warehousing layer built on the core elements of Hadoop (HDFS and MapReduce) and is very useful in big data. It provides users who know SQL with a simple SQL-lite implementation [more…]

Big Data Analysis and the Data Warehouse

You will find value in bringing the capabilities of the data warehouse and the big data environment together. You need to create a hybrid environment where big data can work hand in hand with the data [more…]

Modify Business Intelligence Products to Handle Big Data

Traditional business intelligence products weren’t really designed to handle big data, so they may require some modification. They were designed to work with highly structured, well-understood data, often [more…]
