Put Your Big Data Together

By Judith Hurwitz, Alan Nugent, Fern Halper, Marcia Kaufman

How do you know how to put all of your data together? In a big data project, what you want to do with your structured and unstructured data determines why you might choose one piece of technology over another. It also determines how well you need to understand inbound data structures so that you can put this data in the right place.

Manage different data types for big data

You will need to consider some of the characteristics of big data and the types of data management systems you might want to use to address each one.


Integrate data types into a big data environment

Another important aspect of big data is that you often don’t need to own all the data that you will use. You may be leveraging social media data, third-party industry statistics, or even data coming from satellites. Social media is the obvious example: the data is created and hosted outside your organization, yet you can still analyze it.

Often, it becomes necessary to integrate different sources. This data may come entirely from internal systems, from a mix of internal and external sources, or entirely from external sources. Much of this data may have been siloed before.

Data need not be coming to you in real time to qualify as a big data problem. You may simply have a lot of it, and it may be disparate in nature. Of course, you could also face a scenario in which huge volumes of disparate data arrive at high velocity.

The point is that you won’t get the business value if you deal with a variety of data sources as a set of disconnected silos of information.

Components you need include connectors and metadata.


You want to have some connectors that enable you to pull data in from various big data sources. Maybe you want a Twitter connector or a Facebook one. Maybe you need to integrate your data warehouse with a big data source that’s off your premises so that you can analyze both of these sources of data together.
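The connector idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Connector interface, the WarehouseConnector and SocialFeedConnector classes, and the field names are all hypothetical stand-ins, and no real Twitter, Facebook, or warehouse API is called. The point is that each source is wrapped behind the same small interface so that records from inside and outside your premises can be analyzed together.

```python
from typing import Dict, Iterable, Iterator, List, Protocol


class Connector(Protocol):
    """Hypothetical interface: anything that can yield records as dictionaries."""

    def fetch(self) -> Iterator[Dict[str, object]]: ...


class WarehouseConnector:
    """Stand-in for a query against an on-premises data warehouse."""

    def __init__(self, rows: List[Dict[str, object]]) -> None:
        self._rows = rows

    def fetch(self) -> Iterator[Dict[str, object]]:
        for row in self._rows:
            # Tag each record with where it came from.
            yield {"source": "warehouse", **row}


class SocialFeedConnector:
    """Stand-in for an external social media feed (no real API is used)."""

    def __init__(self, posts: List[Dict[str, object]]) -> None:
        self._posts = posts

    def fetch(self) -> Iterator[Dict[str, object]]:
        for post in self._posts:
            # Map the feed's own field names onto the shared record layout.
            yield {
                "source": "social",
                "customer_id": post["user"],
                "text": post["body"],
            }


def pull_all(connectors: Iterable[Connector]) -> List[Dict[str, object]]:
    """Collect records from every connector into one list for joint analysis."""
    combined: List[Dict[str, object]] = []
    for connector in connectors:
        combined.extend(connector.fetch())
    return combined


warehouse = WarehouseConnector([{"customer_id": "c1", "lifetime_value": 1200}])
social = SocialFeedConnector([{"user": "c1", "body": "Great service!"}])
records = pull_all([warehouse, social])
```

Because both connectors emit a customer_id field, the combined records can be grouped by customer regardless of which system produced them. A production connector would also need authentication, paging, and error handling, none of which is shown here.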


A critical component of integrating all this data is the metadata. Metadata consists of the definitions, mappings, and other characteristics used to describe how to find, access, and use a company’s data (and software) components. One example of metadata is data about an account number. This might include the number, description, data type, name, address, phone number, and privacy level.
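As a rough illustration, the account number example might be captured as a small metadata record. The FieldMetadata class, its field names, and the sample values below are assumptions made for this sketch, not a standard schema; the location attribute in particular is a hypothetical way of recording where the data can be found.

```python
from dataclasses import dataclass
from typing import Collection


@dataclass(frozen=True)
class FieldMetadata:
    """Hypothetical metadata record describing one data element."""

    name: str
    description: str
    data_type: str
    location: str       # assumed: where the element can be found
    privacy_level: str  # assumed labels such as "public" or "confidential"


account_number_meta = FieldMetadata(
    name="account_number",
    description="Unique identifier for a customer account",
    data_type="string",
    location="billing_warehouse.accounts",
    privacy_level="confidential",
)


def can_share(meta: FieldMetadata, allowed_levels: Collection[str]) -> bool:
    """Use the metadata, not the data itself, to decide whether a field may be shared."""
    return meta.privacy_level in allowed_levels


shareable = can_share(account_number_meta, {"public", "internal"})
```

Here the decision about sharing the account number is made entirely from its description, which is the kind of job metadata is meant to do.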

Metadata can be used to help you organize your data stores and deal with new and changing sources of data. Although the idea of metadata is not new, it is changing and evolving in the context of big data. In the traditional metadata world, it is important to have a catalog that provides a single view of all data sources.

But this catalog will have to be different when you don’t control all these data sources. You may need an analytic tool that will help you understand the underlying metadata.
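One way to picture the difference is a catalog that holds declared metadata for sources you own and inferred metadata for sources you don’t. The sketch below is a minimal illustration under assumptions: the catalog layout, the source names, and the infer_field_types helper are all hypothetical, and real analytic tools do far more than record the Python type of each value.

```python
from typing import Dict, Iterable, List, Mapping, Set


def infer_field_types(records: Iterable[Mapping[str, object]]) -> Dict[str, List[str]]:
    """Profile sample records and list the type names observed for each field."""
    observed: Dict[str, Set[str]] = {}
    for record in records:
        for field, value in record.items():
            observed.setdefault(field, set()).add(type(value).__name__)
    return {field: sorted(types) for field, types in observed.items()}


catalog: Dict[str, Dict[str, object]] = {}

# A source you control: its metadata is declared up front.
catalog["crm.accounts"] = {
    "owned": True,
    "fields": {"account_number": ["str"], "balance": ["float"]},
}

# A source you do not control: its metadata is inferred from a sample.
sample = [
    {"user": "c1", "followers": 10},
    {"user": "c2", "followers": None, "lang": "en"},
]
catalog["social.feed"] = {"owned": False, "fields": infer_field_types(sample)}
```

Rerunning the profiling step on fresh samples is one simple way to notice when an external source adds a field or changes a type.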