Big Data and Polyglot Persistence

By Judith Hurwitz, Alan Nugent, Fern Halper, Marcia Kaufman

A polyglot is, by definition, “someone who speaks or writes several languages.” For big data, the term is borrowed and redefined to describe a set of applications that use several core database technologies, and this is the most likely outcome of your implementation planning. Choosing a single persistence style is going to be difficult, no matter how narrow your approach to big data might be.

Polyglot persistence is used when you need to solve a complex problem by breaking it into segments and applying a different database model to each segment. You then aggregate the results into a hybrid data storage and analysis solution. A number of factors affect this decision:

  • You are already using polyglot persistence in your existing workplace. If your enterprise or organization is large, you are probably using multiple RDBMSs, data warehouses, data marts, flat files, content management servers, and so on.

    This hybrid environment is common. You need to understand it so that you can make the right decisions about integration, analytics, timeliness of data, data visibility, and so on, and so that you can figure out how it is going to fit into your big data implementation.

  • Even the ideal environment, where you have only one persistence technology, is probably not suited to big data problem solving. At the very least, you will need to introduce another style of database and other supporting technologies for your new implementation.

  • Depending on the variety and velocity of the big data you gather, you may need to consider different databases to support one implementation. You should also consider your requirements for transactional integrity. Do you need ACID (atomicity, consistency, isolation, durability) compliance, or will BASE (basically available, soft state, eventual consistency) be sufficient?
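The ACID side of that question can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for an RDBMS; the table names, the `record_purchase` helper, and the simulated failure are hypothetical and exist only for illustration. It shows atomicity: two related writes either both happen or neither does.

```python
import sqlite3

# In-memory SQLite database standing in for a transactional RDBMS (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, total_spent REAL)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.execute("INSERT INTO customers VALUES (1, 0.0)")
conn.commit()


def record_purchase(conn, customer_id, amount, fail=False):
    """Insert an order and update the customer's running total as one atomic unit."""
    try:
        # The connection context manager commits on success and rolls back
        # if an exception escapes the block.
        with conn:
            conn.execute(
                "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                (customer_id, amount),
            )
            if fail:
                # Hypothetical failure between the two writes.
                raise RuntimeError("simulated failure")
            conn.execute(
                "UPDATE customers SET total_spent = total_spent + ? WHERE id = ?",
                (amount, customer_id),
            )
        return True
    except RuntimeError:
        return False
```

If the call fails partway through, the order row is rolled back along with everything else in the transaction. A BASE-style store would typically relax this guarantee in exchange for availability and scale, so the two writes might become visible at different times.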

Suppose that you need to identify all the customers for your product who have purchased in the last 12 months and have commented on social websites about their experience. You also need to know whether they have had any support cases, where they acquired the product, how it was delivered, what they paid, how they paid, whether they have been to the company website, how many times, what they did there, and so on.

Then suppose that you want to send a promotional discount to their smartphones when they enter one of your (or one of your partners’) retail stores.

This is a big data challenge at its best. Multiple sources of data with very different structures need to be collected and analyzed so that you can get the answers to these questions. Then you need to determine whether the customers qualify for the promotion and, in real time, push them a coupon offering them something new and interesting.

This type of problem cannot be solved easily or cost-effectively with one type of database technology. Even though some of the basic information is transactional and probably in an RDBMS, the other information is nonrelational and will require at least two types of persistence engines (spatial and graph). You now have polyglot persistence.
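The scenario above can be sketched roughly as follows. This is an illustrative toy, not a production design: in-memory SQLite stands in for the RDBMS, a plain dictionary stands in for a graph database, and a haversine distance check stands in for a spatial engine. All names, sample data, the fixed date, and the 0.2 km radius are assumptions made for the example.

```python
import math
import sqlite3
from datetime import date, timedelta

# Relational stand-in: transactional purchase records.
rdbms = sqlite3.connect(":memory:")
rdbms.execute("CREATE TABLE purchases (customer_id TEXT, purchased_on TEXT)")
rdbms.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("alice", "2024-03-15"), ("bob", "2021-01-10")],
)
TODAY = date(2024, 6, 1)  # fixed so the example is reproducible

# Graph stand-in: customer -> post -> product edges.
social_graph = {
    "alice": {"post:101"},
    "bob": {"post:202"},
    "post:101": {"product:widget"},
    "post:202": {"product:other"},
}

# Spatial stand-in: store name -> (latitude, longitude).
stores = {"downtown": (40.7128, -74.0060)}


def purchased_recently(customer_id, as_of, days=365):
    """Relational query: any purchase within the last `days` days?"""
    cutoff = (as_of - timedelta(days=days)).isoformat()
    row = rdbms.execute(
        "SELECT COUNT(*) FROM purchases WHERE customer_id = ? AND purchased_on >= ?",
        (customer_id, cutoff),
    ).fetchone()
    return row[0] > 0


def commented_on(customer_id, product):
    """Graph traversal: follow customer -> post -> product (two hops)."""
    posts = social_graph.get(customer_id, set())
    return any(product in social_graph.get(post, set()) for post in posts)


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def near_store(position, radius_km=0.2):
    """Spatial query: is the position within radius_km of any store?"""
    return any(haversine_km(position, loc) <= radius_km for loc in stores.values())


def qualifies_for_coupon(customer_id, position, as_of=TODAY):
    """Aggregate the answers from all three stores into one decision."""
    return (purchased_recently(customer_id, as_of)
            and commented_on(customer_id, "product:widget")
            and near_store(position))
```

Each helper answers one question with the model best suited to it, and `qualifies_for_coupon` aggregates the results, which is the essence of the hybrid solution described earlier. In a real deployment each stand-in would be replaced by a dedicated engine with its own client library.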