Clearly, the very nature of the cloud makes it an ideal computing environment for big data. So how might you use big data together with the cloud? Here are some examples:
IaaS in a public cloud: In this scenario, you use a public cloud provider’s infrastructure for your big data services because you don’t want to use your own physical infrastructure. IaaS lets you create virtual machines with almost limitless storage and compute power. You can pick the operating system you want, and you have the flexibility to dynamically scale the environment to meet your needs.
An example might be using the Amazon Elastic Compute Cloud (Amazon EC2) service to run a real-time predictive model that requires data to be processed using massively parallel processing. For instance, a service that handles big-box retail data might need to process billions of pieces of click-stream data to target customers with the right ad in real time.
PaaS in a private cloud: PaaS is an entire infrastructure packaged so that it can be used to design, implement, and deploy applications and services in a public or private cloud environment. PaaS enables an organization to leverage key middleware services without having to deal with the complexities of managing individual hardware and software elements.
PaaS vendors are beginning to incorporate big data technologies such as Hadoop and MapReduce into their PaaS offerings. For example, you might want to build a specialized application to analyze vast amounts of medical data. The application would make use of real-time as well as non-real-time data. It would require Hadoop for distributed storage and MapReduce for processing.
What’s great about PaaS in this scenario is how quickly the application can be deployed. You won’t have to wait for internal IT teams to get up to speed on the new technologies, and you can experiment more liberally. Once you have identified a solid solution, you can bring it in-house when IT is ready to support it.
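To make the MapReduce processing mentioned above more concrete, here is a minimal sketch of the map, shuffle, and reduce steps in plain Python. It does not use Hadoop or any PaaS vendor's API. The records, the field layout, and the function names (map_phase, shuffle, reduce_phase, run_job) are all hypothetical and chosen only for illustration. A real job would read from distributed storage and run across many machines.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical records: (patient_id, diagnosis). Made-up values for illustration.
RECORDS = [
    ("p1", "flu"),
    ("p2", "asthma"),
    ("p3", "flu"),
    ("p4", "flu"),
    ("p5", "asthma"),
]


def map_phase(record):
    """Emit one (key, 1) pair per record, keyed by diagnosis."""
    _patient_id, diagnosis = record
    yield (diagnosis, 1)


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on a single machine."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_job(RECORDS))
```

Because each map call looks at one record and each reduce call looks at one key, both steps can in principle be spread across many machines, which is what makes the model a fit for elastic cloud infrastructure.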
SaaS in a hybrid cloud: Here you might want to analyze “voice of the customer” data from multiple channels. Many companies have come to realize that one of the most important data sources is what the customer thinks and says about their company, their products, and their services.
Getting access to voice-of-the-customer data can provide invaluable insights into customer behaviors and actions. Increasingly, customers are “vocalizing” on public sites across the Internet. The value of the customers’ input can be greatly enhanced by incorporating this public data into your analysis.
Your SaaS vendor provides the platform for the analysis as well as the social media data. In addition, you might utilize your enterprise CRM data in your private cloud environment for inclusion in the analysis.
Some industry insiders use the term big data applications to describe cloud-based applications that use big data. Examples include Amazon.com and LinkedIn. Some people might argue (and have) that these are really SaaS applications that solve a particular business problem. It’s often a matter of semantics in an emerging space.