Different Approaches to Big Data Analysis

Statistics for Big Data For Dummies

In many cases, big data analysis will be presented to the end user through reports and visualizations. Because the raw data can be incomprehensibly varied, you will have to rely on analysis tools and techniques to help present the data in meaningful ways.

New applications are becoming available and fall broadly into two categories: custom or semi-custom.

Custom applications for big data analysis

In general, a custom application is created for a specific purpose or a related set of purposes. For big data analysis, the purpose of custom application development is to speed up the time to decision or action.

R environment

The “R” environment is based on the “S” statistics and analysis language developed in the 1970s at Bell Laboratories. It is part of the GNU project and is available under the GNU General Public License.

While R is challenging to fully comprehend, its depth and flexibility make it a compelling choice for analytics application developers and “power users.” In addition, the Comprehensive R Archive Network (CRAN) maintains a worldwide set of File Transfer Protocol and web servers with the most up-to-date versions of the R environment. A commercially supported, enterprise version of R is also available from Revolution Analytics.

More specifically, R is an integrated suite of software tools and technologies for creating custom applications that facilitate data manipulation, calculation, analysis, and visual display. Among other advanced capabilities, it provides

Effective data-handling and manipulation components.
Operators for calculations on arrays and other types of ordered data.
Tools specific to a wide variety of data analyses.
Advanced visualization capabilities.
A programming language (based on S) designed by programmers, for programmers, with many familiar constructs, including conditionals, loops, user-defined recursive functions, and a broad range of input and output facilities.

R is well suited to single-use, custom applications for analysis of big data sources.
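To make the list of capabilities more concrete, here is a minimal sketch of the same kinds of constructs: calculations on ordered data, a conditional inside a loop, and a user-defined recursive function. It is written in Python rather than R, purely for illustration, and the sample data and the names page_views and nested_total are assumptions invented for this example.

```python
from statistics import mean

# Hypothetical sample data: daily page views for one week.
page_views = [120, 135, 128, 160, 152, 90, 85]

# Calculation on ordered data: each day's distance from the weekly average.
weekly_average = mean(page_views)
deviations = [v - weekly_average for v in page_views]

# Conditional inside a loop: label each day relative to the average.
labels = []
for d in deviations:
    if d > 0:
        labels.append("above")
    else:
        labels.append("below")


def nested_total(values):
    """User-defined recursive function: sum numbers in arbitrarily nested lists."""
    total = 0
    for item in values:
        if isinstance(item, list):
            total += nested_total(item)  # recurse into the sub-list
        else:
            total += item
    return total
```

An equivalent R script would typically use vectorized operators in place of the explicit loop, which is part of what makes R attractive for this kind of work.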

Google Prediction API

The Google Prediction API is an example of an emerging class of big data analysis application tools. It is available on the Google developers website, is well documented, and provides several mechanisms for access from different programming languages. To help you get started, it is freely available for six months.

The Prediction API is fairly simple. It looks for patterns and matches them to proscriptive, prescriptive, or other existing patterns. While performing its pattern matching, it also “learns.” The more you use it, the smarter it gets.
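The idea of matching new input against patterns seen so far, and improving as more examples arrive, can be sketched with a toy classifier. This is only an illustration of the general concept and is not the Prediction API's algorithm or interface; the class TinyPatternLearner, its methods, and the training sentences are all hypothetical.

```python
from collections import Counter, defaultdict


class TinyPatternLearner:
    """Toy word-count classifier; NOT the Prediction API's algorithm."""

    def __init__(self):
        # Maps each label to a Counter of words seen with that label.
        self.word_counts = defaultdict(Counter)

    def train(self, text, label):
        """Record the words of one labeled example."""
        self.word_counts[label].update(text.lower().split())

    def predict(self, text):
        """Return the label whose examples share the most words with text."""
        words = text.lower().split()
        scores = {
            label: sum(counts[w] for w in words)
            for label, counts in self.word_counts.items()
        }
        if not scores or max(scores.values()) == 0:
            return None  # nothing learned so far matches this input
        return max(scores, key=scores.get)


learner = TinyPatternLearner()
learner.train("great product works well", "positive")
learner.train("terrible product broke quickly", "negative")

before = learner.predict("awful experience")  # no known words yet
learner.train("awful support and awful packaging", "negative")
after = learner.predict("awful experience")   # now matches a learned word
```

Here the same question gets no answer before the extra training example and a label afterward, which is the "learning" behavior in miniature.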

Prediction is implemented as a RESTful API with language support for .NET, Java, PHP, JavaScript, Python, Ruby, and many others. Google also provides scripts for accessing the API as well as a client library for R.

Predictive analysis is one of the most powerful potential capabilities of big data, and the Google Prediction API is a very useful tool for creating custom applications.

Semi-custom applications for big data analysis

In truth, what many people perceive as custom applications are actually created using “packaged” or third-party components like libraries. It is not always necessary to completely code a new application. Using packaged applications or components requires developers or analysts to write code to “knit together” these components into a working custom application. The following are reasons why this is a sound approach:

Speed to deployment: Because you don’t have to write every part of the application, the development time can be greatly reduced.
Stability: Using well-constructed, reliable, third-party components can help to make the custom application more resilient.
Better quality: Packaged components are often subject to higher quality standards because they are deployed into a wide variety of environments and domains.
More flexibility: If a better component comes along, it can be swapped into the application, extending the lifetime, adaptability, and usefulness of the custom application.
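The "knit together" idea, and the flexibility point in particular, can be sketched as a small piece of glue code that accepts any component with the right shape. In this minimal sketch, the functions mean and median from Python's standard statistics module stand in for third-party components; rolling_summary and the sample readings are hypothetical names and data invented for the illustration.

```python
from statistics import mean, median
from typing import Callable, List, Sequence

# Any function that reduces a window of numbers to one number can act as
# the packaged "component" here.
Summarizer = Callable[[Sequence[float]], float]


def rolling_summary(values: Sequence[float], window: int,
                    summarize: Summarizer) -> List[float]:
    """Glue code: slide a window over the data and apply the component."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        summarize(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


readings = [10, 12, 11, 50, 13, 12]  # hypothetical sensor readings

# Swapping the component changes the behavior without touching the glue code.
with_mean = rolling_summary(readings, 3, mean)
with_median = rolling_summary(readings, 3, median)
```

Because the glue code depends only on the component's call signature, replacing mean with median (which is less disturbed by the spike of 50) requires no change to rolling_summary itself.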

Another type of semi-custom application is one where the source code is available and is modified for a particular purpose. This can be an efficient approach because there are quite a few examples of application building blocks available to incorporate into your semi-custom application:

TA-Lib: The Technical Analysis library is used extensively by software developers who need to perform technical analysis of financial market data. It is available as open source under the BSD license, allowing it to be integrated into semi-custom applications.
JUNG: The Java Universal Network Graph framework is a library that provides a common framework for analysis and visualization of data that can be represented by a graph or network. It is useful for social network analysis, importance measures, and data mining. It is available as open source under the BSD license.
GeoTools: An open source geospatial toolkit for manipulating GIS data in many forms, analyzing spatial and non-spatial attributes of GIS data, and creating graphs and networks of the data. It is available under the GNU Lesser General Public License (LGPL), allowing for integration into semi-custom applications.
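To give a feel for what a building block of this kind computes, the following minimal sketch calculates degree centrality, one simple importance measure used in social network analysis. It is plain Python and does not use the JUNG API (JUNG is a Java library); the function degree_centrality and the sample ties are assumptions made up for this example.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return each node's degree divided by (n - 1) for an undirected graph."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical social ties, treated as undirected connections.
ties = [("ana", "ben"), ("ana", "cy"), ("ana", "dee"), ("ben", "cy")]
scores = degree_centrality(ties)
most_connected = max(scores, key=scores.get)
```

A library such as JUNG packages measures like this, along with visualization, so that the semi-custom application only has to supply the data and interpret the results.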