Deploying Analytics and Data Wrangling to Convert Raw Data into Actionable Insights
Turning your raw data into actionable insights is the first step in the progression from the data you’ve collected to something that actually benefits you. Business-centric data scientists use data analytics to generate insights from raw data.
Identifying the types of analytics
Listed below, in order of increasing complexity, are the four types of data analytics you’ll most likely encounter:
Descriptive analytics: This type of analytics answers the question, “What happened?” Descriptive analytics are based on historical and current data. A business analyst or a business-centric data scientist bases modern-day business intelligence on descriptive analytics.
Diagnostic analytics: You use this type of analytics to find answers to the question, “Why did this particular something happen?” or “What went wrong?” Diagnostic analytics are useful for deducing and inferring the success or failure of sub-components of any data-driven initiative.
Predictive analytics: Although this type of analytics is based on historical and current data, predictive analytics go one step further than descriptive analytics. Predictive analytics involve complex model-building and analysis in order to predict a future event or trend. In a business context, these analyses would be performed by the business-centric data scientist.
Prescriptive analytics: This type of analytics aims to optimize processes, structures, and systems through informed action that’s based on predictive analytics — essentially telling you what you should do based on an informed estimation of what will happen. Both business analysts and business-centric data scientists can generate prescriptive analytics, but their methods and data sources differ.
Ideally, a business should engage in all four types of data analytics, but prescriptive analytics is the most direct and effective means by which to generate value from data insights.
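To make the distinction between the types concrete, here is a minimal sketch that runs three of them (descriptive, predictive, and prescriptive) on the same tiny dataset. Everything in it is an assumption made for illustration: the monthly sales figures are invented, the function names are hypothetical, and a straight-line trend stands in for the far more elaborate models a data scientist would normally build.

```python
from statistics import mean

# Hypothetical monthly unit sales (invented for illustration only).
monthly_sales = [100, 110, 120, 130, 140, 150]


def describe(values):
    """Descriptive analytics: summarize what happened."""
    return {"total": sum(values), "average": mean(values)}


def fit_trend(values):
    """Fit y = intercept + slope * x by least squares, with x = 0, 1, 2, ..."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope


def predict_next(values):
    """Predictive analytics: extrapolate the fitted trend one period ahead."""
    intercept, slope = fit_trend(values)
    return intercept + slope * len(values)


def recommend(forecast, stock_on_hand):
    """Prescriptive analytics: turn the forecast into a suggested action."""
    shortfall = forecast - stock_on_hand
    return f"order {shortfall:.0f} units" if shortfall > 0 else "no order needed"


summary = describe(monthly_sales)
forecast = predict_next(monthly_sales)
action = recommend(forecast, stock_on_hand=120)
```

Notice that each step builds on the one before it: the forecast uses the same historical data as the summary, and the recommendation is only as good as the forecast behind it.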
Identifying common challenges in analytics
Analytics commonly pose at least two challenges in the business enterprise. First, organizations often have a very hard time finding new hires with specific skill sets that include analytics. Second, even skilled analysts often have difficulty communicating complex insights in a way that’s understandable to management decision makers.
To overcome these challenges, the organization must create and nurture a culture that values and accepts analytics products. The business must work to educate all levels of the organization, so that management has a basic understanding of analytics and of the success that can be achieved by implementing them.
Conversely, business-centric data scientists must have solid working knowledge of business in general and, in particular, a solid understanding of the business at hand. Strong business knowledge is one of the three main requirements of any business-centric data scientist; the other two are strong coding acumen and strong quantitative analysis skills in math and statistical modeling.
Wrangling raw data to actionable insights
Data wrangling is another important part of the work that’s required to convert data to insights. To build analytics from raw data, you’ll almost always need data wrangling: the processes and procedures that you use to clean data and convert it from one format and structure to another, so that the data is accurate and in the format that analytics tools and scripts require.
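As a rough illustration of what cleaning and converting can look like in practice, the sketch below takes a small, messy CSV export and turns it into uniform JSON records. The data, the column names, and the `wrangle` function are all hypothetical, and the cleaning rules (drop rows with no amount, skip exact duplicates) are just one possible policy; your own project would dictate its own.

```python
import csv
import io
import json

# Hypothetical raw export: stray whitespace, mixed casing,
# a missing amount, and a duplicated row.
RAW_CSV = """customer, region ,amount
 Alice ,north,100.50
bob,SOUTH,
 Alice ,north,100.50
carol,East,75
"""


def wrangle(raw_text):
    """Clean the raw rows and return them as a list of uniform records."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen, records = set(), []
    for row in reader:
        # Strip stray whitespace from both column names and values.
        cleaned = {key.strip(): value.strip() for key, value in row.items()}
        if not cleaned["amount"]:
            continue  # Drop rows with no amount (one possible policy).
        record = (
            cleaned["customer"].title(),
            cleaned["region"].lower(),
            float(cleaned["amount"]),
        )
        if record in seen:
            continue  # Skip exact duplicates.
        seen.add(record)
        records.append(dict(zip(("customer", "region", "amount"), record)))
    return records


clean_records = wrangle(RAW_CSV)

# Convert from the CSV structure to JSON for a downstream tool.
clean_json = json.dumps(clean_records, indent=2)
```

The cleaning step makes the data accurate and consistent, and the final conversion puts it in the structure the consuming tool expects. Those are the two halves of the definition above.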
The following list highlights a few of the practices and issues most relevant to data wrangling:
Data extraction: The business-centric data scientist must first identify what datasets are relevant to the problem at hand, and then extract sufficient quantities of the data that’s required to solve the problem. (This extraction process is sometimes loosely referred to as data mining, although that term more often describes discovering patterns in data that has already been collected.)
Data munging: Data munging involves cleaning the raw data obtained during extraction, then converting it into a format that is more convenient to consume. (Mung began life as a destructive process, where you would convert something recognizable into something that was unrecognizable, thus the phrase Mash Until No Good, or MUNG.)
Data governance: Data governance standards serve as a quality control measure to ensure that manual and automated data sources conform to the data standards of the model at hand. These standards must be applied so that the data is at the right granularity when it’s stored and made ready for use.
Granularity is a measure of a dataset’s level of detail. Data granularity is determined by the relative size of the sub-groupings into which the data is divided.
Data architecture: IT architecture is key. If your data is isolated in separate, fixed repositories — those infamous data silos everybody complains about — then it’s available to only a few people within a particular line of business. Siloed data structures result in scenarios where a majority of an organization’s data is simply unavailable for use by the organization at large. (Needless to say, siloed data structures are incredibly wasteful and inefficient.)
If your goal is to derive the most value and insight from your organization’s business data, then you should ensure that the data is stored in a central data warehouse and not in separate silos.
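The idea of granularity mentioned in the list above is easier to see with a small example. The sketch below, which uses invented daily sales figures and a hypothetical `to_monthly` helper, rolls fine-grained daily records up into coarser monthly totals.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales records (fine granularity: one row per day).
daily_sales = [
    (date(2024, 1, 5), 120.0),
    (date(2024, 1, 19), 80.0),
    (date(2024, 2, 2), 200.0),
    (date(2024, 2, 28), 50.0),
    (date(2024, 3, 11), 75.0),
]


def to_monthly(records):
    """Roll daily records up to one total per (year, month)."""
    totals = defaultdict(float)
    for day, amount in records:
        totals[(day.year, day.month)] += amount
    return dict(totals)


# Coarser granularity: one total per month.
monthly_sales = to_monthly(daily_sales)
```

You can always aggregate fine-grained data into coarser groupings like this, but you can't recover the daily detail from monthly totals alone. That's why the granularity at which data is stored matters.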