Articles & Books From Big Data

Article / Updated 12-01-2023
Getting the most out of your unstructured data is an essential task for any organization these days, especially when considering the disparate storage systems, applications, and user locations. So, it’s not an accident that data orchestration is the term that brings everything together. Bringing all your data together shares similarities with conducting an orchestra.
Cheat Sheet / Updated 03-10-2022
Summary statistical measures represent the key properties of a sample or population as a single numerical value. This has the advantage of providing important information in a very compact form. It also simplifies comparing multiple samples or populations. Summary statistical measures can be divided into three types: measures of central tendency, measures of dispersion, and measures of association.
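As a rough illustration of the three types, the sketch below computes one measure of each kind: the mean (central tendency), the sample standard deviation (dispersion), and the Pearson correlation (association). The `hours` and `scores` data and the `pearson_correlation` helper are hypothetical and are not taken from the cheat sheet.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cross / sqrt(sxx * syy)


# Hypothetical paired sample, for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]

central = mean(hours)                             # measure of central tendency
spread = stdev(hours)                             # measure of dispersion (sample std. dev.)
association = pearson_correlation(hours, scores)  # measure of association

print(central, spread, association)
```

Each result condenses one property of the sample into a single number, which is what makes comparisons between samples straightforward.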
Cheat Sheet / Updated 02-09-2022
To stay competitive today, companies must find practical ways to deal with big data, that is, to learn new ways to capture and analyze growing amounts of information about customers, products, and services. Data is becoming increasingly complex in structured and unstructured ways. New sources of data come from machines, such as sensors; social business sites; and website interaction, such as click-stream data.
Cheat Sheet / Updated 04-12-2022
Big data makes big headlines, but it’s much more than just a buzz phrase or the latest business fad. The phenomenon is very real and it’s producing concrete benefits in many different areas, particularly in business. Here you will get to the heart of big data as a business owner or manager: You will take a look at the key terminology you need to understand, the crucial big data skills for businesses, ten steps to using big data to make better decisions, and tips for communicating insights from data to your colleagues.
Article / Updated 03-26-2016
Many different techniques have been designed to forecast the future value of a variable. Two of these are time series regression models and simulation models. Time series regression models: A time series regression model is used to estimate the trend followed by a variable over time, using regression techniques.
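A minimal sketch of a time series regression, assuming the simplest case of a linear trend fitted by ordinary least squares. The `sales` figures and the `fit_linear_trend` helper are hypothetical; the article itself may use a different trend form.

```python
def fit_linear_trend(values):
    """Fit y = intercept + slope * t by least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    times = range(n)
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


# Hypothetical observations taken at five consecutive time periods.
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

intercept, slope = fit_linear_trend(sales)
forecast = intercept + slope * len(sales)  # extrapolate the trend one period ahead

print(intercept, slope, forecast)
```

The forecast simply extends the fitted line, so it is only as good as the assumption that the trend continues unchanged.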
Article / Updated 03-26-2016
Existing analytics tools and techniques will be very helpful in making sense of big data. The algorithms that are part of these tools, however, must be able to work with large amounts of potentially real-time and disparate data. A competent infrastructure must be in place to support this. Vendors providing analytics tools will also need to ensure that their algorithms work across distributed implementations.
Article / Updated 03-26-2016
Before you apply statistical techniques to a dataset, it's important to examine the data to understand its basic properties. You can use a series of techniques that are collectively known as Exploratory Data Analysis (EDA) to analyze a dataset. EDA helps ensure that you choose the correct statistical techniques to analyze and forecast the data.
Article / Updated 03-26-2016
For a dataset that consists of observations taken at different points in time (that is, time series data), it's important to determine whether or not the observations are correlated with each other. This is because many techniques for modeling time series data are based on the assumption that the observations are uncorrelated with each other (independent).
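One common way to check for this kind of correlation is the sample autocorrelation at a given lag. The sketch below is an assumption about how such a check might look, not the article's own procedure; the `autocorrelation` helper and both series are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    m = sum(series) / n
    denominator = sum((x - m) ** 2 for x in series)
    numerator = sum(
        (series[t] - m) * (series[t + lag] - m) for t in range(n - lag)
    )
    return numerator / denominator


# Hypothetical series: a steady upward trend and a strictly alternating pattern.
trending = [1, 2, 3, 4, 5, 6, 7, 8]
alternating = [1, -1, 1, -1, 1, -1]

r_trending = autocorrelation(trending, 1)        # positive: neighbors move together
r_alternating = autocorrelation(alternating, 1)  # negative: neighbors move oppositely

print(r_trending, r_alternating)
```

Values near zero at every lag are consistent with uncorrelated observations; values far from zero suggest the independence assumption does not hold.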
Article / Updated 03-26-2016
Probability distributions are one of many statistical tools that can be used to analyze data to find useful patterns. You use a probability distribution to compute the probabilities associated with the elements of a dataset. For example, you would use the binomial distribution to analyze variables that can assume only one of two values.
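For the binomial case, the probability of exactly k successes in n independent trials, each with success probability p, is C(n, k) * p^k * (1 - p)^(n - k). The sketch below evaluates that formula; the `binomial_pmf` name and the coin-flip numbers are hypothetical illustrations.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Hypothetical example: exactly 2 heads in 4 flips of a fair coin.
two_heads = binomial_pmf(2, 4, 0.5)

# The probabilities over all possible outcomes (0 to 4 heads) sum to 1.
total = sum(binomial_pmf(k, 4, 0.5) for k in range(5))

print(two_heads, total)
```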
Article / Updated 03-26-2016
A statistic is said to be robust if it isn’t strongly influenced by the presence of outliers. For example, the mean is not robust because it can be strongly affected by outliers. The median, on the other hand, is robust: it is largely unaffected by outliers. For example, suppose the following data represents a sample of household incomes in a small town (measured in thousands of dollars per year): 32, 47, 20, 25, 56. You compute the sample mean as the sum of the five observations divided by five: (32 + 47 + 20 + 25 + 56) / 5 = 180 / 5 = 36. The sample mean is $36,000 per year.
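The difference in robustness can be seen by recomputing both statistics after adding an extreme value. The sketch below uses the five incomes from the text; the extra income of 500 (that is, $500,000) is a hypothetical outlier added only for illustration.

```python
from statistics import mean, median

# Household incomes from the text, in thousands of dollars per year.
incomes = [32, 47, 20, 25, 56]

# Hypothetical outlier added for illustration: one household earning $500,000.
incomes_with_outlier = incomes + [500]

sample_mean = mean(incomes)
sample_median = median(incomes)
outlier_mean = mean(incomes_with_outlier)
outlier_median = median(incomes_with_outlier)

print(sample_mean, sample_median)
print(outlier_mean, outlier_median)
```

With the outlier included, the mean roughly triples while the median moves only from 32 to 39.5, which is why the median is considered the more robust of the two.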