Articles & Books From Big Data

Article / Updated 12-01-2023
Getting the most out of your unstructured data is an essential task for any organization these days, especially when considering the disparate storage systems, applications, and user locations. So, it’s not an accident that data orchestration is the term that brings everything together. Bringing all your data together shares similarities with conducting an orchestra.
Cheat Sheet / Updated 03-10-2022
Summary statistical measures represent the key properties of a sample or population as a single numerical value. This has the advantage of providing important information in a very compact form. It also simplifies comparing multiple samples or populations. Summary statistical measures can be divided into three types: measures of central tendency, measures of dispersion, and measures of association.
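As a rough illustration of the three types of measures, the following Python sketch computes one or two of each using only the standard library. The sample values `x` and `y` and the helper `pearson` are invented for this example and are not taken from the cheat sheet.

```python
import statistics
from math import sqrt

# Hypothetical paired samples, invented for illustration only.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)

summary = {
    "mean": statistics.mean(x),       # central tendency
    "median": statistics.median(x),   # central tendency
    "stdev": statistics.stdev(x),     # dispersion (sample standard deviation)
    "correlation": pearson(x, y),     # association between x and y
}

for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Each entry condenses the sample into a single number, which is what makes comparing several samples side by side straightforward.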
Cheat Sheet / Updated 02-09-2022
To stay competitive today, companies must find practical ways to deal with big data; that is, they must learn new ways to capture and analyze growing amounts of information about customers, products, and services. Data is becoming increasingly complex in structured and unstructured ways. New sources of data come from machines, such as sensors; social business sites; and website interaction, such as click-stream data.
Cheat Sheet / Updated 04-12-2022
Big data makes big headlines, but it’s much more than just a buzz phrase or the latest business fad. The phenomenon is very real and it’s producing concrete benefits in many different areas, particularly in business. Here you will get to the heart of big data as a business owner or manager: You will take a look at the key terminology you need to understand, the crucial big data skills for businesses, ten steps to using big data to make better decisions, and tips for communicating insights from data to your colleagues.
Article / Updated 03-26-2016
Many different techniques have been designed to forecast the future value of a variable. Two of these are time series regression models and simulation models. Time series regression models: A time series regression model is used to estimate the trend followed by a variable over time, using regression techniques.
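A minimal sketch of a time series regression, assuming the simplest case of a straight-line trend fitted by ordinary least squares. The observations in `series` and the function names are hypothetical, and the article may go on to describe other trend forms.

```python
def fit_linear_trend(values):
    """Fit y_t = intercept + slope * t by least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope

def forecast(intercept, slope, t):
    """Extend the fitted trend line to time index t."""
    return intercept + slope * t

# Hypothetical observations at t = 0..4, invented for illustration.
series = [10.0, 12.5, 13.5, 16.5, 17.5]
intercept, slope = fit_linear_trend(series)
next_value = forecast(intercept, slope, len(series))
print(f"trend: y = {intercept:.2f} + {slope:.2f} t, forecast at t=5: {next_value:.2f}")
```

The forecast simply extends the estimated trend one period ahead; it says nothing about how reliable that extrapolation is.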
Article / Updated 03-26-2016
For a dataset that consists of observations taken at different points in time (that is, time series data), it's important to determine whether or not the observations are correlated with each other. This is because many techniques for modeling time series data are based on the assumption that the observations are uncorrelated with each other (independent).
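One common way to check for such correlation is the sample autocorrelation at a given lag. The sketch below is an assumption about how that check might be done, not necessarily the method the article describes; both series are invented.

```python
def autocorrelation(values, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    numer = sum(
        (values[t] - mean) * (values[t - lag] - mean)
        for t in range(lag, n)
    )
    return numer / denom

# Hypothetical series, invented for illustration.
trending = [1.0, 2.0, 3.0, 4.0, 5.0]     # each value is close to the previous one
alternating = [1.0, -1.0, 1.0, -1.0]     # each value flips sign

r_trend = autocorrelation(trending, lag=1)
r_alt = autocorrelation(alternating, lag=1)
print(r_trend, r_alt)
```

A value near zero is consistent with uncorrelated observations, while values well away from zero (positive for the trending series, negative for the alternating one) suggest the assumption does not hold.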
Article / Updated 03-26-2016
Probability distributions are among the many statistical tools that can be used to analyze data to find useful patterns. You use a probability distribution to compute the probabilities associated with the elements of a dataset. Binomial distribution: You would use the binomial distribution to analyze variables that can assume only one of two values.
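As a small illustration of the binomial case, this sketch computes the probability of exactly `k` successes in `n` independent two-outcome trials. The function name and the coin-flip numbers are chosen for the example and do not come from the article.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical example: 10 fair coin flips, exactly 5 heads.
prob_five_heads = binomial_pmf(5, 10, 0.5)

# The probabilities over all possible outcomes should sum to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
print(prob_five_heads, total)
```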
Article / Updated 03-26-2016
A statistic is said to be robust if it isn’t strongly influenced by the presence of outliers. For example, the mean is not robust because it can be strongly affected by the presence of outliers. On the other hand, the median is robust; it is largely unaffected by outliers. For example, suppose the following data represents a sample of household incomes in a small town (measured in thousands of dollars per year): 32, 47, 20, 25, 56. You compute the sample mean as the sum of the five observations divided by five: (32 + 47 + 20 + 25 + 56) / 5 = 36. The sample mean is $36,000 per year.
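The difference in robustness can be checked directly with the five incomes above. In this sketch the sixth value, 2000 (a household earning $2 million a year), is a hypothetical outlier added purely for illustration.

```python
import statistics

# Household incomes from the example, in thousands of dollars per year.
incomes = [32, 47, 20, 25, 56]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)

# Hypothetical outlier, not part of the original sample.
with_outlier = incomes + [2000]

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after:.2f}")
print(f"median: {median_before} -> {median_after}")
```

With the outlier included, the mean rises roughly tenfold while the median moves only from 32 to 39.5, which is the behavior the definition of robustness describes.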
Article / Updated 03-26-2016
A stem-and-leaf plot is a graphical device in which the distribution of a dataset is organized by the numerical value of the observations in the dataset. The diagram consists of a "stem," showing the different categories in the data, and a "leaf," which shows the values of the individual observations in the dataset.
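A minimal sketch of building such a plot in Python, assuming two-digit non-negative integers so that the tens digit serves as the stem and the ones digit as the leaf. The data and the function names `stem_and_leaf` and `render` are invented for this example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by tens digit (stem) and ones digit (leaf)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

def render(plot):
    """Format the plot as one text line per stem."""
    return "\n".join(
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(plot.items())
    )

# Hypothetical observations, invented for illustration.
data = [23, 12, 37, 41, 15, 30, 23, 21, 34]
plot = stem_and_leaf(data)
print(render(plot))
```

Each printed row shows one stem followed by its leaves in ascending order, so the length of a row indicates how many observations fall in that range.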
Article / Updated 03-26-2016
Healthcare is one area where big data has the potential to make dramatic improvements in the quality of life. The increasing availability of massive amounts of data and rapidly increasing computer power could enable researchers to make breakthroughs, such as the following: predicting outbreaks of diseases; gaining a better understanding of the effectiveness and side effects of drugs; developing customized treatments based on patient histories; and reducing the cost of developing new treatments. One of the biggest challenges facing the use of big data in healthcare is that much of the data is stored in independent "silos.