David Semmelroth

David Semmelroth has two decades of experience translating customer data into actionable insights across the financial services, travel, and entertainment industries. David has consulted for Cedar Fair, Wachovia, National City, and TD Bank.

Articles & Books From David Semmelroth

Cheat Sheet / Updated 03-10-2022
Summary statistical measures represent the key properties of a sample or population as a single numerical value. This provides important information in a very compact form and simplifies comparing multiple samples or populations. Summary statistical measures can be divided into three types: measures of central tendency, measures of dispersion, and measures of association.
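As a rough illustration of the three types of summary measure named above, the sketch below computes one of each with Python's standard library. The paired observations `x` and `y` and the helper `pearson` are hypothetical, invented for this example; they do not come from the cheat sheet.

```python
import math
import statistics

# Hypothetical paired observations, invented for illustration only.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cross = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    ss_a = sum((ai - mean_a) ** 2 for ai in a)
    ss_b = sum((bi - mean_b) ** 2 for bi in b)
    return cross / math.sqrt(ss_a * ss_b)


center = statistics.mean(x)    # central tendency: the mean
spread = statistics.stdev(x)   # dispersion: sample standard deviation
association = pearson(x, y)    # association: correlation between x and y

print(center, spread, association)
```

Each result is a single number, which is what makes comparisons across samples straightforward.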
Article / Updated 03-26-2016
Many different techniques have been designed to forecast the future value of a variable. Two of these are time series regression models and simulation models. Time series regression models: A time series regression model is used to estimate the trend followed by a variable over time, using regression techniques.
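One simple case of a time series regression is a straight-line trend fitted by ordinary least squares, with time as the only explanatory variable. The sketch below assumes that form; the function `fit_linear_trend` and the sample `series` are hypothetical, and the article may well use a different specification.

```python
def fit_linear_trend(values):
    """Fit y = intercept + slope * t by least squares, with t = 1, 2, ..., n."""
    n = len(values)
    times = range(1, n + 1)
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


# Hypothetical series that follows y = 1 + 2t exactly.
series = [3.0, 5.0, 7.0, 9.0, 11.0]
intercept, slope = fit_linear_trend(series)

# Extrapolate the fitted trend one period ahead.
forecast_next = intercept + slope * (len(series) + 1)
print(intercept, slope, forecast_next)
```

Real data will not sit exactly on a line, so the fitted trend should be read as an estimate rather than a guarantee about the next value.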
Article / Updated 03-26-2016
For a dataset that consists of observations taken at different points in time (that is, time series data), it's important to determine whether the observations are correlated with each other. This is because many techniques for modeling time series data are based on the assumption that the observations are uncorrelated with each other (independent).
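One common way to check whether observations in a time series are correlated with each other is the sample autocorrelation at a given lag. The sketch below is a minimal version under that assumption; the function name `autocorrelation` and both sample series are hypothetical and not taken from the article.

```python
def autocorrelation(values, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    denominator = sum(d * d for d in deviations)
    numerator = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
    return numerator / denominator


# Hypothetical series: a steady upward trend and a strictly alternating one.
trending = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
alternating = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]

r_trend = autocorrelation(trending, lag=1)     # strongly positive
r_alt = autocorrelation(alternating, lag=1)    # strongly negative
print(r_trend, r_alt)
```

Values far from zero suggest that the observations are not uncorrelated, so techniques that assume independence may not be appropriate.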
Article / Updated 03-26-2016
Before you apply statistical techniques to a dataset, it's important to examine the data to understand its basic properties. You can use a series of techniques that are collectively known as Exploratory Data Analysis (EDA) to analyze a dataset. EDA helps ensure that you choose the correct statistical techniques to analyze and forecast the data.
Article / Updated 03-26-2016
A statistic is said to be robust if it isn't strongly influenced by the presence of outliers. For example, the mean is not robust because it can be strongly affected by the presence of outliers. On the other hand, the median is robust: it is largely unaffected by outliers. For example, suppose the following data represent a sample of household incomes in a small town (measured in thousands of dollars per year): 32, 47, 20, 25, 56. You compute the sample mean as the sum of the five observations divided by five: (32 + 47 + 20 + 25 + 56) / 5 = 180 / 5 = 36. The sample mean is $36,000 per year.
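The contrast between the mean and the median can be checked directly. The sketch below uses the five incomes from the text and then appends one extreme value; that extra household (1000, meaning $1,000,000 per year) is a hypothetical outlier added only for illustration.

```python
import statistics

# Household incomes from the text, in thousands of dollars per year.
incomes = [32, 47, 20, 25, 56]

# Hypothetical outlier, not part of the original sample.
with_outlier = incomes + [1000]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

With the outlier added, the mean rises from 36 to roughly 196.7, while the median only moves from 32 to 39.5.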
Article / Updated 03-26-2016
Probability distributions are one of many statistical techniques that can be used to analyze data to find useful patterns. You use a probability distribution to compute the probabilities associated with the elements of a dataset. Binomial distribution: You would use the binomial distribution to analyze variables that can assume only one of two values.
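For a variable that can take only one of two values, the binomial distribution gives the probability of k successes in n independent trials as C(n, k) p^k (1 - p)^(n - k). The sketch below evaluates that formula; the function name `binomial_pmf` and the coin-flip numbers are hypothetical choices for illustration.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Hypothetical example: exactly 5 heads in 10 flips of a fair coin.
prob_five_heads = binomial_pmf(5, 10, 0.5)

# The probabilities over all possible outcomes should sum to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
print(prob_five_heads, total)
```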
Article / Updated 03-26-2016
A stem-and-leaf plot is a graphical device in which the distribution of a dataset is organized by the numerical value of the observations in the dataset. The diagram consists of a "stem," showing the different categories in the data, and a "leaf," which shows the values of the individual observations in the dataset.
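A stem-and-leaf plot for two-digit values can be sketched in a few lines, assuming the tens digit serves as the stem and the ones digit as the leaf. The function `stem_and_leaf` and the sample `data` are hypothetical, and other stem widths are equally valid.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers by tens digit (stem), listing ones digits (leaves)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical sample of two-digit observations.
data = [32, 47, 20, 25, 56, 41, 38]
plot = stem_and_leaf(data)

for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because the values are sorted before grouping, both the stems and the leaves within each stem appear in ascending order.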
Article / Updated 03-26-2016
Healthcare is one area where big data has the potential to make dramatic improvements in the quality of life. The increasing availability of massive amounts of data and rapidly increasing computer power could enable researchers to make breakthroughs, such as the following: predicting outbreaks of diseases; gaining a better understanding of the effectiveness and side effects of drugs; developing customized treatments based on patient histories; and reducing the cost of developing new treatments. One of the biggest challenges facing the use of big data in healthcare is that much of the data is stored in independent "silos.
Article / Updated 03-26-2016
Statistical software packages are extremely powerful these days, but they cannot overcome poor quality data. Following is a checklist of things you need to do before you go off building statistical models. Check data formats: Your analysis always starts with a raw data file. Raw data files come in many different shapes and sizes.
Article / Updated 03-26-2016
Most datasets come with some sort of metadata, which is essentially a description of the data in the file. Metadata typically includes descriptions of the formats, some indication of what values are in each data field, and what these values mean. When you are faced with a new dataset, never take the metadata at face value.