What Are the Key Properties of a Dataset?

By Alan Anderson, David Semmelroth

Before you perform any statistical analysis, you need to understand the nature of the data you're analyzing. You can use exploratory data analysis (EDA) to identify the properties of a dataset and determine which statistical methods are most appropriate for it. EDA techniques let you investigate several types of properties, including the following:

  • The center of the data

  • The spread among the members of the data

  • The skewness of the data

  • The probability distribution the data follows

  • The correlation among the elements in the dataset

  • Whether the parameters of the data (such as the mean and variance) are constant over time

  • The presence of outliers in the data
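
Several of the properties in this list can be computed directly. The following Python sketch is a minimal illustration, not a definitive method. The function name summarize and the sample values are made up for this example, the skewness is the simple moment-based version, and the 1.5 × IQR rule is only one common convention for flagging possible outliers. It assumes the values are numeric and not all identical.

```python
import statistics


def summarize(values):
    """Return a few basic EDA summaries for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    median = statistics.median(values)

    # Spread: sample standard deviation (divides by n - 1).
    stdev = statistics.stdev(values)

    # Skewness: mean cubed deviation divided by the cubed
    # population standard deviation (moment-based definition).
    pstdev = statistics.pstdev(values)
    skew = sum((x - mean) ** 3 for x in values) / n / pstdev ** 3

    # Possible outliers: points more than 1.5 * IQR outside the quartiles.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in values if x < low or x > high]

    return {
        "mean": mean,
        "median": median,
        "stdev": stdev,
        "skewness": skew,
        "outliers": outliers,
    }


# Hypothetical sample: mostly small values plus one very large one.
sample = [2, 3, 3, 4, 4, 4, 5, 5, 6, 30]
summary = summarize(sample)
for name, value in summary.items():
    print(name, value)
```

In this sample the single large value pulls the mean well above the median and produces positive skewness, which is the kind of pattern these summaries are meant to reveal.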

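Another key question EDA answers is “Does the data conform to our assumptions?” Identifying the properties of a dataset matters because many statistical procedures are sensitive to the assumptions you make about the data. For example, a procedure that assumes normally distributed data can give misleading results when the data is strongly skewed or contains outliers.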
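
One rough way to ask whether data conforms to an assumption is to compare it with what the assumption predicts. The Python sketch below is a heuristic under stated assumptions, not a formal test. It uses the fact that about 68 percent of normally distributed values fall within one standard deviation of the mean. The function name share_within_one_sd and both sample lists are hypothetical.

```python
import statistics


def share_within_one_sd(values):
    """Fraction of values within one sample standard deviation of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inside = sum(1 for x in values if abs(x - mean) <= sd)
    return inside / len(values)


# Hypothetical samples: one roughly symmetric, one with a large outlier.
roughly_symmetric = [1, 2, 3, 3, 4, 4, 4, 5, 5, 6, 7]
with_outlier = [2, 3, 3, 4, 4, 4, 5, 5, 6, 30]

# Under a normal distribution the share is about 0.68; a value far
# from that suggests the normality assumption deserves a closer look.
print(share_within_one_sd(roughly_symmetric))
print(share_within_one_sd(with_outlier))
```

A share far from 0.68 does not prove the data is non-normal, and a share near it does not prove normality, especially with small samples. It is only a prompt to examine the distribution more carefully.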