Using Spatial Statistics to Predict for Environmental Variation across Space

By Lillian Pierson

By their very nature, environmental variables are location-dependent: They change as geospatial location changes. The purpose of modeling environmental variables with spatial statistics is to enable accurate spatial predictions so that you can use those predictions to solve problems related to the environment.

Spatial statistics is distinguished from natural-resource modeling because it focuses on predicting how changes in space affect environmental phenomena. Naturally, the time variable is considered as well, but spatial statistics is all about using statistics to model the inner workings of spatial phenomena. The difference is in the manner of approach.

Addressing environmental issues with spatial predictive analytics

You can use spatial statistics to model environmental variables across space and time so that you can predict how those variables change from one location to another. The following list describes the types of environmental issues that you can model and predict using spatial statistical modeling:

  • Epidemiology and environmental human health: Disease patterns and distributions
  • Meteorology: Weather phenomena
  • Fire science: The spread of a fire (by channeling your inner Smokey Bear!)
  • Hydraulics: Aquifer conductivity
  • Ecology: Microorganism distribution across a sedimentary lake bottom

If your goal is to build a model that predicts how a change in location will affect environmental variables, spatial statistics can help you do it.

Describing the data science that’s involved

Because spatial statistics involves modeling the x-, y-, z-parameters that comprise spatial datasets, the statistics involved can get rather interesting and unusual. Spatial statistics is, more or less, a marriage of GIS spatial analysis and advanced predictive analytics. The following list describes a few data science processes that are commonly deployed when using statistics to build predictive spatial models:

  • Spatial statistics: Spatial statistics often involves krige and kriging, as well as variogram analysis. The terms “kriging” and “krige” denote different things. Kriging methods are a set of statistical estimation algorithms that fit known point data and produce a predictive surface for an entire study area. Krige represents an automatic implementation of kriging algorithms, where you use simple default parameters to help you generate predictive surfaces. A variogram is a statistical tool that measures how different spatial data values become as the distance between data points increases; in other words, it is a measure of “spatial dissimilarity.” When you krige, you use variogram models with internally defined parameters to generate interpolated, predictive surfaces.
  • Statistical programming: This one involves probability distributions, time series analyses, regression analyses, and Monte Carlo simulations, among other processes.
  • Clustering analysis: Processes can include nearest-neighbor algorithms, k-means clustering, or kernel density estimations.
  • GIS technology: GIS technology pops up a lot in this chapter, but that’s to be expected because its spatial analysis and map-making offerings are incredibly flexible.
  • Coding requirements: Programming for a spatial statistics project could entail using R, SPSS, SAS, MATLAB, and SQL, among other programming languages and statistical packages.
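
To make the variogram and kriging ideas in the preceding list more concrete, here is a minimal sketch in plain Python. Treat it as an illustration under stated assumptions rather than the method any particular GIS or statistics package uses: the function names, the exponential variogram model with its sill and range values, and the sample points are all made up for this example. The first function computes an empirical semivariogram (half the average squared difference between values, grouped by separation distance), and the last one performs ordinary kriging at a single target location.

```python
import math


def empirical_semivariogram(points, values, bin_width, n_bins):
    """Average 0.5 * (z_i - z_j)**2 over point pairs, grouped by distance bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.dist(points[i], points[j])
            b = int(h // bin_width)
            if b < n_bins:
                sums[b] += 0.5 * (values[i] - values[j]) ** 2
                counts[b] += 1
    # None marks a distance bin that received no point pairs.
    return [s / c if c else None for s, c in zip(sums, counts)]


def exponential_variogram(h, sill=1.0, range_=3.0):
    """Assumed variogram model (no nugget); the parameters are illustrative."""
    return sill * (1.0 - math.exp(-h / range_))


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ordinary_kriging(points, values, target, gamma):
    """Predict the value at `target` as a weighted average of known values.

    The weights solve the ordinary kriging system built from the variogram
    `gamma`; the extra row and column force the weights to sum to 1.
    """
    n = len(points)
    a = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    return sum(w * z for w, z in zip(weights, values))


# Hypothetical sample: (x, y) locations and a measured value at each one.
sample_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
sample_values = [1.0, 3.0, 2.0, 5.0]

print(empirical_semivariogram(sample_points, sample_values, 1.0, 3))
print(ordinary_kriging(sample_points, sample_values, (1.0, 1.0),
                       exponential_variogram))
```

In practice, you would usually fit the variogram model's parameters to the empirical semivariogram instead of fixing them by hand as this sketch does, and you would repeat the kriging step over a grid of target locations to build the predictive surface.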

Addressing environmental issues with spatial statistics

A great example of using spatial statistics to generate predictions for location-dependent environmental variables can be seen in the recent work of Dr. Pierre Goovaerts. Dr. Goovaerts uses advanced statistics, coding, and his authoritative subject-matter expertise in agricultural engineering, soil science, and epidemiology to uncover correlations between spatial disease patterns, mortality, environmental toxin exposure, and sociodemographics.

In one of Dr. Goovaerts’s recent projects, he used spatial statistics to model and analyze data on groundwater arsenic concentrations, location, geologic properties, weather patterns, topography, and land cover. Through his recent environmental data science studies, he discovered that the incidence of bladder, breast, and prostate cancers is spatially correlated with long-term arsenic exposure.

With respect to data science technologies and methodologies, Dr. Goovaerts commonly implements the following:

  • Spatial statistical programming: Once again, kriging and variogram analysis top the list.
  • Statistical programming: Least squares regression and Monte Carlo simulation (a method based on repeated random sampling) are central to Dr. Goovaerts’s work.
  • GIS technologies: If you want map-making functionality and spatial data analysis methodologies, you’re going to need GIS technologies.
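
The pairing of least squares regression and Monte Carlo simulation named in the preceding list can be sketched in a few lines of plain Python. This is a generic illustration, not a reconstruction of Dr. Goovaerts’s analyses: the function names, the noise level, and the data values are hypothetical. The sketch fits a straight line by ordinary least squares, then refits it many times after adding random noise to the observations to see how much the estimated slope could vary.

```python
import random
import statistics


def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least squares line through (x, y)."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def monte_carlo_slopes(x, y, noise_sd, n_sims, seed=0):
    """Refit the line n_sims times, adding Gaussian noise to y on each run.

    The noise stands in for measurement error; noise_sd is an assumed value.
    """
    rng = random.Random(seed)  # seeded so that the runs are repeatable
    slopes = []
    for _ in range(n_sims):
        noisy_y = [yi + rng.gauss(0.0, noise_sd) for yi in y]
        slopes.append(least_squares_line(x, noisy_y)[0])
    return slopes


# Made-up observations, used only to exercise the functions.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]

slope, intercept = least_squares_line(x, y)
simulated = monte_carlo_slopes(x, y, noise_sd=0.5, n_sims=2000)
print(slope, intercept)
print(statistics.fmean(simulated), statistics.stdev(simulated))
```

The spread of the simulated slopes gives a rough sense of how sensitive the fitted relationship is to noise in the measurements.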