Environmental Data Science — Natural Resources

By Lillian Pierson

You can use data science to model natural resources in their raw form: water, air, and land conditions as they occur in nature. This type of environmental data science generally relies on advanced statistical modeling, and its goal is a better understanding of how the natural environment affects human life.

Exploring natural resource modeling

Environmental data science can model natural resources in the raw so that you can better understand environmental processes and how those processes affect life on Earth. Only after environmental processes are clearly understood can environmental engineers step in to design systems that solve the problems these natural processes may be creating. The following list describes the types of natural-resource issues that environmental data science can model and predict:

  • Water issues: Rainfall rates, geohydrologic patterns, groundwater flows, and groundwater toxin concentrations
  • Air issues: Particulate-matter concentration and dispersion, and greenhouse gas concentrations
  • Land issues: Soil contaminant migration and geomorphology as well as geophysics, mineral exploration, and oil and gas exploration

If your goal is to build a predictive model that helps you better understand natural environmental processes, natural-resource modeling is the approach to use. Don’t expect it to be easy, though: the statistics that go into these types of models can be incredibly complex.

Dabbling in data science

Because environmental processes and systems involve many different interdependent variables, most natural-resource modeling requires the use of incredibly complex statistical algorithms. The following list shows a few elements of data science that are commonly deployed in natural-resource modeling:

  • Statistics, math, and machine learning: Bayesian inference, multilevel hierarchical Bayesian inference, multitaper spectral analysis, copulas, the Wavelet Autoregressive Method (WARM), autoregressive moving average (ARMA) models, Monte Carlo simulations, structured additive regression (STAR) models, regression on order statistics (ROS), maximum likelihood estimation (MLE), expectation-maximization (EM), linear and nonlinear dimension reduction, wavelet analysis, frequency domain methods, Markov chains, k-nearest neighbor (kNN), kernel density estimation, and logspline density estimation, among other methods
  • Spatial statistics: Typically, methods such as probabilistic mapping
  • Data visualization: As in other data science areas, needed for exploratory analysis and for communicating findings with others
  • Web scraping: Often required when gathering data for environmental models
  • GIS technology: Spatial analysis and mapmaking
  • Coding requirements: Python, R, SPSS, SAS, MATLAB, Fortran, and SQL, among other programming languages and tools
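To make two of the listed techniques more concrete, here is a minimal Python sketch that combines an autoregressive model with a Monte Carlo simulation. It simulates many synthetic series from a first-order autoregressive, or AR(1), process and estimates how often a series exceeds a threshold. The function names, the coefficient phi, the noise level sigma, and the threshold are all hypothetical values chosen for illustration; they are not taken from any real environmental dataset.

```python
import random


def simulate_ar1(n_steps, phi, sigma, rng):
    """Simulate one AR(1) series: x[t] = phi * x[t-1] + Gaussian noise."""
    series = []
    previous = 0.0
    for _ in range(n_steps):
        current = phi * previous + rng.gauss(0.0, sigma)
        series.append(current)
        previous = current
    return series


def monte_carlo_exceedance(n_runs, n_steps, phi, sigma, threshold, seed=0):
    """Estimate the probability that a simulated series ever exceeds threshold."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same result
    hits = 0
    for _ in range(n_runs):
        if max(simulate_ar1(n_steps, phi, sigma, rng)) > threshold:
            hits += 1
    return hits / n_runs


if __name__ == "__main__":
    # Hypothetical parameters, e.g. a standardized rainfall anomaly series.
    prob = monte_carlo_exceedance(
        n_runs=2000, n_steps=50, phi=0.6, sigma=1.0, threshold=3.0
    )
    print(f"Estimated exceedance probability: {prob:.3f}")
```

Real natural-resource models are far more elaborate, but the pattern is the same: fit or assume a stochastic model, simulate it many times, and summarize the simulated outcomes.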

Modeling natural resources to solve environmental problems

The work of Columbia Water Center’s director, Dr. Upmanu Lall, provides a world-class example of using environmental data science to solve incredibly complex water resource problems. Dr. Lall uses advanced statistics, math, coding, and staggering subject-matter expertise in environmental engineering to uncover complex, interdependent relationships between global water-resource characteristics, national gross domestic products (GDPs), poverty, and national energy consumption rates.

In one of Dr. Lall’s recent projects, he found that in countries with high rainfall variability (countries that experience extreme droughts followed by massive flooding), the instability results in a lack of stable water resources for agricultural development, more runoff and erosion, and an overall decrease in national GDP. The inverse is also true: Countries that have stable, moderate rainfall rates have a better water resource supply for agricultural development, better environmental conditions overall, and higher average GDPs. So, using environmental data science, Dr. Lall has been able to draw strong correlations between a nation’s rainfall trends and its poverty rates.
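The kind of relationship described above can be sketched in a few lines of Python. This is not Dr. Lall’s method, which relies on far more advanced statistics; it is a toy example that measures rainfall variability with the coefficient of variation and then computes a Pearson correlation against GDP. The country labels, rainfall totals, and GDP figures are invented for illustration and are not real observations.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean: a simple variability measure."""
    return statistics.stdev(values) / statistics.mean(values)


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical annual rainfall totals (mm) and GDP per capita (USD) for
# four made-up countries; these are NOT real observations.
rainfall_by_country = {
    "A": [800, 820, 790, 810, 805],
    "B": [300, 1200, 250, 1100, 400],
    "C": [600, 650, 580, 700, 620],
    "D": [200, 900, 150, 1000, 300],
}
gdp_per_capita = {"A": 30000, "B": 4000, "C": 18000, "D": 2500}

countries = sorted(rainfall_by_country)
variability = [coefficient_of_variation(rainfall_by_country[c]) for c in countries]
gdp = [gdp_per_capita[c] for c in countries]

r = pearson_correlation(variability, gdp)
print(f"Correlation between rainfall variability and GDP: {r:.2f}")
```

With these made-up numbers the correlation comes out negative, mirroring the pattern described in the text. A correlation on its own doesn’t establish cause and effect, which is one reason real studies lean on the more sophisticated models listed in this article.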

With respect to data science technologies and methodologies, Dr. Lall implements these tools:

  • Statistical programming: Dr. Lall’s arsenal includes multilevel hierarchical Bayesian models, multitaper spectral analysis, copulas, the Wavelet Autoregressive Method (WARM), autoregressive moving average (ARMA) models, and Monte Carlo simulations.
  • Mathematical programming: Tools here include linear and nonlinear dimension reduction, wavelet analysis, frequency domain methods, and nonhomogeneous hidden Markov models.
  • Clustering analysis: In this case, Dr. Lall relies on tried-and-true methods, including k-nearest neighbor, kernel density estimation, and logspline density estimation.
  • Machine learning: Here, Dr. Lall focuses on minimum variance embedding.
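
To give a feel for one of the simpler tools in the list above, the following Python sketch builds a kernel density estimate with a Gaussian kernel. It is a bare-bones illustration written from the textbook definition, not a reconstruction of Dr. Lall’s code; the function name, the sample values, and the bandwidth are hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x using Gaussian kernels."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump centered on each observed sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


if __name__ == "__main__":
    # Hypothetical measurements, e.g. contaminant concentrations in mg/L.
    samples = [1.0, 2.0, 2.5, 3.0, 6.0]
    density = gaussian_kde(samples, bandwidth=0.5)
    for x in (2.5, 4.5, 6.0):
        print(f"density at {x}: {density(x):.4f}")
```

The bandwidth controls how smooth the estimate is: a small value follows individual samples closely, and a large value blurs them together. Choosing it well is a modeling decision in its own right.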