Geostatistics
Geostatistics is a branch of statistics focusing on spatial or spatiotemporal datasets. Developed originally to predict probability distributions of ore grades for mining operations, it is currently applied in diverse disciplines including petroleum geology, hydrogeology, hydrology, meteorology, oceanography, geochemistry, geometallurgy, geography, forestry, environmental control, landscape ecology, soil science, and agriculture. Geostatistics is applied in varied branches of geography, particularly those involving the spread of diseases, the practice of commerce and military planning, and the development of efficient spatial networks. Geostatistical algorithms are incorporated in many places, including geographic information systems and the R statistical environment.
Background
Geostatistics is intimately related to interpolation methods, but extends far beyond simple interpolation problems. Geostatistical techniques rely on statistical models based on random function theory to model the uncertainty associated with spatial estimation and simulation. A number of simpler interpolation methods, such as inverse distance weighting, bilinear interpolation and nearest-neighbor interpolation, were already well known before geostatistics. Geostatistics goes beyond the interpolation problem by treating the studied phenomenon at unknown locations as a set of correlated random variables.
Let $Z(\mathbf{x})$ be the value of the variable of interest at a certain location $\mathbf{x}$. This value is unknown. Although there exists a value at location $\mathbf{x}$ that could be measured, geostatistics considers this value as random since it was not measured, or has not been measured yet. However, the randomness of $Z(\mathbf{x})$ is not complete, but defined by a cumulative distribution function (CDF) that depends on certain information that is known about the value $Z(\mathbf{x})$:
$$F(z, \mathbf{x}) = \operatorname{Prob}\{ Z(\mathbf{x}) \le z \mid \text{information} \}$$
Typically, if the value of $Z$ is known at locations close to $\mathbf{x}$, one can constrain the CDF of $Z(\mathbf{x})$ by this neighborhood: if a high spatial continuity is assumed, $Z(\mathbf{x})$ can only have values similar to the ones found in the neighborhood. Conversely, in the absence of spatial continuity, $Z(\mathbf{x})$ can take any value. The spatial continuity of the random variables is described by a model of spatial continuity that can be either a parametric function in the case of variogram-based geostatistics, or have a non-parametric form when using other methods such as multiple-point simulation or pseudo-genetic techniques.
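As a rough illustration of how spatial continuity is quantified in variogram-based geostatistics, the following Python sketch computes an empirical semivariogram with the classical estimator: half the mean squared difference between the values of point pairs, grouped by separation distance. The function name `empirical_semivariogram`, the simple binning scheme and the sample transect are assumptions made for this example and are not taken from any particular package.

```python
import math


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Return (mean distance, semivariance, pair count) for each non-empty lag bin.

    Bin k collects the point pairs whose separation h satisfies
    k * lag_width <= h < (k + 1) * lag_width.
    """
    sums = [0.0] * n_lags    # accumulated squared differences per bin
    dists = [0.0] * n_lags   # accumulated pair distances per bin
    counts = [0] * n_lags    # number of pairs per bin
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])
            k = int(h // lag_width)
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                dists[k] += h
                counts[k] += 1
    result = []
    for k in range(n_lags):
        if counts[k] > 0:
            mean_h = dists[k] / counts[k]
            gamma = sums[k] / (2 * counts[k])  # half the mean squared difference
            result.append((mean_h, gamma, counts[k]))
    return result


# Hypothetical 1-D transect whose values change smoothly with position.
coords = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
for h, gamma, npairs in empirical_semivariogram(coords, values, lag_width=1.0, n_lags=4):
    print(f"h={h:.1f}  gamma={gamma:.2f}  pairs={npairs}")
```

A semivariance that grows with separation distance, as in this transect, indicates that nearby values are more alike than distant ones. In practice a parametric model would then be fitted to these points.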
By applying a single spatial model to an entire domain, one assumes that $Z$ is a stationary process, meaning that the same statistical properties apply over the entire domain. Several geostatistical methods provide ways of relaxing this stationarity assumption.
In this framework, one can distinguish two modeling goals:
- Estimating the value of $Z(\mathbf{x})$, typically by the expectation, the median or the mode of its conditional distribution. This is usually denoted as an estimation problem.
- Sampling from the entire probability distribution by considering each possible outcome at each location. This is generally done by creating several alternative maps of $Z$, called realizations. Consider a domain discretized into $N$ grid nodes. Each realization is a sample of the complete $N$-dimensional joint distribution function $F(\mathbf{z}, \mathbf{x}) = \operatorname{Prob}\{ Z(\mathbf{x}_1) \le z_1, Z(\mathbf{x}_2) \le z_2, \ldots, Z(\mathbf{x}_N) \le z_N \}$.
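The second goal can be sketched in Python. The example below draws unconditional Gaussian realizations on a small one-dimensional grid by factoring the covariance matrix with a Cholesky decomposition. The exponential covariance model, its parameters, the zero mean and all function names are assumptions chosen for illustration. The sketch also ignores conditioning to observed data, which a practical simulation would normally include.

```python
import math
import random


def exponential_covariance(h, sill=1.0, corr_range=3.0):
    """Assumed covariance model: C(h) = sill * exp(-h / corr_range)."""
    return sill * math.exp(-h / corr_range)


def cholesky(a):
    """Lower-triangular L with L * L^T = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_realizations(nodes, n_realizations, seed=0):
    """Draw zero-mean Gaussian realizations at the given 1-D node positions."""
    rng = random.Random(seed)
    cov = [[exponential_covariance(abs(a - b)) for b in nodes] for a in nodes]
    low = cholesky(cov)
    n = len(nodes)
    realizations = []
    for _ in range(n_realizations):
        w = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent N(0, 1) draws
        # z = L w has covariance L L^T, i.e. the target covariance matrix.
        z = [sum(low[i][k] * w[k] for k in range(i + 1)) for i in range(n)]
        realizations.append(z)
    return realizations


nodes = [float(i) for i in range(10)]  # N = 10 grid nodes on a line
maps = simulate_realizations(nodes, n_realizations=3)
for z in maps:
    print(" ".join(f"{v:6.2f}" for v in z))
```

Each printed row is one realization, that is, one sample of the 10-dimensional joint distribution. Differences between rows reflect the uncertainty that a single estimated map would hide.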
Methods
Estimation
Kriging
Kriging is a group of geostatistical techniques to interpolate the value of a random field at an unobserved location from observations of its value at nearby locations.
Simulation
- Aggregation
- Disaggregation
- Turning bands
- Cholesky decomposition
- Truncated Gaussian
- Plurigaussian
- Annealing
- Spectral simulation
- Sequential Indicator
- Sequential Gaussian
- Dead leaves
- Transition probabilities
- Markov chain geostatistics
- Support vector machine
- Boolean simulation
- Genetic models
- Pseudo-genetic models
- Cellular automata
- Multiple-Point Geostatistics
Definitions and tools
- Regionalized variable theory
- Covariance function
- Semi-variance
- Variogram
- Kriging
- Range
- Sill
- Nugget effect
- Training image
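Several of the tools listed above work together in practice. The Python sketch below is a minimal illustration under stated assumptions, not a reference implementation: it defines a spherical variogram model parameterized by nugget, sill and range, then solves the ordinary kriging system in variogram form to estimate a value at an unsampled location. The parameter values, the function names and the four sample points are hypothetical.

```python
import math


def variogram(h, nugget=0.1, sill=1.0, vrange=5.0):
    """Spherical model: 0 at h = 0, jumps to the nugget just above 0,
    and levels off at the sill once h reaches the range."""
    if h == 0.0:
        return 0.0
    if h >= vrange:
        return sill
    r = h / vrange
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def ordinary_kriging(coords, values, target):
    """Return (estimate, kriging variance, weights) at the target location."""
    n = len(coords)
    # Kriging system in variogram form; the extra row and column carry the
    # Lagrange multiplier that forces the weights to sum to one.
    a = [[variogram(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(c, target)) for c in coords] + [1.0]
    sol = solve(a, b)
    weights, mu = sol[:n], sol[n]
    estimate = sum(w * v for w, v in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance, weights


# Hypothetical observations at the corners of a unit square.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 2.0, 3.0, 4.0]
est, var, w = ordinary_kriging(coords, values, target=(0.5, 0.5))
print(f"estimate={est:.3f}  variance={var:.3f}  weights={[round(x, 3) for x in w]}")
```

The kriging variance returned alongside the estimate depends only on the variogram model and the sampling geometry, not on the observed values, which is why it is often used to compare sampling designs.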
Main scientific journals related to geostatistics
- Mathematical Geosciences
- Stochastic Environmental Research and Risk Assessment
Scientific organisations related to geostatistics
- promotes the use of geostatistical methods in environmental applications
- International Association for Mathematical Geosciences
Related software
- A classical open-source package dedicated to geostatistics, source code in Fortran 77 and 90.
- A Python module built on GSLIB source code wrapped in Python and Cython functions for drillhole processing, block modeling, computational geometry, VTK support and non-linear geostatistics.
- An open-source package dedicated to geostatistics with a user-friendly interface; source code in C++ using GsTL, a dedicated geostatistics C++ template library.
- A complete proprietary solution for geostatistics and resource estimation.
- A next-generation proprietary solution for geostatistics and resource estimation, programmable with Python code.
- A complete proprietary solution for geostatistics, geological modelling and resource estimation, see SGS Genesis.
- The R programming language has around 20 packages dedicated to geostatistics, and around 30 dedicated to other areas of spatial statistics.
- A software package based on the MATLAB language that handles spatiotemporal univariate and multivariate datasets and lets users produce dynamic maps of the observed variables over geographic regions.
- High-performance implementations of geostatistical algorithms for the Julia programming language.
- GSLIB and other data analytics and geostatistical functionality for spatial modeling in an open-source Python package.