Empirical Variograms

The empirical variogram is a visual tool for quantifying spatial covariance. It uses semivariance ($\gamma$), which is a measure of covariance between points or units ($i$ and $j$) as a function of distance ($h$):

$$\gamma(h) = \frac{1}{2|N(h)|}\sum_{N(h)}(x_i - x_j)^2$$

Semivariances are binned for distance intervals. The average values for semivariance and distance interval can be fit to mathematical models designed to explain how semivariance changes over distance.

Three important concepts of an empirical variogram are nugget, sill and range

range = distance up to which is there is spatial correlation
sill = uncorrelated variance of the variable of interest
nugget = measurement error, or short-distance spatial variance and other unaccounted for variance

2 other concepts:

partial sill = sill - nugget
nugget effect = the nugget/sill ratio, interpreted opposite of $r^2$ (the closer it is to 1, the less the amount of spatial autocorrelation)

Correlated Error Models

Many equations exist for modelling semivariance patterns. A deep knowledge of these is not required to fit an empirical variogram to a model. Here are a few popular examples.

Exponential

$$ \gamma (h) = \begin{cases}0 & \text{if }h=0 \\
C_0+C_1 \left [ 1-e^{-(\frac{h}{r}) } \right] & \text{if } h>0 \end{cases}$$

where

$$ C_0 = nugget $$ $$ C_1 = partial : sill $$ $$ r = range $$

Gaussian

(a squared version of the exponential model)

$$ \gamma (h) = \begin{cases}0 & \text{if }h=0, \\
C_0+C_1 \left [ 1-e^{-(\frac{h}{r})^2} \right] & \text{if } h>0 \end{cases}$$

where

$$ C_0 = nugget $$ $$ C_1 = partial : sill $$ $$ r = range $$

Matérn

</An extremely complicated mathematical model/>

There are many more models: Cauchy, logistic, spherical, sine, ….

For more information on these models, see this workshop’s accompanying online book on this topic and additional SAS resources.

Variogram fitting

Picking the right model is done both by comparing the sum of squares of error for different models and by

Not all variables have spatial autocorrelation

Not all fitted variogram models are worthy

Code for this section

The following scripts build upon work done in previous section(s).

# load libraries
library(gstat); library(spaMM)

# set up spatial object
Nin_spatial <- Nin_na
coordinates(Nin_spatial) <- ~ col.width + row.length # add attribte
class(Nin_spatial)

# establish max distance for variogram estimation
max_dist = 0.6*max(dist(coordinates(Nin_spatial)))

# calculate empirical variogram
resid_var1 <- gstat::variogram(yield ~ rep + gen, 
                        cutoff = max_dist,
                        width = max_dist/15, # 15 is the number of bins
                        data = Nin_spatial)
plot(resid_var1)  # empirical variogram

#Note: To fit a large number of models, the function 'autofitVariogram()' from the package automap can be used (is it calling gstat::variogram)

# starting value for the nugget
nugget_start <- min(resid_var1$gamma) 

# initialise the model (this does not do much)
Nin_vgm_exp <- vgm(model = "Exp", nugget = nugget_start) # exponential
Nin_vgm_gau <- vgm(model = "Gau", nugget = nugget_start) # Gaussian
Nin_vgm_mat <- vgm(model = "Mat", nugget = nugget_start) # Matern

# actually do some fitting! 
Nin_variofit_exp <- fit.variogram(resid_var1, Nin_vgm_exp)
Nin_variofit_gau <- fit.variogram(resid_var1, Nin_vgm_gau)
Nin_variofit_mat <- fit.variogram(resid_var1, Nin_vgm_mat, fit.kappa = T)

plot(resid_var1, Nin_variofit_exp, main = "Exponential model")
plot(resid_var1, Nin_variofit_gau, main = "Gaussian model")
plot(resid_var1, Nin_variofit_mat, main = "Matern model") 

attr(Nin_variofit_exp, "SSErr")
attr(Nin_variofit_gau, "SSErr")
attr(Nin_variofit_mat, "SSErr")

# parameters:
Nin_variofit_gau

nugget <- Nin_variofit_gau$psill[1] # measurement error (other random error)
range <- Nin_variofit_gau$range[2] # distance to establish independence between data points
sill <- sum(Nin_variofit_gau$psill) # maximum semivariance

SAS

# calculate semivariance and compute empirical variogram
proc variogram data=residuals plots(only)=(semivar);
   coordinates xc=Col yc=Row;
   compute lagd=1.2 maxlags=30;
  var resid;
run;

# fit models to the empirical variogram
proc variogram data=residuals plots(only)=(fitplot);
   coordinates xc=Col yc=Row;
   compute lagd=1.2 maxlags=30;
   model form=auto(mlist=(gau, exp, pow, sph) nest=1);
  var resid;
run;

Last updated on Oct 26, 2022