The VARIOGRAM Procedure

Theoretical and Computational Details of the Semivariogram

Let $\{ Z({\bm {s}}), \bm {s} \in D \subset \mathcal{R}^2 \} $ be a spatial random field (SRF) with $n$ measured values $z_ i=Z(\bm {s}_ i)$ at respective locations $\bm {s}_ i$, $i=1,\dots ,n$. You use the VARIOGRAM procedure to gain insight into the spatial continuity and structure of $Z(\bm {s})$. A good measure of the spatial continuity of $Z(\bm {s})$ is based on the variance of the difference $Z(\bm {s}_ i)-Z(\bm {s}_ j)$, where $\bm {s}_ i$ and $\bm {s}_ j$ are locations in $D$. Specifically, if you assume that this variance depends only on the spatial increment $\bm {h}=\bm {s}_ j-\bm {s}_ i$ between the two locations, then the resulting variance function of $\bm {h}$ is independent of the actual locations $\bm {s}_ i$ and $\bm {s}_ j$. The continuity measure most commonly used in practice is one half of this variance, better known as the semivariance function,

\[  \gamma _ z(\bm {h}) = \frac{1}{2}{\mr {Var}}[Z(\bm {s}+\bm {h})-Z(\bm {s})]  \]

or, equivalently,

\[  \gamma _ z(\bm {h}) = \frac{1}{2} \left( \mr {E}\{ [Z(\bm {s}+\bm {h})-Z(\bm {s})]^2\}  - \{ \mr {E}[Z(\bm {s}+\bm {h})]-\mr {E}[Z(\bm {s})]\} ^2 \right)  \]

The plot of semivariance as a function of $\bm {h}$ is the semivariogram. The term semivariogram is also commonly used to refer to the semivariance itself.

Assume that the SRF $Z(\bm {s})$ is free of nonrandom (systematic) surface trends. Then the expected value $\mr {E}[Z(\bm {s})]$ is constant for all $\bm {s} \in D$, so the second term in the preceding expression vanishes and the semivariance simplifies to the following:

\[  \gamma _ z(\bm {h}) = \frac{1}{2} \mr {E}\{ [Z(\bm {s}+\bm {h})-Z(\bm {s})]^2\}   \]

Under this assumption, you can compute an estimate $\hat{\gamma }_ z(\bm {h})$ of the semivariance $\gamma _ z(\bm {h})$ from the $n$ observations by using the formula

\[  \hat{\gamma }_ z(\bm {h}) = \frac{1}{2 \mid N(\bm {h}) \mid }\sum _{N(\bm {h})}[Z(\bm {s}_ i)-Z(\bm {s}_ j)]^2  \]

where the set $N(\bm {h})$ contains all pairs of observations whose locations are separated by the lag vector $\bm {h}$,

\[  N(\bm {h}) = \{ (i,j): \bm {s}_ i-\bm {s}_ j = \bm {h}\}   \]

and $\mid N(\bm {h}) \mid $ is the number of such pairs $(i,j)$.
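To make the estimator concrete, the following minimal sketch (plain Python, not the PROC VARIOGRAM implementation) computes $\hat{\gamma }_ z(\bm {h})$ and $\mid N(\bm {h}) \mid $ for every exact lag vector $\bm {h} = \bm {s}_ i-\bm {s}_ j$. The function name `empirical_semivariance` and the sample data are hypothetical. Matching lag vectors exactly is only practical for gridded data; irregularly spaced data require grouping pairs into distance and direction classes, which this sketch does not attempt.

```python
from collections import defaultdict


def empirical_semivariance(coords, values):
    """Classical estimate of gamma_z(h) for every exact lag vector h.

    coords -- list of (x, y) locations s_i
    values -- list of measured values z_i = Z(s_i)
    Returns {h: (gamma_hat, n_pairs)}, with h = s_i - s_j as in N(h).
    """
    sq_sums = defaultdict(float)  # running sum of squared differences per h
    counts = defaultdict(int)     # |N(h)|, the number of pairs per h
    n = len(values)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a point paired with itself only gives h = 0
            h = (coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            sq_sums[h] += (values[i] - values[j]) ** 2
            counts[h] += 1
    # Divide each sum by 2|N(h)|, as in the formula for gamma_hat
    return {h: (sq_sums[h] / (2 * counts[h]), counts[h]) for h in sq_sums}


# Four hypothetical points on a line along the x axis
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
values = [1.0, 3.0, 2.0, 5.0]
result = empirical_semivariance(coords, values)
print(result[(1, 0)])  # estimate and pair count at lag (1, 0)
```

Because the squared difference is symmetric, the estimates at $\bm {h}$ and $-\bm {h}$ coincide; each unordered pair appears once in each of the two sets.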

The expression for $\hat{\gamma }_ z(\bm {h})$ is called the empirical semivariance (Matheron, 1963). This is the quantity that PROC VARIOGRAM computes, and its corresponding plot is the empirical semivariogram.

The empirical semivariance $\hat{\gamma }_ z(\bm {h})$ is also called the classical semivariance, to distinguish it from the robust semivariance estimate $\bar{\gamma }_ z(\bm {h})$ and the corresponding robust semivariogram. Cressie and Hawkins (1980) introduced the robust semivariance to reduce the influence that outliers in the observations can have on the semivariance estimate. Cressie (1993, p. 75) gives it as

\[  \bar{\gamma }_ z(\bm {h}) = \frac{\Psi ^4(\bm {h})}{2 [0.457 + 0.494/\mid N(\bm {h}) \mid ]}  \]

In the preceding expression, $\Psi (\bm {h})$ is the mean of the square roots of the absolute differences:

\[  \Psi (\bm {h}) = \frac{1}{\mid N(\bm {h}) \mid } \sum _{N(\bm {h})}\mid Z(\bm {s}_ i)-Z(\bm {s}_ j)\mid ^{\frac{1}{2}}  \]
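The following sketch compares the robust and classical estimates for a single set $N(\bm {h})$. It is a minimal illustration that assumes you have already collected the differences $Z(\bm {s}_ i)-Z(\bm {s}_ j)$ for all pairs at one lag; the helper names `classical_semivariance` and `robust_semivariance` and the sample differences are hypothetical. The square root is taken of the absolute difference, so the sign of each difference does not matter.

```python
import math


def classical_semivariance(diffs):
    """Classical estimate: sum of squared differences divided by 2|N(h)|."""
    return sum(d * d for d in diffs) / (2.0 * len(diffs))


def robust_semivariance(diffs):
    """Cressie-Hawkins robust estimate for one set N(h).

    diffs holds Z(s_i) - Z(s_j) for every pair in N(h).
    """
    n = len(diffs)
    if n == 0:
        raise ValueError("N(h) contains no pairs")
    psi = sum(math.sqrt(abs(d)) for d in diffs) / n  # Psi(h)
    return psi ** 4 / (2.0 * (0.457 + 0.494 / n))


# Hypothetical differences for one lag: 20 clean values, then one outlier
clean = [1.0, -1.0] * 10
dirty = clean[:-1] + [100.0]
for name, f in (("classical", classical_semivariance),
                ("robust", robust_semivariance)):
    print(name, f(clean), f(dirty))
```

Replacing one difference by the outlier 100 raises the classical estimate from 0.5 to about 250 (a roughly 500-fold increase), whereas the robust estimate grows only by the factor $1.45^4 \approx 4.4$. The two estimates differ even for the clean data because the constants 0.457 and 0.494 calibrate the robust estimator for Gaussian differences, not for differences of constant magnitude.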

According to Cressie (1985), the estimate $\hat{\gamma }_ z(\bm {h})$ has approximate variance

\[  {\mr {Var}}[\hat{\gamma }_ z(\bm {h})] \simeq \frac{2[\gamma _ z(\bm {h})]^2}{\mid N(\bm {h}) \mid }  \]

This approximation follows from assuming that $Z(\bm {s})$ is a Gaussian SRF and, further, that the squared differences $[Z(\bm {s}_ i)-Z(\bm {s}_ j)]^2$ entering the estimate are uncorrelated. In practice, semivariance estimates are typically correlated, both because of the underlying spatial correlation among the observations and because the same observation pairs might contribute to more than one semivariogram point, as described in the following subsections. Despite these restrictive assumptions, the approximate variance gives a useful indication of the precision of the semivariance estimates and enables you to fit a theoretical model to the empirical semivariance; see the section Theoretical Semivariogram Model Fitting for more details about the fitting process.
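The following Monte Carlo sketch checks the approximation in its idealized setting. It assumes that the $\mid N(\bm {h}) \mid $ differences are independent draws from $N(0, 2\gamma _ z(\bm {h}))$, which is the Gaussian, uncorrelated case; there $\hat{\gamma }_ z(\bm {h})$ is $\gamma _ z(\bm {h})$ times a chi-square variable with $\mid N(\bm {h}) \mid $ degrees of freedom divided by $\mid N(\bm {h}) \mid $, so the formula is exact. The function names and parameter values are hypothetical. With real, spatially correlated data and overlapping pairs, the actual variance generally differs.

```python
import random
import statistics


def approx_variance(gamma, n_pairs):
    """Cressie (1985) approximation: Var[gamma_hat(h)] ~ 2 gamma(h)^2 / |N(h)|."""
    return 2.0 * gamma ** 2 / n_pairs


def simulate_variance(gamma, n_pairs, reps, seed=1):
    """Monte Carlo variance of gamma_hat when the n_pairs differences are
    independent N(0, 2 * gamma), i.e. the idealized assumptions hold."""
    rng = random.Random(seed)
    sd = (2.0 * gamma) ** 0.5  # standard deviation of Z(s+h) - Z(s)
    estimates = []
    for _ in range(reps):
        diffs = [rng.gauss(0.0, sd) for _ in range(n_pairs)]
        estimates.append(sum(d * d for d in diffs) / (2.0 * n_pairs))
    return statistics.variance(estimates)


print(approx_variance(1.5, 40), simulate_variance(1.5, 40, 10000))
```

The simulated variance should agree with the approximation up to Monte Carlo error of roughly 1 to 2 percent for these settings.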

Note: If your data include a surface trend, then the empirical semivariance $\hat{\gamma }_ z(\bm {h})$ is not an estimate of the theoretical semivariance function $\gamma _ z(\bm {h})$. Instead of half the variance of the spatial increments, it estimates a different quantity known as the pseudo-semivariance, and its corresponding plot is a pseudo-semivariogram. In principle, pseudo-semivariograms do not measure spatial continuity. They can therefore lead to misinterpretations of the spatial structure of $Z(\bm {s})$ and are consequently unsuitable for spatial prediction. For further information, see the detailed discussion in the section Empirical Semivariograms and Surface Trends. Under certain conditions, you might still be able to gain some insight into the spatial continuity from a pseudo-semivariogram; this case is presented in Analysis without Surface Trend Removal.
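The effect of a trend follows from the second form of the semivariance: if $\mr {E}[Z(\bm {s})] = \mu (\bm {s})$ is not constant, then $\frac{1}{2}\mr {E}\{ [Z(\bm {s}+\bm {h})-Z(\bm {s})]^2\} = \gamma _ z(\bm {h}) + \frac{1}{2}[\mu (\bm {s}+\bm {h})-\mu (\bm {s})]^2$, so the classical estimator also picks up the squared trend increment. The sketch below illustrates this on a hypothetical one-dimensional transect of equally spaced points (a simplification of the two-dimensional setting) that contains only the linear trend $\mu (x) = 2x$ and no random component. The helper `transect_semivariance` is hypothetical.

```python
def transect_semivariance(values, lag):
    """Classical empirical semivariance at an integer lag along a
    regularly spaced transect."""
    diffs = [values[k + lag] - values[k] for k in range(len(values) - lag)]
    return sum(d * d for d in diffs) / (2.0 * len(diffs))


# Purely deterministic linear trend mu(x) = 2x; there is no random variation
trend = [2.0 * x for x in range(50)]
for lag in (1, 3, 5):
    print(lag, transect_semivariance(trend, lag))
```

The printed values are $\frac{1}{2}(2h)^2$, that is, 2.0, 18.0, and 50.0, and they keep growing with the lag even though there is no spatial variability to measure: the resulting pseudo-semivariogram reflects only the trend.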