The VARIOGRAM Procedure

Autocorrelation Weights

In general, the choice of a weighting scheme is subjective, and different schemes, options, and parameters can give different results. PROC VARIOGRAM offers considerable flexibility in choosing weights that reflect prior considerations, such as different hypotheses about neighboring areas, the definition of the neighborhood structure, and natural barriers or other spatial characteristics; see the discussion in Cliff and Ord (1981, p. 17). As with all types of spatial analysis, it is important to know your data well. For autocorrelation statistics, this knowledge can help you avoid spurious correlations when you choose the weights.

The starting point is to assign an individual weight to each of the n data values $z_ i$, $i=1,\dots ,n$, with respect to each of the other values. This defines an $n \times n$ matrix of weights such that, for any two locations $\bm {s}_ i$ and $\bm {s}_ j$, the weight $w_{ij}$ represents the effect of the value $z_ i$ at location $\bm {s}_ i$ on the value $z_ j$ at location $\bm {s}_ j$. Depending on the nature of your study, the weights $w_{ij}$ need not be symmetric; that is, $w_{ij} \neq w_{ji}$ is possible.

Binary and Nonbinary Weights

The weights $w_{ij}$ can be either binary or nonbinary. A binary weight of 1 or 0 indicates whether the spatial random field (SRF) value $Z(\bm {s}_ i)$ at location $\bm {s}_ i$ is deemed to be connected or not, respectively, to the value $Z(\bm {s}_ j)$ at another location $\bm {s}_ j$. Nonbinary weights can be used when more refined measures of connectivity between two data points $P_ i$ and $P_ j$, located at $\bm {s}_ i$ and $\bm {s}_ j$, are available. PROC VARIOGRAM offers a choice between a binary and a distance-based nonbinary weighting scheme.

In the binary weighting scheme, $w_{ij}=1$ if the data points at $\bm {s}_ i$ and $\bm {s}_ j$ are closer to each other than the distance that you specify in the LAGDISTANCE= option; otherwise, including the case $i=j$, $w_{ij}=0$. For that reason, if you specify the WEIGHTS=BINARY suboption of the AUTOCORRELATION option together with the NOVARIOGRAM option in the COMPUTE statement, then you must also specify the LAGDISTANCE= option.
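To make the binary rule concrete, the following Python sketch builds the binary weights matrix for a few points. It is an illustration of the rule stated above, not PROC VARIOGRAM's code: the function name `binary_weights` is hypothetical, `lag_distance` stands in for the value you would give in the LAGDISTANCE= option, and the strict "less than" comparison is an assumption based on the wording "closer than".

```python
import math


def binary_weights(coords, lag_distance):
    """Hypothetical helper: binary autocorrelation weights.

    w[i][j] = 1 if points i and j are closer than lag_distance
    (strict comparison, an assumption), otherwise 0; the diagonal is 0.
    """
    n = len(coords)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) < lag_distance:
                w[i][j] = 1
    return w


# Four points on a line, 1 unit apart, with a made-up lag distance of 1.5:
# only immediate neighbors are connected.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
W = binary_weights(pts, lag_distance=1.5)
for row in W:
    print(row)
```

With this lag distance each interior point has two connected neighbors and each end point has one, so the matrix is symmetric but its rows have different sums.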

The nonbinary weighting scheme is based on the pair distances and is invoked with the WEIGHTS=DISTANCE suboption of the AUTOCORRELATION option. PROC VARIOGRAM sets the weights by using a variation of the Pareto functional form. Namely, the autocorrelation weight for every point pair $P_ i$ and $P_ j$, located at $\bm {s}_ i$ and $\bm {s}_ j$, respectively, is defined as

\[  w_{ij} = s \frac{1}{1+ \mid \bm {h} \mid ^{p}}  \]

where $\bm {h} = \bm {s}_ i - \bm {s}_ j$ is the separation vector of the pair, $\mid \bm {h} \mid$ is its length (the pair distance), and $p\geq 0$ and $s\geq 0$ are user-defined parameters for adjusting the weights.
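As a short worked example of the preceding equation, take a single pair at distance $\mid \bm {h} \mid = 3$ (an arbitrary value) with $s = 1$ and evaluate the weight for three values of p. For pair distances greater than 1, a larger p makes the weight decrease faster with distance, while $p = 0$ gives every point pair the same weight $s/2$.

\[  \mid \bm {h} \mid = 3,\ s = 1: \qquad p=0:\ w_{ij} = \frac{1}{1+3^{0}} = 0.5, \qquad p=1:\ w_{ij} = \frac{1}{1+3^{1}} = 0.25, \qquad p=2:\ w_{ij} = \frac{1}{1+3^{2}} = 0.1  \]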

You specify the power parameter p with the POWER= option and the scaling parameter s with the SCALE= option, both in the DISTANCE suboption of the AUTOCORRELATION option. By default, p = 1 and s = 1. You can use these two parameters to adjust the values of the weights to suit your needs. Changing the scaling parameter s does not affect the computed values of the Moran’s I and Geary’s c autocorrelation coefficients that are introduced in the section Autocorrelation Statistics Types: each coefficient is normalized by the sum of all the weights, so a common factor s cancels.
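The following Python sketch evaluates the distance-based weights for a small made-up data set and plugs them into the textbook Moran's I and Geary's c formulas (Cliff and Ord 1981), which are assumed here to match the definitions in the section Autocorrelation Statistics Types. It illustrates the statement above that s does not change either coefficient. The function names, the data, and the choice to exclude self-pairs ($w_{ii}=0$) are assumptions made for illustration; this is not PROC VARIOGRAM's implementation.

```python
import math


def distance_weights(coords, power=1.0, scale=1.0):
    """Hypothetical helper: w_ij = scale / (1 + |h|**power) for i != j.

    Self-pairs get weight 0 here; that is an assumption of this sketch.
    """
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                h = math.dist(coords[i], coords[j])  # pair distance |h|
                w[i][j] = scale / (1.0 + h ** power)
    return w


def morans_i(z, w):
    """Textbook Moran's I: (n / S0) * sum w_ij d_i d_j / sum d_i^2."""
    n = len(z)
    zbar = sum(z) / n
    dev = [zi - zbar for zi in z]
    s0 = sum(map(sum, w))  # total weight S0; a factor s cancels here
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den


def gearys_c(z, w):
    """Textbook Geary's c: (n-1) sum w_ij (z_i - z_j)^2 / (2 S0 sum d_i^2)."""
    n = len(z)
    zbar = sum(z) / n
    s0 = sum(map(sum, w))  # total weight S0; a factor s cancels here
    num = sum(w[i][j] * (z[i] - z[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((zi - zbar) ** 2 for zi in z)
    return (n - 1) * num / (2.0 * s0 * den)


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0), (3.0, 1.0)]
z = [1.0, 1.2, 0.9, 3.1, 2.8]  # made-up values for illustration

for s in (1.0, 10.0):
    W = distance_weights(pts, power=1.0, scale=s)
    print(f"s={s:5.1f}  I={morans_i(z, W):.6f}  c={gearys_c(z, W):.6f}")
```

Both printed lines show the same I and c, because multiplying every weight by s multiplies both the weighted sums and $S_0$ by s. Changing `power`, by contrast, changes the relative weights and therefore the coefficients.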

Nonbinary Weights with Normalized Distances

PROC VARIOGRAM offers additional flexibility in the DISTANCE weighting scheme through an option to use normalized pair distances. You can invoke this feature by specifying the NORMALIZE option in the DISTANCE suboption of the AUTOCORRELATION option. In this case, the distances used in the definition of the weights are normalized by the maximum pairwise distance $h_ b$ (see the section Computation of the Distribution Distance Classes and Figure 109.24); the weights are then defined as $w_{ij} = s/[1+( \mid \bm {h} \mid /h_ b)^{p}]$.

Because $h_ b$ most likely differs from one data set to another, normalized weights computed from one data set are not directly comparable with normalized weights computed from a different data set; avoid such comparisons.
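The following Python sketch shows why normalized weights do not transfer between data sets: the same pair of points, 1 unit apart, receives a different weight in two made-up data sets because $h_ b$ differs. Following the description in this section, $h_ b$ is computed here as the maximum distance over all point pairs; the function names are hypothetical, self-pairs are given weight 0 as an assumption, and this is not PROC VARIOGRAM's code.

```python
import math
from itertools import combinations


def max_pair_distance(coords):
    """Largest distance over all point pairs (plays the role of h_b)."""
    return max(math.dist(a, b) for a, b in combinations(coords, 2))


def normalized_distance_weights(coords, power=1.0, scale=1.0):
    """Hypothetical helper: w_ij = scale / (1 + (|h| / h_b)**power), i != j."""
    hb = max_pair_distance(coords)
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                h = math.dist(coords[i], coords[j])
                w[i][j] = scale / (1.0 + (h / hb) ** power)
    return w


# Two made-up data sets that share a pair of points 1 unit apart.
small = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]   # h_b = 2
large = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]  # h_b = 10

print(normalized_distance_weights(small)[0][1])  # 1 / (1 + 1/2),  ~0.667
print(normalized_distance_weights(large)[0][1])  # 1 / (1 + 1/10), ~0.909
```

Within a single data set, normalization maps the farthest pair to $s/2$ regardless of the units of the coordinates, which is convenient for choosing p; across data sets, the differing $h_ b$ is exactly what breaks comparability.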

Symmetric and Asymmetric Weights

The weighting schemes presented in the preceding paragraphs are symmetric; that is, $w_{ij} = w_{ji}$ for every data pair at locations $\bm {s}_ i$ and $\bm {s}_ j$. However, you can also define asymmetric weights $w'_{ij}$ such that

\[  \sum _{j \in J}w'_{ij} = 1  \]

for $i=1,2,\cdots ,n$, where $w'_{ij} = w_{ij}/\sum _{k \in J}w_{ik}$. In the distance-based scheme, J is the set of all locations that form point pairs with the point at $\bm {s}_ i$. In the binary scheme, J is the set of locations that are connected to $\bm {s}_ i$ based on the value of the LAGDISTANCE= option; see Cliff and Ord (1981, p. 18). The weights $w'_{ij}$ are row-averaged (or row-standardized): each row of the weights matrix is divided by its sum, which in the binary scheme equals the number of connected neighbors of $\bm {s}_ i$. To apply row averaging, specify the ROWAVG option in either the BINARY or the DISTANCE suboption of the AUTOCORRELATION option.
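The following Python sketch applies row averaging to a small binary weights matrix and shows how a symmetric matrix becomes asymmetric once each row is divided by its own sum. The function name `row_average` is hypothetical, and leaving an all-zero row (a point with no connected neighbors) unchanged is an assumption of this sketch; how PROC VARIOGRAM handles such points is not covered here.

```python
def row_average(w):
    """Hypothetical helper: w'_ij = w_ij / sum_k w_ik, so each row sums to 1.

    All-zero rows are returned unchanged (an assumption of this sketch).
    """
    out = []
    for row in w:
        total = sum(row)
        out.append([x / total for x in row] if total > 0 else list(row))
    return out


# Binary weights for four points on a line: the end points have one
# connected neighbor, the interior points have two.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
Wr = row_average(W)
for row in Wr:
    print(row)
```

Here `Wr[0][1]` is 1.0 while `Wr[1][0]` is 0.5, so the row-averaged weights are asymmetric even though `W` is symmetric. Because every row is divided by its own sum, a common scale factor s applied to distance-based weights also disappears after row averaging.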