PROC SIM2D: Theoretical Development :: SAS/STAT(R) 9.3 User's Guide

Theoretical Development

It is a simple matter to produce an $N(0,1)$ random number, and by stacking $k$ $N(0,1)$ random numbers in a column vector, you can obtain a vector with independent standard normal components $\mathbf{W} \sim N_k(\mathbf{0}, \mathbf{I})$. The meaning of the terms independence and randomness is subtle when these numbers must be generated by a deterministic algorithm; see Knuth (1981, Chapter 3) for details.

Rather than $\mathbf{W} \sim N_k(\mathbf{0}, \mathbf{I})$, what is required is the generation of a vector $\mathbf{Z} \sim N_k(\mathbf{0}, \mathbf{C})$, that is,

$\mathbf{Z} = \begin{bmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_k \end{bmatrix}$

with covariance matrix

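$\mathbf{C} = \begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1k} \\ C_{21} & C_{22} & \cdots & C_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ C_{k1} & C_{k2} & \cdots & C_{kk} \end{bmatrix}$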

If the covariance matrix is symmetric and positive definite, it has a Cholesky root $\mathbf{L}$ such that $\mathbf{C}$ can be factored as

$\mathbf{C} = \mathbf{L}\mathbf{L}^T$

where $\mathbf{L}$ is lower triangular. See Ralston and Rabinowitz (1978, Chapter 9, Section 3-3) for details. This vector $\mathbf{Z}$ can be generated by the transformation $\mathbf{Z} = \mathbf{L}\mathbf{W}$. Here is where the assumption of a Gaussian SRF is crucial: when $\mathbf{W} \sim N_k(\mathbf{0}, \mathbf{I})$, the linear transformation $\mathbf{Z} = \mathbf{L}\mathbf{W}$ is also Gaussian, so its distribution is completely determined by its mean and covariance. The mean of $\mathbf{Z}$ is

$E(\mathbf{Z}) = \mathbf{L}\,E(\mathbf{W}) = \mathbf{0}$

and the variance is

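$\mathrm{Var}(\mathbf{Z}) = \mathrm{Var}(\mathbf{L}\mathbf{W}) = E(\mathbf{L}\mathbf{W}\mathbf{W}^T\mathbf{L}^T) = \mathbf{L}\,E(\mathbf{W}\mathbf{W}^T)\,\mathbf{L}^T = \mathbf{L}\mathbf{I}\mathbf{L}^T = \mathbf{L}\mathbf{L}^T = \mathbf{C}$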
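The factor-and-transform step can be sketched in plain Python using only the standard library. This is a minimal illustration of the technique, not SIM2D's internal code: the helper names `cholesky` and `simulate_z` and the $3 \times 3$ matrix `C` are hypothetical, and `C` is assumed to be symmetric positive definite.

```python
import math
import random


def cholesky(c):
    """Return lower-triangular L with C = L L^T (Cholesky-Banachiewicz order).

    Assumes c is symmetric positive definite, given as a list of rows.
    """
    k = len(c)
    l = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(l[i][p] * l[j][p] for p in range(j))
            if i == j:
                d = c[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][i] = math.sqrt(d)
            else:
                l[i][j] = (c[i][j] - s) / l[j][j]
    return l


def simulate_z(l, rng):
    """Draw one Z ~ N_k(0, C) as Z = L W, where W ~ N_k(0, I) and C = L L^T."""
    k = len(l)
    w = [rng.gauss(0.0, 1.0) for _ in range(k)]  # k stacked N(0, 1) draws
    # L is lower triangular, so row i only needs w[0..i].
    return [sum(l[i][p] * w[p] for p in range(i + 1)) for i in range(k)]


# Hypothetical covariance matrix used only for illustration.
C = [[4.0, 2.0, 0.6],
     [2.0, 2.0, 0.5],
     [0.6, 0.5, 1.0]]
L = cholesky(C)  # factored once
z = simulate_z(L, random.Random(7))
```

Because $\mathbf{L}$ depends only on $\mathbf{C}$, it is factored once and reused for every new draw of $\mathbf{W}$.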

Consider now an SRF $Z(\mathbf{s})$, $\mathbf{s} \in D \subset \mathbb{R}^2$, with spatial covariance function $C(\mathbf{h})$. Fix locations $\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_k$, and let $\mathbf{Z}$ denote the random vector

$\mathbf{Z} = \begin{bmatrix} Z(\mathbf{s}_1) \\ Z(\mathbf{s}_2) \\ \vdots \\ Z(\mathbf{s}_k) \end{bmatrix}$

with corresponding covariance matrix

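$\mathbf{C}_z = \begin{bmatrix} C(\mathbf{0}) & C(\mathbf{s}_1 - \mathbf{s}_2) & \cdots & C(\mathbf{s}_1 - \mathbf{s}_k) \\ C(\mathbf{s}_2 - \mathbf{s}_1) & C(\mathbf{0}) & \cdots & C(\mathbf{s}_2 - \mathbf{s}_k) \\ \vdots & \vdots & \ddots & \vdots \\ C(\mathbf{s}_k - \mathbf{s}_1) & C(\mathbf{s}_k - \mathbf{s}_2) & \cdots & C(\mathbf{0}) \end{bmatrix}$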

Since this covariance matrix is symmetric and positive definite, it has a Cholesky root, and the values $Z(\mathbf{s}_i)$, $i = 1, \ldots, k$, can be simulated as described previously. This is how the SIM2D procedure implements unconditional simulation in the zero-mean case. More generally,

$Z(\mathbf{s}) = \mu(\mathbf{s}) + \varepsilon(\mathbf{s})$

where $\mu(\mathbf{s})$ is a quadratic form in the coordinates $\mathbf{s} = (x, y)$ and $\varepsilon(\mathbf{s})$ is a zero-mean SRF that has the same covariance matrix $\mathbf{C}_z$ as previously. In this case, the mean values $\mu(\mathbf{s}_i)$, $i = 1, \ldots, k$, are computed once and added to the simulated vector $\varepsilon(\mathbf{s}_i)$, $i = 1, \ldots, k$, for each realization.
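A minimal sketch of unconditional simulation with a nonzero mean, again in standard-library Python. It builds $\mathbf{C}_z$ from a covariance function, factors it once, evaluates the mean once, and adds the mean to each simulated $\varepsilon$ vector. The exponential covariance model `exp_cov`, its `sill` and `scale` parameters, and the coefficients of the quadratic mean `quad_mean` are hypothetical choices for illustration only; the sketch does not reproduce SIM2D's own covariance or MEAN statement syntax.

```python
import math
import random


def exp_cov(h, sill=1.0, scale=10.0):
    """Hypothetical isotropic exponential covariance: C(h) = sill * exp(-|h| / scale)."""
    return sill * math.exp(-math.hypot(h[0], h[1]) / scale)


def quad_mean(s, b=(5.0, 0.1, -0.2, 0.01, 0.0, 0.02)):
    """Hypothetical quadratic mean: b0 + b1*x + b2*y + b3*x^2 + b4*x*y + b5*y^2."""
    x, y = s
    return b[0] + b[1] * x + b[2] * y + b[3] * x * x + b[4] * x * y + b[5] * y * y


def covariance_matrix(locs, cov):
    """C_z: entry (i, j) is C(s_i - s_j), so the diagonal holds C(0)."""
    return [[cov((si[0] - sj[0], si[1] - sj[1])) for sj in locs] for si in locs]


def cholesky(c):
    """Lower-triangular L with C = L L^T; assumes C is symmetric positive definite."""
    k = len(c)
    l = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(l[i][p] * l[j][p] for p in range(j))
            if i == j:
                l[i][i] = math.sqrt(c[i][i] - s)
            else:
                l[i][j] = (c[i][j] - s) / l[j][j]
    return l


def unconditional_sims(locs, cov, mean, nr, rng):
    """Return nr realizations of Z(s_i) = mu(s_i) + eps(s_i), i = 1, ..., k."""
    k = len(locs)
    l = cholesky(covariance_matrix(locs, cov))  # factor C_z once
    mu = [mean(s) for s in locs]                # mean vector computed once
    sims = []
    for _ in range(nr):
        w = [rng.gauss(0.0, 1.0) for _ in range(k)]                          # W ~ N_k(0, I)
        eps = [sum(l[i][p] * w[p] for p in range(i + 1)) for i in range(k)]  # eps = L W
        sims.append([m + e for m, e in zip(mu, eps)])
    return sims


locs = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]
sims = unconditional_sims(locs, exp_cov, quad_mean, nr=3, rng=random.Random(11))
```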

For a conditional simulation, this distribution of

$\mathbf{Z} = \begin{bmatrix} Z(\mathbf{s}_1) \\ Z(\mathbf{s}_2) \\ \vdots \\ Z(\mathbf{s}_k) \end{bmatrix}$

must be conditioned on the observed data. The relevant general result concerning conditional distributions of multivariate normal random variables is the following. Let $\mathbf{X} \sim N_m(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where

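$\mathbf{X} = \begin{bmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{bmatrix}, \quad \boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}, \quad \boldsymbol{\Sigma} = \begin{bmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{bmatrix}$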

The subvector $\mathbf{X}_1$ is $k \times 1$, $\mathbf{X}_2$ is $n \times 1$, $\boldsymbol{\Sigma}_{11}$ is $k \times k$, $\boldsymbol{\Sigma}_{22}$ is $n \times n$, and $\boldsymbol{\Sigma}_{12} = \boldsymbol{\Sigma}_{21}^T$ is $k \times n$, with $k + n = m$. The full vector $\mathbf{X}$ is partitioned into two subvectors, $\mathbf{X}_1$ and $\mathbf{X}_2$, and $\boldsymbol{\Sigma}$ is similarly partitioned into covariances and cross covariances.

With this notation, the distribution of $\mathbf{X}_1$ conditioned on $\mathbf{X}_2 = \mathbf{x}_2$ is $N_k(\tilde{\boldsymbol{\mu}}, \tilde{\boldsymbol{\Sigma}})$, with

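$\tilde{\boldsymbol{\mu}} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)$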

and

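$\tilde{\boldsymbol{\Sigma}} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$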
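As a worked special case, suppose $k = n = 1$, with $\boldsymbol{\Sigma}_{11} = \sigma_1^2$, $\boldsymbol{\Sigma}_{22} = \sigma_2^2$, and $\boldsymbol{\Sigma}_{12} = \boldsymbol{\Sigma}_{21} = \rho\sigma_1\sigma_2$ for a correlation $\rho$. The two formulas then reduce to the familiar bivariate normal results:

$\tilde{\mu} = \mu_1 + \rho\,\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2), \qquad \tilde{\Sigma} = \sigma_1^2 - \frac{(\rho\sigma_1\sigma_2)^2}{\sigma_2^2} = \sigma_1^2(1 - \rho^2)$

The conditional variance does not depend on the observed value $x_2$; only the conditional mean does.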

See Searle (1971, pp. 46–47) for details. The correspondence with the conditional spatial simulation problem is as follows. Let the coordinates of the observed data points be denoted $\tilde{\mathbf{s}}_1, \tilde{\mathbf{s}}_2, \ldots, \tilde{\mathbf{s}}_n$, with values $\tilde{z}_1, \tilde{z}_2, \ldots, \tilde{z}_n$. Let $\tilde{\mathbf{Z}}$ denote the random vector

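$\tilde{\mathbf{Z}} = \begin{bmatrix} Z(\tilde{\mathbf{s}}_1) \\ Z(\tilde{\mathbf{s}}_2) \\ \vdots \\ Z(\tilde{\mathbf{s}}_n) \end{bmatrix}$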

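The random vector $\tilde{\mathbf{Z}}$ corresponds to $\mathbf{X}_2$, while $\mathbf{Z}$ corresponds to $\mathbf{X}_1$. Write $\mathbf{C}_{11}$, $\mathbf{C}_{12} = \mathbf{C}_{21}^T$, and $\mathbf{C}_{22}$ for the covariance blocks among the simulation locations $\mathbf{s}_i$, between those locations and the data locations $\tilde{\mathbf{s}}_j$, and among the data locations; these are the counterparts of $\boldsymbol{\Sigma}_{11}$, $\boldsymbol{\Sigma}_{12}$, and $\boldsymbol{\Sigma}_{22}$. Then $(\mathbf{Z} \mid \tilde{\mathbf{Z}} = \tilde{\mathbf{z}}) \sim N_k(\tilde{\boldsymbol{\mu}}, \tilde{\boldsymbol{\Sigma}})$ as in the previous distribution. The matrix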

$\tilde{\boldsymbol{\Sigma}} = \mathbf{C}_{11} - \mathbf{C}_{12}\mathbf{C}_{22}^{-1}\mathbf{C}_{21}$

is again positive definite, so a Cholesky factorization can be performed.
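The conditional moments can be computed without forming $\mathbf{C}_{22}^{-1}$ explicitly, by solving linear systems with the Cholesky factor of $\mathbf{C}_{22}$. The Python sketch below is an illustration of the formulas for $\tilde{\boldsymbol{\mu}}$ and $\tilde{\boldsymbol{\Sigma}}$ under stated assumptions, not SIM2D's implementation: the exponential covariance model `exp_cov` and the function names `chol_solve` and `conditional_moments` are hypothetical.

```python
import math


def exp_cov(h, sill=1.0, scale=10.0):
    """Hypothetical isotropic exponential covariance: C(h) = sill * exp(-|h| / scale)."""
    return sill * math.exp(-math.hypot(h[0], h[1]) / scale)


def cholesky(c):
    """Lower-triangular L with C = L L^T; assumes C is symmetric positive definite."""
    k = len(c)
    l = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(l[i][p] * l[j][p] for p in range(j))
            if i == j:
                l[i][i] = math.sqrt(c[i][i] - s)
            else:
                l[i][j] = (c[i][j] - s) / l[j][j]
    return l


def chol_solve(l, b):
    """Solve (L L^T) x = b by forward substitution (L y = b), then back substitution (L^T x = y)."""
    n = len(l)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][p] * y[p] for p in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[p][i] * x[p] for p in range(i + 1, n))) / l[i][i]
    return x


def conditional_moments(grid, data_locs, z_obs, mu1, mu2, cov):
    """Return (mu_tilde, sigma_tilde) of Z given Z~ = z~ (k grid points, n data points)."""
    def block(a, b):
        return [[cov((s[0] - t[0], s[1] - t[1])) for t in b] for s in a]

    c11 = block(grid, grid)
    c12 = block(grid, data_locs)                # k x n; C21 is its transpose
    l22 = cholesky(block(data_locs, data_locs))
    # Row i of C12 C22^{-1} solves C22 a_i = (row i of C12), because C22 is symmetric.
    a = [chol_solve(l22, row) for row in c12]
    resid = [z - m for z, m in zip(z_obs, mu2)]
    mu_t = [m + sum(ap * rp for ap, rp in zip(ai, resid)) for m, ai in zip(mu1, a)]
    k, n = len(grid), len(data_locs)
    # (C12 C22^{-1} C21)[i][j] = sum_p a[i][p] * C12[j][p].
    sig_t = [[c11[i][j] - sum(a[i][p] * c12[j][p] for p in range(n))
              for j in range(k)] for i in range(k)]
    return mu_t, sig_t


grid = [(2.0, 1.0), (5.0, 5.0)]
data_locs = [(0.0, 0.0), (10.0, 0.0)]
mu_t, sig_t = conditional_moments(grid, data_locs, [2.0, -1.0],
                                  mu1=[0.0, 0.0], mu2=[0.0, 0.0], cov=exp_cov)
```

Solving with the factor of $\mathbf{C}_{22}$ rather than inverting it is a common numerical choice; the result is the same matrix $\mathbf{C}_{12}\mathbf{C}_{22}^{-1}$ up to rounding.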

The dimension $n$ for $\tilde{\mathbf{Z}}$ is simply the number of nonmissing observations for the VAR= variable; the values $\tilde{z}_1, \ldots, \tilde{z}_n$ are the values of this variable. The coordinates $\tilde{\mathbf{s}}_1, \ldots, \tilde{\mathbf{s}}_n$ are also found in the DATA= data set, with the variables that correspond to the $x$ and $y$ coordinates identified in the COORDINATES statement. Note: All VAR= variables use the same set of conditioning coordinates; this fixes the matrix $\mathbf{C}_{22}$ for all simulations.

The dimension $k$ for $\mathbf{Z}$ is the number of grid points specified in the GRID statement. Since there is a single GRID statement, this fixes the matrix $\mathbf{C}_{11}$ for all simulations. Similarly, $\mathbf{C}_{12}$ is fixed.

The Cholesky factorization $\tilde{\boldsymbol{\Sigma}} = \mathbf{L}\mathbf{L}^T$ is computed once, as is the mean correction

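$\tilde{\boldsymbol{\mu}} = \boldsymbol{\mu}_1 + \mathbf{C}_{12}\mathbf{C}_{22}^{-1}(\tilde{\mathbf{z}} - \boldsymbol{\mu}_2)$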

The means $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ are computed using the grid coordinates $\mathbf{s}_1, \ldots, \mathbf{s}_k$, the data coordinates $\tilde{\mathbf{s}}_1, \ldots, \tilde{\mathbf{s}}_n$, and the quadratic form specification from the MEAN statement. The simulation is now performed exactly as in the unconditional case. A $k \times 1$ vector of independent standard $N(0,1)$ random variables is generated and multiplied by $\mathbf{L}$, and $\tilde{\boldsymbol{\mu}}$ is added to the transformed vector. This is repeated $N$ times, where $N$ is the value specified for the NR= option.
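Once $\tilde{\boldsymbol{\mu}}$ and $\tilde{\boldsymbol{\Sigma}}$ are available, the repeated draws follow the unconditional recipe. In this Python sketch, the function name `conditional_sims`, the argument `nr` (standing in for the value of the NR= option), and the $2 \times 2$ inputs are hypothetical; the point it illustrates is that the factorization and the mean correction sit outside the loop, and only the $N(0,1)$ draws change between realizations.

```python
import math
import random


def cholesky(c):
    """Lower-triangular L with C = L L^T; assumes C is symmetric positive definite."""
    k = len(c)
    l = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(l[i][p] * l[j][p] for p in range(j))
            if i == j:
                l[i][i] = math.sqrt(c[i][i] - s)
            else:
                l[i][j] = (c[i][j] - s) / l[j][j]
    return l


def conditional_sims(mu_t, sig_t, nr, rng):
    """Return nr realizations of (Z | Z~ = z~) ~ N_k(mu_tilde, Sigma_tilde)."""
    l = cholesky(sig_t)  # Sigma_tilde = L L^T, factored once
    k = len(mu_t)
    sims = []
    for _ in range(nr):  # nr plays the role of the NR= option value
        w = [rng.gauss(0.0, 1.0) for _ in range(k)]                         # k x 1 vector of N(0, 1)
        lw = [sum(l[i][p] * w[p] for p in range(i + 1)) for i in range(k)]  # multiply by L
        sims.append([m + v for m, v in zip(mu_t, lw)])                      # add mu_tilde
    return sims


# Hypothetical conditional mean and covariance, for illustration only.
mu_t = [1.0, -2.0]
sig_t = [[0.5, 0.2], [0.2, 0.4]]
sims = conditional_sims(mu_t, sig_t, nr=5, rng=random.Random(42))
```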
