The VARIOGRAM Procedure

PROC VARIOGRAM Statement

PROC VARIOGRAM options;

The PROC VARIOGRAM statement invokes the VARIOGRAM procedure. Table 122.1 summarizes the options available in the PROC VARIOGRAM statement.

Table 122.1: PROC VARIOGRAM Statement Options

Option	Description
DATA=	Specifies the input data set
IDGLOBAL	Labels observations across BY groups using ascending observation numbers
IDNUM	Labels observations using the observation number
NOPRINT	Suppresses normal display of results
OUTACWEIGHTS=	Specifies a data set to store autocorrelation weights information
OUTDISTANCE=	Specifies a data set to store summary distance information
OUTMORAN=	Specifies a data set to store Moran scatter plot information
OUTPAIR=	Specifies a data set to store pairwise point information
OUTVAR=	Specifies a data set to store spatial continuity measures
PLOTS	Specifies the plot display and options

You can specify the following options in the PROC VARIOGRAM statement.

DATA=SAS-data-set

specifies a SAS data set that contains the x and y coordinate variables and the VAR statement variables.

IDGLOBAL

specifies that ascending observation numbers be used across BY groups for the observation labels in the appropriate output data sets and the OBSERVATIONS plot, instead of resetting the observation number at the beginning of each BY group. The IDGLOBAL option is ignored if no BY variables are specified. Also, if you specify the ID statement, then the IDGLOBAL option is ignored unless you also specify the IDNUM option in the PROC VARIOGRAM statement.

IDNUM

specifies that the observation number be used for the observation labels in the appropriate output data sets and the OBSERVATIONS plot. The IDNUM option takes effect when you specify the ID statement; otherwise, it is ignored.

NOPRINT

suppresses the normal display of results. The NOPRINT option is useful when you want only to create one or more output data sets with the procedure.

Note: This option temporarily disables the Output Delivery System (ODS); see the section ODS Graphics for more information.
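As a minimal sketch of the NOPRINT use case, the following step writes only an output data set and displays no results. The data set name work.outv and the LAGDISTANCE= and MAXLAGS= values are illustrative assumptions, not recommended settings:

```sas
/* Sketch: create only the OUTVAR= data set, with no displayed output. */
/* work.outv, LAGDISTANCE=7, and MAXLAGS=10 are illustrative choices.  */
proc variogram data=sashelp.thick outvar=work.outv noprint;
   compute lagdistance=7 maxlags=10;
   coordinates xc=East yc=North;
   var Thick;
run;
```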

OUTACWEIGHTS=SAS-data-set OUTACW=SAS-data-set OUTA=SAS-data-set

specifies a SAS data set in which to store the autocorrelation weights information for each pair of points in the DATA= data set. Use this option with caution when the DATA= data set is large. If n denotes the number of observations in the DATA= data set, then the OUTACWEIGHTS= data set contains $[n(n-1)]/2$ observations.

See the section OUTACWEIGHTS=SAS-data-set for details.
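To gauge the size of a data set that contains every pair of points, you can evaluate $[n(n-1)]/2$ before you run the procedure. The following DATA step is a small sketch that uses a hypothetical value n=1000, for which the count is 499,500 pairs:

```sas
/* Sketch: number of distinct pairs formed from n observations. */
/* n=1000 is a hypothetical input size chosen for illustration. */
data _null_;
   n = 1000;
   pairs = n*(n-1)/2;
   put n= pairs=;   /* writes both values to the SAS log */
run;
```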

OUTDISTANCE=SAS-data-set OUTDIST=SAS-data-set OUTD=SAS-data-set

specifies a SAS data set in which to store summary distance information. This data set contains a count of all pairs of data points within a given distance interval. The number of distance intervals is controlled by the NHCLASSES= option in the COMPUTE statement. The OUTDISTANCE= data set is useful for plotting modified histograms of the count data for determining appropriate lag distances. See the section OUTDIST=SAS-data-set for details.
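For instance, a preliminary run along the following lines stores the pair counts for a chosen number of distance intervals. The data set name work.outd and the value NHCLASSES=20 are assumptions made for illustration:

```sas
/* Sketch: save pair counts per distance interval for lag selection. */
/* work.outd and NHCLASSES=20 are illustrative choices.              */
proc variogram data=sashelp.thick outdistance=work.outd;
   compute novariogram nhclasses=20;
   coordinates xc=East yc=North;
   var Thick;
run;
```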

OUTMORAN=SAS-data-set OUTM=SAS-data-set

specifies a SAS data set in which to store information that is illustrated in the Moran plot, namely the standardized value of each observation in the DATA= data set and the weighted average of its local neighbors. You must also specify the LAGDISTANCE= and AUTOCORRELATION options in the COMPUTE statement; otherwise, the OUTMORAN= data set request is ignored.

The OUTMORAN= data set is useful when you want to save the information that is illustrated in the Moran scatter plot. The data set can also contain entries of missing observations with neighbors, although these observations are not displayed in the Moran plot. However, if the only observations with neighbors in your input data set are observations with missing values, then the OUTMORAN= output data set is empty.

See the section OUTMORAN=SAS-data-set for details.
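The following sketch shows one way to satisfy the requirement that LAGDISTANCE= and AUTOCORRELATION both appear in the COMPUTE statement. The data set name work.outm and the numeric values are illustrative assumptions:

```sas
/* Sketch: request the OUTMORAN= data set.                          */
/* LAGDISTANCE= and AUTOCORRELATION are both specified; otherwise   */
/* the OUTMORAN= request is ignored. Values are illustrative only.  */
proc variogram data=sashelp.thick outmoran=work.outm;
   compute lagdistance=7 maxlags=10 autocorrelation;
   coordinates xc=East yc=North;
   var Thick;
run;
```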

OUTPAIR=SAS-data-set OUTP=SAS-data-set

specifies a SAS data set in which to store distance and angle information for each pair of points in the DATA= data set.

Use this option with caution when your DATA= data set is large. Assume that your DATA= data set has n observations. When you specify the NOVARIOGRAM option in the COMPUTE statement, the OUTPAIR= data set is populated with all $[n(n-1)]/2$ pairs that can be formed with the n observations.

If you do not specify the NOVARIOGRAM option, then the OUTPAIR= data set contains only pairs of points that are located within a certain distance of each other. Specifically, it contains the pairs whose separation distance falls in a lag class up to the value of the MAXLAGS= option in the COMPUTE statement. Consequently, depending on your LAGDISTANCE= and MAXLAGS= specifications, the OUTPAIR= data set might contain $[n(n-1)]/2$ or fewer pairs.

Finally, you can restrict the number of pairs in the OUTPAIR= data set with the OUTPDISTANCE= option in the COMPUTE statement. The OUTPDISTANCE= option in the COMPUTE statement excludes pairs of points when the distance between the pairs exceeds the OUTPDISTANCE= value.

See the section OUTPAIR=SAS-data-set for details.
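As a hedged illustration of limiting the size of the OUTPAIR= data set, the following step combines the lag specification with a cutoff distance. The data set name work.outp and the values 7, 10, and 30 are assumptions chosen only for the example:

```sas
/* Sketch: store pairwise information, excluding pairs whose          */
/* distance exceeds the OUTPDISTANCE= value. Values are illustrative. */
proc variogram data=sashelp.thick outpair=work.outp;
   compute lagdistance=7 maxlags=10 outpdistance=30;
   coordinates xc=East yc=North;
   var Thick;
run;
```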

OUTVAR=SAS-data-set OUTVR=SAS-data-set

specifies a SAS data set in which to store the continuity measures.

See the section OUTVAR=SAS-data-set for details.

PLOTS <(global-plot-options)> <= plot-request<(options)>>
PLOTS <(global-plot-options)> <= (plot-request<(options)> <…plot-request<(options)>>)>

controls the plots produced through ODS Graphics. When you specify only one plot request, you can omit the parentheses around the plot request. Here are some examples:

plots=none
plots=observ
plots=(observ semivar)
plots(unpack)=semivar
plots=(semivar(cla unpack) semivar semivar(rob))

ODS Graphics must be enabled before plots can be requested. For example:

ods graphics on;

proc variogram data=sashelp.thick;
   compute novariogram;
   coordinates xc=East yc=North;
   var Thick;
run;

ods graphics off;

For more information about enabling and disabling ODS Graphics, see the section Enabling and Disabling ODS Graphics in Chapter 21: Statistical Graphics Using ODS.

If ODS Graphics is enabled but you omit the PLOTS option or have specified PLOTS=ALL, then PROC VARIOGRAM produces a default set of plots, which might be different for different COMPUTE statement options, as discussed in the following.

If you specify NOVARIOGRAM in the COMPUTE statement, the VARIOGRAM procedure produces a scatter plot of the spatial distribution of your observations, in addition to the histogram of the pairwise distances of your data. For an example of the observations plot, see Figure 122.2. For an example of the pairwise distances plot, see Figure 122.4.
If you omit NOVARIOGRAM in the COMPUTE statement, the VARIOGRAM procedure computes the empirical semivariogram for the specified LAGDISTANCE= and MAXLAGS= options. The observations plot appears by default in this case too. The VARIOGRAM procedure also produces a plot of the classical empirical semivariogram. If you also specify ROBUST in the COMPUTE statement, then the VARIOGRAM procedure instead produces a plot of both the classical and robust empirical semivariograms, in addition to the observations plot. For an example of the empirical semivariogram plot, see Figure 122.7. Moreover, if you specify the MODEL statement and perform model fitting, then PROC VARIOGRAM also produces a fit plot of the fitted semivariogram. An example of the fit plot is shown in Figure 122.16.

The following global-plot-options are available:

ONLY

suppresses the default plots. Only plots that are specifically requested are displayed.

UNPACKPANEL UNPACK

suppresses paneling. By default, multiple plots can appear in some output panels. Specify UNPACKPANEL to get each plot in a separate panel. You can specify PLOTS(UNPACKPANEL) to unpack the default plots. You can also specify UNPACKPANEL as a suboption with the SEMIVAR option.

The following individual plot-requests and plot options are available:

ALL

produces all appropriate plots. You can specify other options with ALL. For example, to request all default plots and an additional classical empirical semivariogram, specify PLOTS=(ALL SEMIVAR(CLA)).

EQUATE

specifies that all appropriate plots be produced in a way that the coordinates of the axes have equal size units.

FITPLOT <(fitplot-options)> FIT <(fitplot-options)>

requests a plot that shows the model fitting results against the empirical semivariogram. By default, FITPLOT displays one plot of the fitted model (or a panel of plots for different angles in the anisotropic case).

If you specify the FORM=AUTO option in the MODEL statement, then each class of equivalent fitted models is displayed with a different curve on the plot. The best fitting model class is chosen based on the criteria that you specify in the CHOOSE option of the MODEL statement, and its curve is drawn with a thicker line on top of all other curves. The plot legend shows the ranked classes by displaying the label of the representative model of each class in the plot. If appropriate, the number of additional models in the same equivalence class is also shown in parentheses.

You can specify the following fitplot-options:

NCLASSES=number NCLASSES=ALL: specifies the maximum number of classes to display on the fit plot, where number is a positive integer. The default is NCLASSES=5 for nonpaneled plots and NCLASSES=3 for paneled plots. The option takes effect when you specify the FORM=AUTO option in the MODEL statement, and it is ignored when you fit a single model. If you specify NCLASSES=ALL or a number larger than the number of available classes, then all available classes are shown on the fit plot. If you specify multiple instances of the NCLASSES= option, then only the last specified instance is honored.
UNPACK: suppresses paneling in paneled fit plots. By default, fit plots appear in a panel, when appropriate.
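The following sketch requests only the fit plot and limits it to three model classes. The MLIST= suboption and its form names (EXP, GAU, SPH) are written from memory of the MODEL statement syntax and are assumptions here, as are the numeric values. Check them against the MODEL statement documentation before use:

```sas
ods graphics on;

/* Sketch: fit plot limited to three classes of fitted models.      */
/* ASSUMPTION: the MLIST= suboption and the form names EXP, GAU,    */
/* SPH are not confirmed by this section; verify them in the MODEL  */
/* statement documentation. Numeric values are illustrative.        */
proc variogram data=sashelp.thick plots(only)=fitplot(nclasses=3);
   compute lagdistance=7 maxlags=10;
   coordinates xc=East yc=North;
   model form=auto(mlist=(exp,gau,sph));
   var Thick;
run;
```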

MORAN <(moran-options)> MOR <(moran-options)>

produces a Moran scatter plot of the observations with nonmissing values. For more details about this plot, see the section The Moran Scatter Plot. In addition to the Moran scatter plot points, the plot also displays the fit line for the linear regression of the weighted average on the standardized observation values, the regression fit line slope, and a reference line with slope equal to 1. The MORAN plot has the following moran-options:

LABEL < ( label-options ) >

labels the observations. The label is the ID variable if the ID statement is specified; otherwise, it is the observation number. The label-options can be one or more of the following:

HH: specifies that labels show for observations in the upper right (high-high) plot quadrant of positive spatial association.
HL: specifies that labels show for observations in the lower right (high-low) plot quadrant of negative spatial association.
LH: specifies that labels show for observations in the upper left (low-high) plot quadrant of negative spatial association.
LL: specifies that labels show for observations in the lower left (low-low) plot quadrant of positive spatial association.

If you specify multiple instances of the MORAN option and you specify the LABEL suboption in any of them, then the resulting Moran scatter plot displays the observation labels. By default, when you specify none of the label-options, the PLOTS=MORAN(LABEL) request labels all observations.

ROWAVG=rowavg-option

specifies the flag value for row-averaging of weights in the computation of the weighted average. The rowavg-option can be either of the following:

OFF: specifies that autocorrelation weights not be row-averaged.
ON: specifies that row-averaged autocorrelation weights be used.

The default behavior is ROWAVG=ON. If you specify the ROWAVG= option more than once in the same MORAN plot request, then the behavior is set to ROWAVG=ON unless any of the instances is ROWAVG=OFF.

When you specify the PLOTS=MORAN option, you must specify both the AUTOCORRELATION and the LAGDISTANCE= options in the COMPUTE statement to produce the Moran scatter plot. For more information about the plot, see the section The Moran Scatter Plot.
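A Moran scatter plot request that labels only the two quadrants of positive spatial association and turns off row-averaging might look like the following sketch. The LAGDISTANCE= and MAXLAGS= values are illustrative assumptions:

```sas
ods graphics on;

/* Sketch: Moran scatter plot with labels in the high-high and      */
/* low-low quadrants and without row-averaged weights.              */
/* AUTOCORRELATION and LAGDISTANCE= are both required for the plot. */
proc variogram data=sashelp.thick
               plots(only)=moran(label(hh ll) rowavg=off);
   compute lagdistance=7 maxlags=10 autocorrelation;
   coordinates xc=East yc=North;
   var Thick;
run;
```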

NONE

suppresses all plots.

OBSERVATIONS <(observations-plot-options)> OBSERV <(observations-plot-options)> OBS <(observations-plot-options)>

produces the observed data plot. Only one observations plot is created if you specify the OBSERVATIONS option more than once within a PLOTS option.

The OBSERVATIONS option has the following suboptions:

GRADIENT

specifies that observations be displayed as circles colored by the observed measurement.

LABEL < ( label-option ) >

labels the observations. The label is the ID variable if the ID statement is specified; otherwise, it is the observation number. The label-option can be one of the following:

EQ=number: specifies that labels show for any observation whose value is equal to the specified number.
MAX=number: specifies that labels show for observations with values smaller than or equal to the specified number.
MIN=number: specifies that labels show for observations with values equal to or greater than the specified number.

If you specify multiple instances of the OBSERVATIONS option and you specify the LABEL suboption in any of them, then the resulting observations plot displays the observation labels. If you specify more than one label-option across multiple LABEL suboptions, then the label-option that takes effect in the resulting OBSERVATIONS plot is selected in the following order of precedence: MIN, MAX, EQ.

OUTLINE

specifies that observations be displayed as circles with a border but with a completely transparent fill.

OUTLINEGRADIENT

is the same as OBSERVATIONS(GRADIENT) except that a border is shown around each observation.

SHOWMISSING

specifies that observations with missing values be displayed in addition to the observations with nonmissing values. By default, the locations of missing values are not shown on the plot. If you specify multiple instances of the OBSERVATIONS option and you specify the SHOWMISSING suboption in any of them, then the resulting observations plot displays the observations with missing values.

If you specify none of the GRADIENT, OUTLINE, and OUTLINEGRADIENT suboptions, then OUTLINEGRADIENT is the default. If you specify multiple instances of the OBSERVATIONS option or multiple suboptions for OBSERVATIONS, then the resulting observations plot honors the last specified GRADIENT, OUTLINE, or OUTLINEGRADIENT suboption.
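The suboptions can be combined in a single request. The following sketch colors the markers by value, shows locations with missing values, and labels observations at or below a threshold. The threshold MAX=35 is a hypothetical value chosen for illustration:

```sas
ods graphics on;

/* Sketch: observations plot with gradient coloring, missing-value  */
/* locations shown, and labels for values less than or equal to 35. */
/* MAX=35 is an illustrative threshold, not a recommended value.    */
proc variogram data=sashelp.thick
               plots(only)=observ(gradient showmissing label(max=35));
   compute novariogram;
   coordinates xc=East yc=North;
   var Thick;
run;
```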

PAIRS <(pairs-plot-options)>

specifies that the pairwise distances histogram be produced. By default, the horizontal axis displays the lag class number. The vertical axis shows the frequency (count) of pairs in the lag classes. Notice that the zero lag class width is half the width of the other classes.

The PAIRS option has the following suboptions:

MIDPOINT MID: specifies that the plot that is created with the PAIRS option display the lag class midpoint value on the horizontal axis, rather than the default lag class number. The midpoint value is the actual distance of a lag class center from the assumed origin point at distance zero. See also the illustration in Figure 122.22.
NOINSET NOI: specifies that the plot created with the PAIRS option be produced without the default inset that provides additional information about the pairs distribution.
THRESHOLD=minimum pairs THR=minimum pairs: specifies that a reference line appear in the plot that is created with the PAIRS option to indicate a minimum frequency of data pairs. You can use this line as an exploratory tool when you want to select lag classes that contain at least THRESHOLD point pairs. The option helps you to identify visually any portion of the PAIRS distribution that lies below the specified THRESHOLD value.

Only one pairwise distances histogram is created, even if you specify the PAIRS option more than once within a PLOTS option. If you specify multiple instances of the PAIRS option, the resulting plot has the following features:

If the MIDPOINT or NOINSET suboption has been specified in any of the instances, it is activated in the resulting plot.
If you have specified the THRESHOLD= suboption more than once, then the THRESHOLD= value specified last prevails.
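As an exploratory sketch, the following request draws the pairwise distances histogram with lag class midpoints on the horizontal axis and a reference line at 30 pairs. The values NHCLASSES=20 and THR=30 are assumptions made for illustration:

```sas
ods graphics on;

/* Sketch: pairwise distances histogram with midpoint axis values   */
/* and a reference line at 30 pairs. NHCLASSES=20 and THR=30 are    */
/* illustrative choices.                                            */
proc variogram data=sashelp.thick plots(only)=pairs(mid thr=30);
   compute novariogram nhclasses=20;
   coordinates xc=East yc=North;
   var Thick;
run;
```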

SEMIVARIOGRAM <(semivar-plot-options)> SEMIVAR <(semivar-plot-options)>

specifies that the empirical semivariogram plot be produced. You can specify the SEMIVAR option multiple times in the same PLOTS option to request instances of plots with the following semivar-plot-options:

ALL | CLASSICAL | ROBUST ALL | CLA | ROB: specifies a single type of empirical semivariogram (classical or robust) to plot, or specifies that all the available types be included in the same plot. The default is ALL.
UNPACKPANEL UNPACK: specifies that paneled semivariogram plots be displayed separately. By default, plots appear in a panel, when appropriate.
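The following sketch requests the classical and robust empirical semivariograms as two separate plot instances and uses the ONLY global option to suppress the default plots. The ROBUST option is specified in the COMPUTE statement so that the robust estimate is available, and the LAGDISTANCE= and MAXLAGS= values are illustrative assumptions:

```sas
ods graphics on;

/* Sketch: two semivariogram plot instances, one classical and one  */
/* robust, with default plots suppressed by the ONLY global option. */
/* LAGDISTANCE=7 and MAXLAGS=10 are illustrative values.            */
proc variogram data=sashelp.thick
               plots(only)=(semivar(cla) semivar(rob));
   compute lagdistance=7 maxlags=10 robust;
   coordinates xc=East yc=North;
   var Thick;
run;
```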