The SPP Procedure

PROCESS Statement

  • PROCESS name = (variables < / pattern-options >) < / process-options < distance-function-options > > ;

The PROCESS statement defines a point pattern for analysis. The name of the process must be a valid SAS variable name, and you describe the process by specifying the variables that contain the x and y coordinates of the events in the point pattern. These variables must be in the DATA= data set. You can specify only one PROCESS statement in PROC SPP.

The coordinates in spatial data can be spherical (represented as longitude and latitude) or projected (represented as Cartesian x and y coordinates). All the SAS/STAT procedures that analyze spatial data, including PROC SPP, assume that you are working with projected coordinates, for which Euclidean distance is appropriate. If your data consist of spherical coordinates, you are responsible for transforming the data to projected coordinates, such as by using PROC GPROJECT in SAS/GRAPH software. For more information about the spatial modeling issues that pertain to the use of geodetic versus simple Euclidean distance, see Banerjee (2005).
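As a rough illustration of what "projected coordinates" means, the following Python sketch applies a simple equirectangular approximation around an assumed reference point (lon0, lat0). It is not a substitute for PROC GPROJECT and is not the method that PROC GPROJECT uses; the function name and the 6,371 km Earth radius are assumptions for illustration, and the approximation is reasonable only for small study regions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, in kilometers


def equirectangular(lon_deg, lat_deg, lon0_deg, lat0_deg, radius=EARTH_RADIUS_KM):
    """Map (lon, lat) in degrees to planar (x, y) in km around (lon0, lat0).

    Local approximation only: east-west distances are scaled by cos(lat0),
    so distortion grows as points move away from the reference latitude.
    """
    x = radius * math.radians(lon_deg - lon0_deg) * math.cos(math.radians(lat0_deg))
    y = radius * math.radians(lat_deg - lat0_deg)
    return x, y


# Usage: project two nearby points, then use ordinary Euclidean distance.
p = equirectangular(-78.90, 35.90, -78.90, 35.80)
q = equirectangular(-78.80, 35.80, -78.90, 35.80)
print(math.dist(p, q))
```

Once the coordinates are planar, Euclidean distance between them is what the distance-based statistics in this chapter assume.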

You can also specify pattern-options and process-options. The pattern-options are related to different attributes of the observed point pattern that is read from the DATA= data set. The process-options represent different analyses that are associated with a point pattern. These analyses are usually helpful in characterizing the underlying stochastic process that might have generated the point pattern. The PROCESS statement’s pattern-options are listed in Table 105.4. The PROCESS statement’s process-options are listed in Table 105.5.

Table 105.4: Point Pattern Definition Options

Option    Description
AREA=     Specifies a rectangular study window
EVENT=    Specifies an EVENT variable that identifies individual point pattern events
MARK=     Specifies the MARK variable for the point pattern


You can specify the following pattern-options, which enable you to describe various aspects of a point pattern data set:

AREA=(xmin-number, ymin-number, xmax-number, ymax-number)

specifies the bounds of the rectangular study area for the spatial point pattern. The study area is a key attribute that governs the intensity estimates that the different methods in PROC SPP produce. When you specify this option, you must provide all four of the following values:

  • xmin-number, the x coordinate of the lower-left corner

  • ymin-number, the y coordinate of the lower-left corner

  • xmax-number, the x coordinate of the upper-right corner

  • ymax-number, the y coordinate of the upper-right corner

If the DATA= data set contains BY groups, the bounds that you specify apply to every BY group. If you do not specify this option, PROC SPP estimates a default study area by using the Ripley-Rasson window estimator. For more information, see the section Ripley-Rasson Window Estimator.
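The simplest quantity that the study window governs is the CSR average intensity $\lambda = n / |W|$, the number of events per unit area of the window $W$. A minimal Python sketch, assuming that all events lie inside the rectangle that AREA= specifies (the function name is hypothetical):

```python
def csr_intensity(points, area):
    """CSR average intensity lambda = n / |W| for a rectangular window.

    points: list of (x, y) tuples, assumed to lie inside the window
    area:   (xmin, ymin, xmax, ymax), in the same order as AREA=
    """
    xmin, ymin, xmax, ymax = area
    if xmax <= xmin or ymax <= ymin:
        raise ValueError("need xmin < xmax and ymin < ymax")
    return len(points) / ((xmax - xmin) * (ymax - ymin))


# 4 events in a 10 x 5 window: 4 / 50 events per unit area
print(csr_intensity([(1, 1), (2, 2), (3, 3), (4, 4)], (0, 0, 10, 5)))
```

A larger window with the same events lowers $\lambda$, which in turn changes every estimate that is scaled by it.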

EVENT=variable-name

specifies an event variable that is associated with instances (points) in this point pattern. If your DATA= data set also contains information about covariates, use this option to identify the events in the point pattern.

MARK=variable-name

specifies a character or quantitative variable from the DATA= data set as the mark variable. Character marks enable you to request distance function summary statistics for different mark values (see the BYTYPE and CROSS= distance-function-options).

Table 105.5: PROCESS Statement Options

Option     Description
F          Computes the empty-space F function
G          Computes the nearest-neighbor G function
J          Computes the J function
K          Computes the K function to test for complete spatial randomness (CSR)
KERNEL     Obtains a nonparametric intensity estimate of the point pattern
L          Computes the L function
OUTSIM=    Specifies an output data set to store the simulated data sets used in computing distance functions
PCF        Computes the pair correlation function (PCF)
QUADRAT    Performs a quadrat-based test for CSR


You can specify the following process-options to study the point pattern data set and the underlying spatial point process that is likely to have generated this pattern:

F<GRID(value-NX,value-NY)>

performs a test for complete spatial randomness that is based on the empty-space F function. For more information about the F function and related functions, see the section Statistics Based on Second-Order Characteristics. You can specify the following suboption:

GRID(value-NX, value-NY)

specifies a reference grid for computing the empty-space F function, where value-NX represents the number of horizontal divisions and value-NY represents the number of vertical divisions. By default, the SPP procedure uses a $50 \times 50$ grid.
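The idea behind the empty-space F function can be sketched in Python: for each reference grid point, find the distance to the nearest event, and report the fraction of grid points whose distance is at most $r$. This sketch assumes grid points at cell centers and applies no edge correction, so it is not PROC SPP's estimator; the function names are hypothetical. Under CSR with intensity $\lambda$, the theoretical value is $F(r) = 1 - \exp(-\lambda \pi r^2)$.

```python
import math


def grid_centers(area, nx=50, ny=50):
    """Centers of an nx-by-ny grid of cells covering the window (assumed layout)."""
    xmin, ymin, xmax, ymax = area
    dx, dy = (xmax - xmin) / nx, (ymax - ymin) / ny
    return [(xmin + (i + 0.5) * dx, ymin + (j + 0.5) * dy)
            for i in range(nx) for j in range(ny)]


def empty_space_f(points, area, radii, nx=50, ny=50):
    """Naive empty-space function F(r): no edge correction."""
    # distance from each reference grid point to its nearest event
    nearest = [min(math.dist(g, p) for p in points) for g in grid_centers(area, nx, ny)]
    return [sum(d <= r for d in nearest) / len(nearest) for r in radii]


def f_csr(r, lam):
    """Theoretical F(r) = 1 - exp(-lambda * pi * r^2) under CSR."""
    return 1.0 - math.exp(-lam * math.pi * r * r)
```

A finer grid gives more reference points and a smoother estimate at the cost of more nearest-event searches.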

G

performs a test for complete spatial randomness that is based on the nearest-neighbor G function.
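The nearest-neighbor G function is the distribution function of the distance from an event to its nearest other event. A naive Python sketch without edge correction (function names hypothetical, not PROC SPP's estimator); under CSR its theoretical value has the same form as $F$, $G(r) = 1 - \exp(-\lambda \pi r^2)$.

```python
import math


def nearest_neighbor_distances(points):
    """Distance from each event to its nearest other event (needs >= 2 events)."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]


def g_function(points, radii):
    """Naive nearest-neighbor distance function G(r): no edge correction."""
    nn = nearest_neighbor_distances(points)
    return [sum(d <= r for d in nn) / len(nn) for r in radii]


# events at x = 0, 1, 3 on a line: nearest-neighbor distances are 1, 1, 2
print(g_function([(0, 0), (1, 0), (3, 0)], [0.5, 1.0, 2.0]))
```

A G curve that rises faster than the CSR curve indicates events closer together than expected (clustering).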

J<GRID(value-NX, value-NY)>

performs a test for complete spatial randomness that is based on the J function. You can specify the following suboption:

GRID(value-NX, value-NY)

specifies a reference grid for computing the J function, where value-NX represents the number of horizontal divisions and value-NY represents the number of vertical divisions. By default, the SPP procedure uses a $50 \times 50$ grid.
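The J function combines the two preceding functions as $J(r) = (1 - G(r)) / (1 - F(r))$, defined where $F(r) < 1$. Under CSR, $J(r) = 1$; values below 1 suggest clustering and values above 1 suggest regularity. A small Python sketch that takes already-computed F and G values (the function name is hypothetical):

```python
import math


def j_function(f_values, g_values):
    """J(r) = (1 - G(r)) / (1 - F(r)) pointwise; None where F(r) >= 1."""
    return [(1.0 - g) / (1.0 - f) if f < 1.0 else None
            for f, g in zip(f_values, g_values)]


# Under CSR, F and G share the form 1 - exp(-lambda*pi*r^2), so J(r) = 1.
lam = 2.0
csr = [1.0 - math.exp(-lam * math.pi * r * r) for r in (0.1, 0.2, 0.3)]
print(j_function(csr, csr))
```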

K

performs a test for complete spatial randomness that is based on the K function.
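One common form of Ripley's K estimator is $\hat K(r) = |W| / (n(n-1)) \sum_{i \ne j} 1(d_{ij} \le r)$; under CSR, $K(r) = \pi r^2$, and larger values suggest clustering. The Python sketch below implements this form without edge correction, so it is an illustration rather than PROC SPP's estimator (the function name is hypothetical).

```python
import math
from itertools import combinations


def k_function(points, area, radii):
    """Naive Ripley's K: |W| / (n (n-1)) * #{ordered pairs i != j, d_ij <= r}.

    No edge correction, so values are biased downward near the boundary.
    """
    xmin, ymin, xmax, ymax = area
    w = (xmax - xmin) * (ymax - ymin)
    n = len(points)
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    # each unordered pair appears twice among the ordered pairs, hence the 2
    return [w * 2 * sum(d <= r for d in dists) / (n * (n - 1)) for r in radii]
```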

KERNEL<(kernel-suboptions)>

produces either a nonparametric estimate of the first-order intensity or a nonparametric smoothed estimate of a quantitative mark variable of the point pattern, depending on the kernel-suboptions. When you do not specify any kernel-suboptions, PROC SPP computes a nonparametric intensity estimate by using a Gaussian kernel and a default bandwidth. You can specify the following kernel-suboptions:

TYPE=EPANECHNIKOV | GAUSSIAN | QUARTIC | TRIANGULAR | UNIFORM

specifies the kernel type for obtaining the nonparametric estimate. For more information about the different kernel types that PROC SPP supports, see the section Nonparametric Intensity Estimation. By default, TYPE=GAUSSIAN.

B=value

specifies the value for the kernel bandwidth parameter. The bandwidth is a nonnegative number. By default, the SPP procedure uses a bandwidth of $0.1 / \sqrt {\lambda }$, where $\lambda $ is the CSR average intensity of the point pattern (Illian et al. 2008, p. 236).

ADAPTIVE

performs adaptive kernel estimation. Adaptive kernel estimation requires an initial bandwidth value from which it computes a separate bandwidth for each data point. If you specify a bandwidth in the B= kernel-suboption, the SPP procedure uses this value as the initial bandwidth. Otherwise, it uses a default bandwidth value that is based on the suggestion by Illian et al. (2008, p. 236). For more information about adaptive kernel estimation, see the section Nonparametric Intensity Estimation.

OUT=SAS-data-set

specifies the name of a SAS-data-set to contain the kernel-based nonparametric estimates.

GRID(value-NX, value-NY)

specifies a reference grid for computing the kernel estimate, where value-NX represents the number of horizontal divisions and value-NY represents the number of vertical divisions. By default, the SPP procedure uses a $50 \times 50$ grid.
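A fixed-bandwidth Gaussian kernel intensity estimate evaluated on a reference grid can be sketched as $\hat\lambda(u) = \sum_i k_b(u - x_i)$, using the default bandwidth $0.1 / \sqrt{\lambda}$ described above. The Python sketch assumes that the bandwidth acts as the standard deviation of an isotropic Gaussian kernel, evaluates at grid-cell centers, and applies no edge correction; adaptive estimation is not covered, and the function names are hypothetical.

```python
import math


def default_bandwidth(n, area):
    """Default bandwidth 0.1 / sqrt(lambda), where lambda = n / |W|."""
    xmin, ymin, xmax, ymax = area
    lam = n / ((xmax - xmin) * (ymax - ymin))
    return 0.1 / math.sqrt(lam)


def gaussian_intensity(points, area, b=None, nx=50, ny=50):
    """Fixed-bandwidth Gaussian kernel intensity estimate at grid-cell centers.

    b is used as the standard deviation of an isotropic Gaussian kernel
    (an assumption); no edge correction is applied.
    Returns a list of (x, y, estimate) rows.
    """
    if b is None:
        b = default_bandwidth(len(points), area)
    xmin, ymin, xmax, ymax = area
    dx, dy = (xmax - xmin) / nx, (ymax - ymin) / ny
    norm = 1.0 / (2.0 * math.pi * b * b)  # makes each kernel integrate to 1
    rows = []
    for i in range(nx):
        for j in range(ny):
            gx, gy = xmin + (i + 0.5) * dx, ymin + (j + 0.5) * dy
            total = sum(math.exp(-((gx - x) ** 2 + (gy - y) ** 2) / (2.0 * b * b))
                        for x, y in points)
            rows.append((gx, gy, norm * total))
    return rows
```

Summing each estimate times the cell area approximates the number of events when the kernels lie mostly inside the window; edge correction would compensate for kernel mass that falls outside it.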

L

performs a test for complete spatial randomness that is based on the L function.
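The L function is Besag's variance-stabilizing transformation of K, $L(r) = \sqrt{K(r)/\pi}$, which equals $r$ under CSR; plotting $L(r) - r$ therefore gives a reference line at zero. A one-function Python sketch (hypothetical name) that transforms K values:

```python
import math


def l_from_k(k_values):
    """Besag's transformation L(r) = sqrt(K(r) / pi)."""
    return [math.sqrt(k / math.pi) for k in k_values]


# Theoretical CSR values K(r) = pi r^2 map back to L(r) = r.
radii = [0.5, 1.0, 2.0]
print(l_from_k([math.pi * r * r for r in radii]))
```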

OUTSIM=SAS-data-set

specifies the name of a SAS-data-set to contain the simulated data sets that are used in computing the distance functions. This option is ignored unless you request at least one distance function (F, G, J, K, L, or PCF) in the PROCESS statement.

PCF<B=value>

performs a test for complete spatial randomness that is based on the pair correlation function (PCF). The pair correlation function is calculated only when you specify EDGECORR=ON in the PROC SPP statement. You can specify the following suboption:

B=value

specifies the bandwidth value to use in the kernel density estimation inside the pair correlation function. The value must be a nonnegative real number. If you do not specify this suboption, PROC SPP uses a default value of $0.1 / \sqrt {\lambda }$, where $\lambda $ is the CSR average intensity of the point pattern or of the current categorical mark type (Illian et al. 2008, p. 236).
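For a stationary process, the pair correlation function is related to K by $g(r) = K'(r) / (2\pi r)$, so a kernel estimate replaces each pair's jump in $\hat K$ with a smooth kernel $k_b$: $\hat g(r) = |W| / (2\pi r\, n(n-1)) \sum_{i \ne j} k_b(r - d_{ij})$. Under CSR, $g(r) = 1$. The Python sketch below assumes a one-dimensional Gaussian kernel with standard deviation $b$ and no edge correction; PROC SPP's kernel choice is not stated in this section, and the function name is hypothetical.

```python
import math
from itertools import combinations


def pcf(points, area, radii, b):
    """Naive kernel estimate of the pair correlation function g(r), for r > 0.

    g(r) = |W| / (2 pi r n (n-1)) * sum_{i != j} k_b(r - d_ij),
    with an assumed 1-D Gaussian kernel k_b and no edge correction.
    """
    xmin, ymin, xmax, ymax = area
    w = (xmax - xmin) * (ymax - ymin)
    n = len(points)
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]

    def kernel(u):
        return math.exp(-0.5 * (u / b) ** 2) / (b * math.sqrt(2.0 * math.pi))

    out = []
    for r in radii:
        s = 2.0 * sum(kernel(r - d) for d in dists)  # ordered pairs i != j
        out.append(w * s / (2.0 * math.pi * r * n * (n - 1)))
    return out
```

A smaller b resolves finer structure in g but makes the estimate noisier; the default ties b to the mean spacing of the events through $\lambda$.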

QUADRAT<(<value-NX,value-NY> </DETAILS>)>

performs a quadrat-based test for complete spatial randomness. You can specify value-NX and value-NY to set the number of horizontal and vertical divisions of the study area into quadrats. If you do not specify them, PROC SPP uses a default $10 \times 10$ grid of quadrats. By default, the QUADRAT option displays only the Pearson chi-square test for CSR. If you also specify the DETAILS suboption, PROC SPP also displays the quadrat counts and the Pearson residual information.
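The Pearson quadrat test compares the observed count $O$ in each of the $n_x n_y$ equal-area quadrats with the count expected under CSR, $E = n / (n_x n_y)$. The Pearson residuals are $(O - E)/\sqrt{E}$, and the test statistic $\sum (O - E)^2 / E$ is referred to a chi-square distribution with $n_x n_y - 1$ degrees of freedom. A Python sketch (hypothetical function name), assuming that all events lie inside the window:

```python
def quadrat_test(points, area, nx=10, ny=10):
    """Pearson chi-square quadrat test for CSR.

    Returns (chi_square, df, counts, residuals); counts[i][j] is the count in
    column i (x direction) and row j (y direction).
    """
    xmin, ymin, xmax, ymax = area
    counts = [[0] * ny for _ in range(nx)]
    for x, y in points:
        # clamp so that events on the right/top edge fall in the last quadrat
        i = min(int((x - xmin) / (xmax - xmin) * nx), nx - 1)
        j = min(int((y - ymin) / (ymax - ymin) * ny), ny - 1)
        counts[i][j] += 1
    expected = len(points) / (nx * ny)  # equal-area quadrats under CSR
    residuals = [[(c - expected) / expected ** 0.5 for c in col] for col in counts]
    chi_square = sum(r * r for col in residuals for r in col)
    return chi_square, nx * ny - 1, counts, residuals
```

If SciPy is available, `scipy.stats.chi2.sf(chi_square, df)` gives the upper-tail p-value. The chi-square approximation is usually considered adequate only when the expected counts are not too small.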

When you specify an F, G, J, K, L, or PCF process-option (shown in Table 105.5), you can also specify the following distance-function-options.

Table 105.6: Distance Function Options

Option     Description
BYTYPE     Requests categorical mark typewise calculation of distance functions
CROSS=     Requests cross-type distance function analysis that is based on the categorical mark that is specified in the MARK= option
MAXDIST=   Specifies the ending distance for distance functions
MINDIST=   Specifies the starting distance for distance functions
NDIST=     Specifies the number of distances to use for different distance functions
NSIM=      Specifies the number of simulations to compute the CSR envelope
BLOCKS     Specifies the block size for calculation of confidence intervals for distance functions


BYTYPE(ALL | value-list)

requests distance function calculations by values of the mark variable, producing a separate calculation for each mark type. You can specify one of the following:

ALL

requests distance function calculation for all available character mark variable values in the DATA= data set.

value-list

requests distance function calculation for certain formatted mark variable values, which you specify as quoted strings in the value-list.

CROSS=TYPES(value-list1<,value-list2>)

requests cross-type distance function analysis between different mark values. For cross-type analysis, you must specify a mark variable in the point pattern definition by using the MARK= pattern-option. The CROSS= option applies only to the K, L, G, J, and PCF distance functions. You must specify the TYPES suboption as follows:

TYPES(value-list1<,value-list2>)

requests cross-type analysis only among the types that are specified in value-list1 and, optionally, value-list2. If you specify only value-list1, PROC SPP performs cross-type analysis among all the types in value-list1. If you also specify value-list2, PROC SPP performs cross-type analysis across the two lists. For value-list1 and value-list2, specify quoted strings that correspond to values of the variable that is specified in the MARK= pattern-option.
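As an example of what a cross-type distance function measures, the standard cross-type K function for mark types $a$ and $b$ is $K_{ab}(r) = |W| / (n_a n_b) \sum_{i \in a} \sum_{j \in b} 1(d_{ij} \le r)$, which equals $\pi r^2$ when the two types are independent CSR patterns. The Python sketch below computes this without edge correction; it does not reproduce how PROC SPP enumerates the type pairs in TYPES, and the function name is hypothetical.

```python
import math


def cross_k(points, marks, type_a, type_b, area, radii):
    """Naive cross-type K function K_ab(r) for two distinct mark types.

    K_ab(r) = |W| / (n_a n_b) * #{(i, j): mark_i = a, mark_j = b, d_ij <= r}
    No edge correction is applied.
    """
    if type_a == type_b:
        raise ValueError("cross-type K needs two different mark types")
    xmin, ymin, xmax, ymax = area
    w = (xmax - xmin) * (ymax - ymin)
    a = [p for p, m in zip(points, marks) if m == type_a]
    b = [p for p, m in zip(points, marks) if m == type_b]
    dists = [math.dist(p, q) for p in a for q in b]
    return [w * sum(d <= r for d in dists) / (len(a) * len(b)) for r in radii]
```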

MAXDIST=value | MAX | CUT

specifies how PROC SPP determines the maximum distance for the distance functions. You can specify one of the following:

value

specifies the maximum distance for distance function calculations. The value must be positive and greater than the MINDIST= value. Values that are too large might produce artifacts that do not reflect the true underlying process.

MAX

uses the maximum possible distance, based on the suggestion by Baddeley and Turner (2013). The maximum possible distance is calculated as follows:

  • For the K and L functions, the maximum possible distance is calculated as

    \[ \min \{ \min \{ \mr{Range}(x), \mr{Range}(y) \} /4, \sqrt {1000/(\pi \times \lambda )}\} \]

    where $\lambda $ is the intensity of the point pattern in the study area and the ranges of x and y are computed over the minimum bounding rectangular window of the study area.

  • For the PCF, the maximum possible distance is calculated as for the K and L functions, except that the ranges of x and y are computed over a block division of the study area and $\lambda $ is the intensity in that block division. The maximum distance for the PCF is the minimum of the maximum distances computed over all the block divisions in the study area.

  • For the F and G functions, the maximum possible distance is calculated as

    \[ \min \{ \mr{Diameter}(W)/2, \sqrt {\log (100000)/(\pi \times \lambda )}\} \]

    where $\lambda $ is the intensity of the point pattern in the study area and W is the minimum bounding rectangular window of the study area.

  • For the J function, the maximum possible distance is calculated as $\mr{Diameter}(W)/2$.

CUT

uses the maximum distance at certain cutoff values that are recommended by Baddeley (2014). The cutoff values are as follows:

  • for the F and G functions, the distance at which the F or G value reaches 0.9

  • for the J function, the distance at which the F or G value in the calculation of the J function reaches 0.9

  • for the PCF function, the distance that corresponds to the MAX option that is applied to individual subdivisions of the study area for computing the confidence interval of the PCF statistic

  • for the K and L functions, the distance that corresponds to the MAX option that is applied to the entire study area

By default, MAXDIST=CUT.
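The MAX rules and the F/G cutoff rule translate directly into code. The Python sketch below assumes that $\log$ denotes the natural logarithm, treats $\mr{Diameter}(W)$ of a rectangle as its diagonal, and, when the curve never reaches the cutoff, falls back to the last evaluated distance (an assumption, not documented behavior); the function names are hypothetical.

```python
import math


def maxdist_max_kl(area, lam):
    """MAX rule for K and L: min{min(Range(x), Range(y))/4, sqrt(1000/(pi*lam))}."""
    xmin, ymin, xmax, ymax = area
    return min(min(xmax - xmin, ymax - ymin) / 4.0, math.sqrt(1000.0 / (math.pi * lam)))


def maxdist_max_fg(area, lam):
    """MAX rule for F and G: min{Diameter(W)/2, sqrt(log(100000)/(pi*lam))}."""
    xmin, ymin, xmax, ymax = area
    diameter = math.hypot(xmax - xmin, ymax - ymin)  # diagonal of the rectangle
    return min(diameter / 2.0, math.sqrt(math.log(100000.0) / (math.pi * lam)))


def maxdist_max_j(area):
    """MAX rule for J: Diameter(W)/2."""
    xmin, ymin, xmax, ymax = area
    return math.hypot(xmax - xmin, ymax - ymin) / 2.0


def maxdist_cut(radii, values, level=0.9):
    """CUT rule for F or G: first distance at which the estimate reaches `level`."""
    for r, v in zip(radii, values):
        if v >= level:
            return r
    return radii[-1]  # assumed fallback when the cutoff is never reached
```

For example, in a $100 \times 40$ window with $\lambda = 0.01$, the K/L rule is limited by the window (40/4 = 10), whereas the F/G rule is limited by the intensity term.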

MINDIST=value

specifies a positive number for the minimum (starting) distance for all distance function calculations. The value of this option cannot exceed the value of the MAXDIST= option.

NDIST=value

specifies the number of distance bins with which to compute all the specified distance functions. This is a global option: when you specify it, PROC SPP uses this number of distances for every requested distance function instead of the default number.

NSIM=value

specifies a positive integer for the number of simulations to be used to compute envelopes for the CSR tests in all distance functions. When you specify this option, it applies to all specified distance functions.
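A common way to build a CSR envelope, shown here as a sketch rather than PROC SPP's exact procedure, is to simulate nsim patterns with the same number of events placed uniformly in the window, compute the summary function for each, and take the pointwise minimum and maximum. The Python sketch uses a naive G function as the example statistic; the function names and the seed are hypothetical, and returning the simulated point sets is only loosely analogous to what OUTSIM= stores.

```python
import math
import random


def g_naive(points, radii):
    """Naive nearest-neighbor G(r) with no edge correction (needs >= 2 points)."""
    nn = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
          for i, p in enumerate(points)]
    return [sum(d <= r for d in nn) / len(nn) for r in radii]


def csr_envelope(n, area, radii, nsim, stat=g_naive, seed=2024):
    """Pointwise min/max of `stat` over nsim simulated CSR patterns.

    Each pattern has exactly n points placed uniformly in the rectangle
    (binomial process). Returns (lower, upper, patterns).
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = area
    patterns, curves = [], []
    for _ in range(nsim):
        pts = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]
        patterns.append(pts)
        curves.append(stat(pts, radii))
    lower = [min(vals) for vals in zip(*curves)]
    upper = [max(vals) for vals in zip(*curves)]
    return lower, upper, patterns
```

An observed curve that leaves the band at some distance is evidence against CSR at that distance. For a pointwise min/max envelope from nsim simulations, the two-sided significance level at a fixed distance is $2/(\mathrm{nsim}+1)$, so more simulations give a stricter test.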

BLOCKS(NX, NY)

specifies the block size that is required for calculating the confidence intervals of distance functions, where NX is the number of horizontal blocks and NY is the number of vertical blocks. For this option to behave reasonably, the blocks should be neither too small nor too large. For more information about estimating the confidence intervals for distance functions, see the section Confidence Intervals for Summary Statistics. The default block size is $5 \times 5$.
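The block division also enters the MAX rule for the PCF, which applies the K/L rule within each block and keeps the smallest result. A Python sketch, assuming that the block division in that rule is the NX by NY division set here, that blocks are half-open (events exactly on the outer right or top edge are not counted), and that empty blocks are skipped; the function names are hypothetical.

```python
import math


def blocks(area, nx=5, ny=5):
    """Split the rectangular window into nx-by-ny equal blocks (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = area
    dx, dy = (xmax - xmin) / nx, (ymax - ymin) / ny
    return [(xmin + i * dx, ymin + j * dy, xmin + (i + 1) * dx, ymin + (j + 1) * dy)
            for i in range(nx) for j in range(ny)]


def pcf_maxdist_max(points, area, nx=5, ny=5):
    """MAX rule for the PCF: apply the K/L rule in each block, then take the minimum."""
    best = math.inf
    for bx0, by0, bx1, by1 in blocks(area, nx, ny):
        n_b = sum(bx0 <= x < bx1 and by0 <= y < by1 for x, y in points)
        if n_b == 0:
            continue  # assumption: blocks without events are skipped
        lam_b = n_b / ((bx1 - bx0) * (by1 - by0))
        rule = min(min(bx1 - bx0, by1 - by0) / 4.0, math.sqrt(1000.0 / (math.pi * lam_b)))
        best = min(best, rule)
    return best
```

More blocks make each block smaller, which shrinks the per-block ranges and therefore the maximum PCF distance.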