The SPP Procedure

Example 105.3 Intensity Model Validation Diagnostics

Model validation diagnostics help you evaluate whether the fitted model that involves the covariates is appropriate for the specified point pattern. The SPP procedure provides two types of diagnostics:

goodness-of-fit test
residual diagnostics

This example demonstrates the usage of these diagnostics for model validation. It uses a simulated point pattern data set and also simulates two covariates over a $50\times 50$ grid. The following statements define functions for simulating a spatial point pattern given an intensity function by the method of random thinning of Lewis and Shedler (1979), as discussed in Schabenberger and Gotway (2005) and Wicklin (2013). For more information about the method and the code, see Wicklin (2013). The functions are saved in a SAS/IML storage catalog to make them available for reuse.

ods graphics on;

proc iml;
   start Uniform2d(n, a, b);
      u = j(n, 2);
      call randgen(u, "Uniform");
      return( u # (a||b) );
   finish;

   start HomogPoissonProcess(lambda, a, b);
      n = 1;
      call randgen(n,"Poisson", lambda*2500);
      return( Uniform2d(n, a, b) );
   finish;

   start InhomogPoissonProcess(a, b) global(lambda0);
      u = HomogPoissonProcess(lambda0, a, b);
      lambda = Intensity(u[,1], u[,2]);
      r = shape(.,sum(lambda<=lambda0),1);
      call randgen(r,"Bernoulli", lambda[loc(lambda<=lambda0)]/lambda0);
      return( u[loc(r),] );
   finish;

   reset storage=sasuser.SPPThin;
   store module=Uniform2d
         module=HomogPoissonProcess
         module=InhomogPoissonProcess;
quit;

The following statements define a certain intensity function that is based on the elevation and slope of the land around particular hills, with hills characterized by a four-column matrix Hills, where the first two columns give the X and Y coordinates of each hill center and the last two columns give their height and radii. In the model, both elevation and slope are assumed to be positive, with a negative effect on intensity, so lambda0 (the maximum value of intensity) is the value at Elevation=Slope=0.

proc iml;
   %let xH = Hills[iHill,1];
   %let yH = Hills[iHill,2];
   %let hH = Hills[iHill,3];
   %let rH = Hills[iHill,4];

   start Elevation(x,y) global(Hills);
      Elevation = 0;
      do iHill = 1 to nrow(Hills);
         Height = &hH*exp(-((x - &xH)##2 + (y - &yH)##2)/&rH);
         Elevation = Elevation + Height;
      end;
      return(Elevation);
   finish;

   start Slope(x,y) global(Hills);
      xslope = 0;
      yslope = 0;
      do iHill = 1 to nrow(Hills);
         Height = &hH*exp(-((x - &xH)##2 + (y - &yH)##2)/&rH);
         dxHeight = -2*Height#(x - &xH)/&rH;
         dyHeight = -2*Height#(y - &yH)/&rH;
         xslope   = xslope + dxHeight;
         yslope   = yslope + dyHeight;
      end;
      Slope = sqrt(xslope##2 + yslope##2);
      return(Slope);
   finish;

   start Intensity(x,y) global(lambda0);
      lin = 0.5 - 2*Elevation(x,y) - 10*Slope(x,y);
      return(exp(lin));
   finish;

   lambda0 = exp(0.5);

   reset storage=sasuser.SPPFlowers;
   store lambda0 module=Elevation module=Slope module=Intensity;
quit;

Finally, the following statements use the simulation method of Wicklin (2013) and the previously defined intensity to simulate a spatial point pattern on 10 hills in an area of $50 \times 50$ units. The covariates, Elevation and Slope, are also computed over a grid of points in the region of interest.

proc iml;
   reset storage=sasuser.SPPThin;
   load module=Uniform2d
        module=HomogPoissonProcess
        module=InhomogPoissonProcess;

   reset storage=sasuser.SPPFlowers;
   load  module=Elevation module=Slope module=Intensity lambda0;

   a = 50;
   b = 50;

   Hills = { 9.2 48.5 0.2 13.0,
           46.1 48.5 0.3 26.6,
            2.5  3.3 0.7 26.2,
           42.7  3.4 0.9 14.9,
           13.6 34.5 1.0 11.3,
           34.4 20.6 0.3 14.4,
           23.8 42.2 0.4 29.5,
           29.1 18.9 0.5 25.3,
           46.6 46.5 0.3 14.9,
           19.6 23.6 0.5  8.4};

   call randseed(12345);

   free Cov;
   Cov = j((a+1) * (b+1), 5, 0);
   do x = 0 to a; do y = 0 to b;
      Cov[x#(a+1)+y+1,] =  (x || y || Elevation(x,y) || Slope(x,y)
                           || Intensity(x,y));
   end; end;

   create Covariates var {"x" "y" "Elevation" "Slope" "Intensity"};
   append from Cov;
   close Covariates;

   Hills = Hills // {25 5 2 15};
   z = InhomogPoissonProcess(a, b);

   create Events var {"x" "y"};
   append from z;
   close;

quit;

data simAll;
   set Events(in=e) Covariates;
   Flowers = e;
run;

The point pattern data set and the covariate data set are combined in the simAll data set and the event observations can be identified by using a variable Flowers.

proc spp data=simAll plots(equate)=(trends observations);
   process trees = (x, y /area=(0,0,50,50) Event=Flowers);
   trend grad = field(x,y, Elevation);
   trend elev = field(x,y, Slope);
run;

Output 105.3.1 shows the point pattern. The point pattern has been simulated to include a Gaussian bump at the center of the study region.

Output 105.3.1: Spatial Point Pattern of Simulated Flowers

Output 105.3.2 shows the spatial covariate Elevation, and Output 105.3.3 shows the spatial covariate Slope. The covariates have been simulated to include several Gaussian hills, and they are continuous within the $50\times 50$ study region (that is, every point in the region has a value for these covariates).

Output 105.3.2: Spatial Covariate Elevation

Output 105.3.3: Spatial Covariate Slope

The following code fits an intensity model for the simulated point pattern that involves the simulated covariates Elevation and Slope. It also requests model validation diagnostics, including residuals and the goodness-of-fit test.

proc spp data=simAll seed=1  plots(equate)=(residual);
   process trees = (x,y /area=(0,0,50,50) Event=Flowers) /quadrat(4,2 /details);
   trend elev = field(x,y, Elevation);
   trend slope = field(x,y, Slope);
   model trees = elev slope/residual(b=5) gof;
run;

Output 105.3.4 shows the fitted intensity estimate that is based on the model that involves just the covariates Elevation and Slope.

Output 105.3.4: Fitted Intensity Estimate for the Simulated Point Pattern

Output 105.3.5 shows the residual diagnostics for the model. It is clear from the smoothed residual plot at the bottom right corner of Output 105.3.5 that the model that involves just the covariates Elevation and Slope fails to account for the Gaussian bump in the middle of the study region. This is revealed by the trend at the center of the smoothed residual plot at the bottom right corner of Output 105.3.5.

Output 105.3.5: Residual Diagnostics for the Fitted Model

Consequently, the goodness-of-fit test rejects the hypothesis that the point pattern was generated by the fitted model. This is evident in the low p-value that is obtained for the Pearson chi-square test for goodness of fit, which is shown in Output 105.3.6.

Output 105.3.6: Pearson Chi-Square Test for Goodness of Fit

Pearson Chi-Square Test for Goodness-of-Fit
DF	Dispersion Index	Chi-Square	Pr > ChiSq
7	12.742	89.19	<.0001

Output 105.3.7 shows the corresponding Pearson residuals for the goodness-of-fit test.

Output 105.3.7: Quadrat Information and Pearson Residuals for Goodness-of-Fit Test

Quadrat Information for Goodness-of-Fit Test
ID	Quadrat	Expected Frequency	Count	Percentage	Pearson Residual
1	(1,1)	352	375	13.58%	1.20
2	(2,1)	398	310	11.22%	-4.41
3	(3,1)	298	184	6.66%	-6.60
4	(4,1)	323	349	12.64%	1.45
5	(1,2)	383	450	16.29%	3.43
6	(2,2)	261	264	9.56%	0.19
7	(3,2)	383	397	14.37%	0.70
8	(4,2)	371	433	15.68%	3.22

When the model involves only the covariates Elevation and Slope, the residual diagnostics and the goodness-of-fit test both reveal discrepancies in the model that do not fully account for the simulated point pattern. In particular, the model misses the Gaussian bump in the middle of the study region.