The GAM Procedure

Example 4.2: Comparing PROC GAM with PROC TPSPLINE

This example compares the GAM procedure with the TPSPLINE procedure, another nonparametric procedure that fits a smooth surface to multivariate data. Unlike PROC GAM, PROC TPSPLINE does not assume that the model is additive, and it uses very general basis functions for model fitting, which makes it much slower than PROC GAM. For more details about the TPSPLINE procedure, refer to "The TPSPLINE Procedure" in SAS/STAT User's Guide, Version 8.

The data used here, which are also analyzed in "The TPSPLINE Procedure" in SAS/STAT User's Guide, Version 8, are age-adjusted melanoma incidence rates for 37 years from the Connecticut Tumor Registry (Houghton, Flannery, and Viola 1980):

   title 'Comparing PROC GAM with PROC TPSPLINE';
   data melanoma;
      input  year incidences @@;
      datalines;
   1936    0.9   1937   0.8  1938   0.8  1939   1.3
   1940    1.4   1941   1.2  1942   1.7  1943   1.8
   1944    1.6   1945   1.5  1946   1.5  1947   2.0
   1948    2.5   1949   2.7  1950   2.9  1951   2.5
   1952    3.1   1953   2.4  1954   2.2  1955   2.9
   1956    2.5   1957   2.6  1958   3.2  1959   3.8
   1960    4.2   1961   3.9  1962   3.7  1963   3.3
   1964    3.7   1965   3.9  1966   4.1  1967   3.8
   1968    4.7   1969   4.4  1970   4.8  1971   4.8
   1972    4.8
   ;
   run;

The variable incidences records the number of melanoma cases per 100,000 people for the years 1936 to 1972.

Four to five degrees of freedom (DF) for each nonparametric term in a generalized additive model fit most data well. However, to select the DF more objectively, you can specify METHOD=GCV in the MODEL statement, which chooses the smoothing parameter by minimizing the generalized cross validation (GCV) function, as shown in the following PROC GAM code:

   proc gam data=melanoma;
      model incidences = spline(year) /method = GCV;
      output out=gam p;
   run;
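
For comparison, the fixed-DF approach mentioned above could be sketched as follows. This is only an illustration, assuming you want to request the degrees of freedom yourself through the DF= option of the spline term; the output data set name gam_df4 is made up here and is not used elsewhere in this example.

   /* Illustrative sketch: fix the spline DF instead of using GCV. */
   /* gam_df4 is a hypothetical data set name.                     */
   proc gam data=melanoma;
      model incidences = spline(year, df=4);
      output out=gam_df4 p;
   run;

The rest of this example continues with the METHOD=GCV fit.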

The results of the METHOD=GCV fit are listed in Output 4.2.1 and Output 4.2.2.

Output 4.2.1: Summary Statistics
 
Comparing PROC GAM with PROC TPSPLINE

The GAM Procedure
Dependent Variable: incidences
Smoothing Model Component: spline(year)

Iteration Summary and Fit Statistics
Final number of backfitting iterations 2
Final backfitting criterion 0
Final residual sum of squares 1.2242517494
 
Summary of Input Data Set
Number of Observations 37
Number of Missing Observations 0
Distribution Gaussian
Link Function Identity

Output 4.2.2: Analysis of Model
 
Comparing PROC GAM with PROC TPSPLINE

The GAM Procedure
Dependent Variable: incidences
Smoothing Model Component: spline(year)

Regression Model Analysis
Parameter Estimates
Parameter    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept            -212.69706           7.00491     -30.36      <.0001
L_year                  0.11029           0.00358      30.77      <.0001
 
Smoothing Model Analysis
Fit Statistics of Smoothing Components
Component       Smoothing Parameter    DF           GCV         No. of Unique Obs.
spline(year)    0.634903               13.414936    0.088803    37
 
Smoothing Model Analysis
Analysis of Deviance
Source DF Sum of Squares F Value Pr > F
spline(year) 13.414936 2.736763 50.49 <.0001

The model summary shows that the spline term in the final model has 13.414936 degrees of freedom and that the nonparametric trend is highly significant. This DF is much greater than the default value of 4, indicating that there is a great deal of structure in the yearly incidence rates of melanoma. A prediction plot should reveal the nature of this structure:

   legend1 frame cframe=ligr cborder=black label=none 
           position=center;
   axis1   label=(angle=90 rotate=0);
   axis2   minor=none;
   symbol1 color=red interpol=join value=none line=1;

   proc sort data=gam;
      by year;
   run;
   proc gplot data=gam;
      title;
      plot p_incidences*year = 1 /overlay legend 
      frame cframe=ligr vaxis=axis1 haxis=axis2;
   run;

Output 4.2.3 shows the predicted melanoma rate over time. Two features stand out on this plot:

Output 4.2.3: Predicted Melanoma Incidence Rates By Year
[Figure: predicted incidences (p_incidences) plotted against year]

Since PROC TPSPLINE also fits a nonparametric model, PROC TPSPLINE and PROC GAM fits should be very similar for this univariate case. The following code produces the TPSPLINE analysis shown in Output 4.2.4:

   title 'Comparing PROC GAM with PROC TPSPLINE';
   proc tpspline data=melanoma;
      model incidences = (year);
      output out=tpspline p;
   run;

Output 4.2.4: Analysis from PROC TPSPLINE
 
Comparing PROC GAM with PROC TPSPLINE

The TPSPLINE Procedure
Dependent Variable: incidences

Summary of Input Data Set
Number of Non-Missing Observations 37
Number of Missing Observations 0
Unique Smoothing Design Points 37
 
Summary of Final Model
Number of Regression Variables 0
Number of Smoothing Variables 1
Order of Derivative in the Penalty 2
Dimension of Polynomial Space 2
 
Summary Statistics of Final Estimation
log10(n*Lambda) -0.0607
Smoothing Penalty 0.5171
Residual SS 1.2243
Tr(I-A) 22.5852
Model DF 14.4148
Standard Deviation 0.2328
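
The statistics in Output 4.2.4 can be tied back to the GCV value that PROC GAM reports in Output 4.2.2. The following DATA step is a rough sketch of that check. It assumes the usual definition of the generalized cross validation function, GCV = n*RSS / Tr(I-A)**2, which this example does not spell out, and the variable names are chosen only for illustration. It also checks that Model DF and Tr(I-A) add up to the number of observations.

   /* Illustrative check using the rounded values in Output 4.2.4. */
   /* The GCV formula below is an assumption; see the text above.  */
   data _null_;
      n        = 37;        /* number of observations       */
      rss      = 1.2243;    /* Residual SS                  */
      tr_ia    = 22.5852;   /* Tr(I-A)                      */
      model_df = 14.4148;   /* Model DF                     */
      gcv      = n * rss / tr_ia**2;
      df_sum   = model_df + tr_ia;
      put gcv= 8.6 df_sum= 8.4;
   run;

With these rounded inputs, gcv comes out at about 0.0888, in line with the GCV value in Output 4.2.2, and df_sum equals 37.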

The TPSPLINE analysis reports a model DF of 14.4148. This is consistent with the GAM fit, whose DF of 13.414936 excludes the one degree of freedom for the linear L_year term. The OUTPUT statements in the PROC GAM and PROC TPSPLINE code create the data sets gam and tpspline, which contain the predicted values from the respective procedures. You can use the following code to look at the values for the two procedures side by side:

   data both;
      merge gam     (rename=(p_incidences=gam     ))
            tpspline(rename=(p_incidences=tpspline));
   run;
   proc print data=both;
      var year gam tpspline;
   run;

The results, the first ten of which are displayed in Output 4.2.5, show that PROC GAM and PROC TPSPLINE give essentially the same predictions for this problem.

Output 4.2.5: Melanoma Predictions for First Ten Years
 
Comparing PROC GAM with PROC TPSPLINE

Obs year gam tpspline
1 1936 0.82425 0.82424
2 1937 0.85580 0.85580
3 1938 0.96379 0.96379
4 1939 1.15046 1.15046
5 1940 1.31044 1.31044
6 1941 1.43881 1.43881
7 1942 1.58218 1.58218
8 1943 1.64382 1.64382
9 1944 1.60148 1.60148
10 1945 1.57498 1.57499
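
To put a number on how closely the two sets of predictions agree across all 37 years, and not only the ten shown above, something like the following sketch could be used. It assumes the data set both created earlier; the data set name diff and the variable name absdiff are made up for illustration.

   /* Illustrative sketch: summarize the absolute difference       */
   /* between the PROC GAM and PROC TPSPLINE predictions.          */
   data diff;
      set both;
      absdiff = abs(gam - tpspline);
   run;

   proc means data=diff n mean max;
      var absdiff;
   run;

A maximum of absdiff close to zero would support the conclusion that the two procedures give essentially the same fit for this univariate problem.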


Copyright © 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.