|
Chapter Contents |
Previous |
Next |
| The GAM Procedure |
The following example illustrates the use of the GAM procedure to explore in a nonparametrical way how two factors affect a response. The data come from a study (Sockett et al. 1987) of the factors affecting patterns of insulin-dependent diabetes mellitus in children. The objective is to investigate the dependence of the level of serum C-peptide on various other factors in order to understand the patterns of residual insulin secretion. The response measurement is the logarithm of C-peptide concentration (pmol/ml) at diagnosis, and the predictor measurements are age and base deficit (a measure of acidity):
title 'Patterns of Diabetes';
data diabetes;
input Age BaseDeficit CPeptide @@;
logCP = log(CPeptide);
datalines;
5.2 -8.1 4.8 8.8 -16.1 4.1 10.5 -0.9 5.2
10.6 -7.8 5.5 10.4 -29.0 5.0 1.8 -19.2 3.4
12.7 -18.9 3.4 15.6 -10.6 4.9 5.8 -2.8 5.6
1.9 -25.0 3.7 2.2 -3.1 3.9 4.8 -7.8 4.5
7.9 -13.9 4.8 5.2 -4.5 4.9 0.9 -11.6 3.0
11.8 -2.1 4.6 7.9 -2.0 4.8 11.5 -9.0 5.5
10.6 -11.2 4.5 8.5 -0.2 5.3 11.1 -6.1 4.7
12.8 -1.0 6.6 11.3 -3.6 5.1 1.0 -8.2 3.9
14.5 -0.5 5.7 11.9 -2.0 5.1 8.1 -1.6 5.2
13.8 -11.9 3.7 15.5 -0.7 4.9 9.8 -1.2 4.8
11.0 -14.3 4.4 12.4 -0.8 5.2 11.1 -16.8 5.1
5.1 -5.1 4.6 4.8 -9.5 3.9 4.2 -17.0 5.1
6.9 -3.3 5.1 13.2 -0.7 6.0 9.9 -3.3 4.9
12.5 -13.6 4.1 13.2 -1.9 4.6 8.9 -10.0 4.9
10.8 -13.5 5.1
;
run;
The following statements perform the desired analysis. The PROC GAM statement invokes the procedure and specifies the diabetes data set as input. The MODEL statement specifies logCP as the response variable and requests that univariate BSPLINEs with 4 degrees of freedom be used to model the effect of Age and BaseDeficit. The OUTPUT statement specifies that partial prediction curves are to be saved in the data set estimates:
title 'Patterns of Diabetes';
proc gam data=diabetes;
model logCP = spline(age) spline(BaseDeficit);
output out=estimates p;
run;
The results are shown in Figure 4.1 and Figure 4.2.
Figure 4.1 shows two tables. The first table summarizes the convergence criterion for back-fitting, and the second one summarizes the input data set and the distribution family used for the model.
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Figure 4.2 displays summary statistics for the model. It consists of three tables. The first is the Parameter Estimates table for the parametric part of the model. It indicates that the linear trends for both Age and BaseDeficit are highly significant with p-values of 0.0024 and 0.0020. The second table is the summary of smoothing components of the nonparametric part of the model. Since the GAM fit used the default DF = 4, the main point of this table is to present the smoothing parameter values that yield this DF for each component. Finally, the third table shows the Analysis of Deviance table for the nonparametric component of the model.
The P option in the OUTPUT statement puts the partial predictions for Age and BaseDeficit in the output data set. You can compute the entire partial prediction effect for each factor by adding the estimated linear terms to the respective partial predictions, as in the following statement:
data estimates; set estimates;
P2_age = P_age + 0.01437*age;
P2_BaseDeficit = P_BaseDeficit + 0.00807*BaseDeficit;
run;
Plotting the partial predictions is one way to explore the overall shape of the relationship between each factor and the response. First of all, the following statements set up the graphics options:
axis1 label=(angle=90 rotate=0) minor=none; axis2 minor=none order=(0 to 16 by 4); symbol1 color=red interpol=join value=none line=1; symbol2 color=blue interpol=join value=none line=2;
Then the following statements plot the partial prediction curves for Age and BaseDeficit:
proc sort data=estimates;
by age;
title;
proc gplot data=estimates;
plot P_age *age = 1
P2_Age*Age = 2 /overlay legend frame cframe=ligr
name='gam1' vaxis=axis1 haxis=axis2;
run;
proc sort data=estimates;
by BaseDeficit;
axis2 minor=none;
proc gplot data=estimates;
plot P_BaseDeficit *BaseDeficit = 1
P2_BaseDeficit*BaseDeficit = 2 /
overlay legend frame cframe=ligr name='gam2'
vaxis=axis1 haxis=axis2;
run;
Finally, the following statements redisplay the curves side by side for easy comparison:
goptions display;
proc greplay tc=tempcat nofs;
igout gseg;
tdef newtwo des='two plots of equal size'
1/llx=0 lly=0
ulx=0 uly=100
urx=50 ury=100
lrx=50 lry=0
2/llx=50 lly=0
ulx=50 uly=100
urx=100 ury=100
lrx=100 lry=0
;
template newtwo;
treplay 1:gam1
2:gam2;
run; quit;
The resulting plots for each predictor with and without the linear term are shown in Figure 4.3.
|
Both plots show a strong quadratic pattern, with a possible indication of higher-order behavior. Further investigation is required to determine whether these patterns are real or not.
|
Chapter Contents |
Previous |
Next |
Top |
Copyright © 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.