The LOESS Procedure

Example 6.1: Automatic Smoothing Parameter Selection

The following data set contains measurements of monthly averaged atmospheric pressure differences between Easter Island and Darwin, Australia, for a period of 168 months (NIST 1998):

   data ENSO;
     input Pressure @@;
     Month=_N_; 
     format Pressure 4.1;
     format Month 3.0;
   datalines;
   12.9  11.3  10.6  11.2  10.9   7.5   7.7  11.7
   12.9  14.3  10.9  13.7  17.1  14.0  15.3   8.5 
    5.7   5.5   7.6   8.6   7.3   7.6  12.7  11.0
   12.7  12.9  13.0  10.9  10.4  10.2   8.0  10.9
   13.6  10.5   9.2  12.4  12.7  13.3  10.1   7.8
    4.8   3.0   2.5   6.3   9.7  11.6   8.6  12.4
   10.5  13.3  10.4   8.1   3.7  10.7   5.1  10.4
   10.9  11.7  11.4  13.7  14.1  14.0  12.5   6.3
    9.6  11.7   5.0  10.8  12.7  10.8  11.8  12.6
   15.7  12.6  14.8   7.8   7.1  11.2   8.1   6.4 
    5.2  12.0  10.2  12.7  10.2  14.7  12.2   7.1
    5.7   6.7   3.9   8.5   8.3  10.8  16.7  12.6
   12.5  12.5   9.8   7.2   4.1  10.6  10.1  10.1   
   11.9  13.6  16.3  17.6  15.5  16.0  15.2  11.2
   14.3  14.5   8.5  12.0  12.7  11.3  14.5  15.1
   10.4  11.5  13.4   7.5   0.6   0.3   5.5   5.0
    4.6   8.2   9.9   9.2  12.5  10.9   9.9   8.9
    7.6   9.5   8.4  10.7  13.6  13.7  13.7  16.5  
   16.8  17.1  15.4   9.5   6.1  10.1   9.3   5.3
   11.2  16.6  15.6  12.0  11.5   8.6  13.8   8.7 
    8.6   8.6   8.7  12.8  13.2  14.0  13.4  14.8
   ;

The following PROC GPLOT statements produce a simple scatter plot of these data, displayed in Output 6.1.1:

 
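   axis1 label = ( r=0 a=90 ) order=(0 to 20 by 4);  /* vertical axis: label turned 90 degrees, ticks 0 to 20 by 4 */
   symbol1 color=black value=dot;                     /* plot each observation as a dot */
   proc gplot data=ENSO;
      plot Pressure*Month /
           hminor = 0
           vminor = 0
           vaxis  = axis1
           frame cframe=ligr;
   run;
   quit;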

Output 6.1.1: Scatter Plot of ENSO Data
[Figure: scatter plot of Pressure versus Month]

You can compute a loess fit and plot the results for these data using the following statements:

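   /* Save the observed and predicted values in data set ENSOstats */
   ods output OutputStatistics=ENSOstats;

   /* Fit with the default settings (golden section minimization of AICC) */
   proc loess data=ENSO;
      model Pressure=Month;
   run;

   /* Overlay the data (SYMBOL1) and the loess fit (SYMBOL2) */
   axis1 label = ( r=0 a=90 ) order=(0 to 20 by 4);
   symbol1 color=black value=dot h=2.5 pct;
   symbol2 color=black interpol=join value=none width=2;
   proc gplot data=ENSOstats;
      plot (depvar pred)*Month / overlay
           hminor = 0
           vminor = 0
           vaxis  = axis1
           frame cframe=ligr;
   run; quit;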

The "Optimal Smoothing Criterion" and "Fit Summary" tables are shown in Output 6.1.2, and the fit is plotted in Output 6.1.3.

Output 6.1.2: Output from PROC LOESS

   The LOESS Procedure
   Dependent Variable: Pressure

   Optimal Smoothing Criterion

                   Smoothing
        AICC       Parameter
     3.41105         0.22321

   The LOESS Procedure
   Selected Smoothing Parameter: 0.223
   Dependent Variable: Pressure

   Fit Summary

   Fit Method                      kd Tree
   Blending                        Linear
   Number of Observations          168
   Number of Fitting Points        33
   kd Tree Bucket Size             7
   Degree of Local Polynomials     1
   Smoothing Parameter             0.22321
   Points in Local Neighborhood    37
   Residual Sum of Squares         1654.27725
   Trace[L]                        8.74180
   GCV                             0.06522
   AICC                            3.41105
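As a check on how the Fit Summary entries relate to one another, the AICC and GCV values can be recomputed by hand from n = 168, the residual sum of squares (RSS), and Trace[L], the trace of the smoother matrix L. The formulas below are our assumption (AICC in the form given by Hurvich, Simonoff, and Tsai, with the error variance estimated as RSS/n); they reproduce the tabulated 3.41105 and 0.06522 to the printed precision:

```latex
\begin{aligned}
\hat\sigma^2 &= \frac{\mathrm{RSS}}{n} = \frac{1654.27725}{168} \approx 9.846888 \\
\mathrm{AICC} &= \ln\hat\sigma^2 + 1 + \frac{2\,(\operatorname{Trace}(L) + 1)}{n - \operatorname{Trace}(L) - 2} \\
 &= \ln 9.846888 + 1 + \frac{2\,(9.74180)}{157.25820}
  \approx 2.287156 + 1 + 0.123896 = 3.411052 \\
\mathrm{GCV} &= \frac{\mathrm{RSS}}{\bigl(n - \operatorname{Trace}(L)\bigr)^2}
 = \frac{1654.27725}{159.25820^2} \approx 0.065224
\end{aligned}
```

Because Trace[L] acts as the equivalent number of parameters of the fit, the last term of AICC penalizes small smoothing parameters (wiggly fits with large Trace[L]) while the first term rewards a small residual error.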

Output 6.1.3: Oversmoothed Loess Fit for the ENSO Data
[Figure: observed Pressure and loess predicted values versus Month]

The smoothing parameter value used for the loess fit shown in Output 6.1.3 was chosen by the default method of PROC LOESS, namely a golden section minimization of the AICC criterion over the interval (0,1]. The fit appears to be oversmoothed. What accounts for this poor fit?

One possibility is that the golden section search has found a local rather than a global minimum of the AICC criterion. You can test this by refitting the model and requesting a global minimum with the GLOBAL suboption of SELECT=AICC. It is also helpful to plot the AICC criterion as a function of the smoothing parameter. The following statements do both:

   /* Save the smoothing parameters examined and their AICC values */
   ods output ModelSummary=ENSOsummary;

   /* Search the whole interval for the global minimum of AICC */
   proc loess data=ENSO;
      model Pressure=Month / select=AICC(global);
   run;

   proc sort data=ENSOsummary;
      by Smooth;
   run;

   /* Plot AICC against the smoothing parameter */
   axis1 label = ( r=0 a=90 );
   symbol1 color=black interpol=join value=none width=2;
   proc gplot data=ENSOsummary;
      format AICC f4.1;
      format Smooth f4.1;
      plot AICC*Smooth /
           hminor = 0 vminor = 0
           vaxis  = axis1 frame cframe=ligr;
   run; quit;

The results are shown in Output 6.1.4.

Output 6.1.4: AICC versus Smoothing Parameter Showing Local Minima
[Figure: AICC versus smoothing parameter]

The explanation for the oversmoothed fit in Output 6.1.3 is now apparent. The golden section search found the local minimum that occurs near a smoothing parameter value of 0.22 rather than the global minimum that occurs near 0.06. If you use the RANGE suboption to restrict the search to smoothing parameter values between 0.03 and 0.2, the golden section search finds the global minimum, as the following statements demonstrate:

   ods output OutputStatistics=ENSOstats;

   /* Restrict the golden section search to smoothing parameters between 0.03 and 0.2 */
   proc loess data=ENSO;
      model Pressure=Month / select=AICC( range(0.03,0.2) );
   run;

   axis1 label = ( r=0 a=90 ) order=(0 to 20 by 4);
   symbol1 color=black value=dot h=2.5 pct;
   symbol2 color=black interpol=join value=none width=2;
   proc gplot data=ENSOstats;
      plot (depvar pred)*Month / overlay
           hminor = 0
           vminor = 0
           vaxis  = axis1
           frame cframe=ligr;
   run; quit;

The resulting fit is shown in Output 6.1.5.

Output 6.1.5: Loess Fit for the ENSO Data
[Figure: observed Pressure and loess predicted values versus Month]
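The way a golden section search can settle on a local minimum, and then find the global one once the interval is narrowed, can be reproduced in miniature. The following sketch is not part of PROC LOESS: the macros crit and golden and the data sets fullrange and subrange are hypothetical names introduced here, and crit is a made-up criterion (the lower of two parabolas, with a global minimum at 0.06 and a shallower local minimum at 0.22) chosen only to mimic the shape described for Output 6.1.4. The golden macro is a textbook golden section search, which assumes the interval contains a single minimum; PROC LOESS may differ in details such as its stopping rule.

```sas
/* Hypothetical toy criterion (NOT the AICC that PROC LOESS computes):   */
/* the lower of two parabolas, with a global minimum of -0.02 at s=0.06  */
/* and a shallower local minimum of 0 at s=0.22.                         */
%macro crit(s);
   min(10*((&s) - 0.06)**2 - 0.02, ((&s) - 0.22)**2)
%mend crit;

/* Textbook golden section search for a minimum of %crit on (lo, hi).   */
/* Each step keeps the subinterval on the side of the better interior    */
/* point, which is valid only if the interval holds a single minimum.    */
%macro golden(lo, hi, out);
   data &out;
      r = (sqrt(5) - 1) / 2;                 /* about 0.618 */
      a = &lo;
      b = &hi;
      c = b - r*(b - a);  fc = %crit(c);     /* left interior point  */
      d = a + r*(b - a);  fd = %crit(d);     /* right interior point */
      do iter = 1 to 100 while (b - a > 1e-6);
         if fc < fd then do;                 /* keep (a, d) */
            b = d;  d = c;  fd = fc;
            c = b - r*(b - a);  fc = %crit(c);
         end;
         else do;                            /* keep (c, b) */
            a = c;  c = d;  fc = fd;
            d = a + r*(b - a);  fd = %crit(d);
         end;
      end;
      smin = (a + b) / 2;                    /* midpoint of the final bracket */
      keep smin;
   run;
%mend golden;

%golden(0, 1, fullrange);      /* whole interval: converges near 0.22 */
%golden(0.03, 0.2, subrange);  /* restricted:     converges near 0.06 */

proc print data=fullrange; run;
proc print data=subrange;  run;
```

On (0,1) the first interior points are about 0.382 and 0.618, and on its third step the search compares the criterion at about 0.146 and 0.236; because the value at 0.236 is lower, it discards everything below 0.146, including the global minimum, and converges to 0.22. Starting from (0.03, 0.2), the first comparison already favors the left point near 0.095, so the search stays around 0.06. This mirrors the two PROC LOESS runs above.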

The loess fit shown in Output 6.1.5 clearly shows an annual cycle in the data. An interesting question is whether some phenomenon captured in the data would explain the local minimum near 0.22 in the AICC curve. There is some evidence of a cycle of about 42 months in the oversmoothed fit in Output 6.1.3; this cycle becomes visible because the strong annual cycle seen in Output 6.1.5 has been smoothed out. The physical phenomenon that accounts for this cycle has been identified as the periodic warming of the Pacific Ocean known as "El Niño."
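The neighborhood sizes make this concrete. Assuming, as the Fit Summary table suggests, that each local neighborhood contains q = ⌊ns⌋ points (⌊168 × 0.22321⌋ = 37 matches the "Points in Local Neighborhood" entry), and recalling that the ENSO data contain one observation per month, each local fit spans roughly q consecutive months:

```latex
\begin{aligned}
s = 0.22321:&\quad q = \lfloor 168 \times 0.22321 \rfloor = \lfloor 37.499 \rfloor = 37 \text{ months} \approx 3 \text{ years} \\
s \approx 0.06:&\quad q \approx \lfloor 168 \times 0.06 \rfloor = \lfloor 10.08 \rfloor = 10 \text{ months} < 12 \text{ months}
\end{aligned}
```

A local linear fit over about 37 months averages across roughly three annual cycles, so the annual cycle is largely averaged out, whereas a window of about 10 months is short enough to follow it.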


Copyright © 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.