The LOESS Procedure |
Investigators studied the exhaust emissions of a one-cylinder engine (Brinkman; 1981). The SAS data set Gas contains the results data. The dependent variable, NOx, measures the concentration, in micrograms per joule, of nitric oxide and nitrogen dioxide normalized by the amount of work of the engine. The independent variable, E, is a measure of the richness of the air and fuel mixture.
data Gas; input NOx E @@; format NOx f3.1; format E f3.1; datalines; 4.818 0.831 2.849 1.045 3.275 1.021 4.691 0.97 4.255 0.825 5.064 0.891 2.118 0.71 4.602 0.801 2.286 1.074 0.97 1.148 3.965 1 5.344 0.928 3.834 0.767 1.99 0.701 5.199 0.807 5.283 0.902 3.752 0.997 0.537 1.224 1.64 1.089 5.055 0.973 4.937 0.98 1.561 0.665 ;
The following PROC SGPLOT statements produce the simple scatter plot of these data displayed in Output 50.1.1.
proc sgplot data=Gas; scatter x=E y=NOx; run;
The following statements fit two loess models for these data. Because this is a small data set, it is reasonable to do direct fitting at every data point. As there is substantial curvature in the data, quadratic local polynomials are used. An ODS OUTPUT statement creates two output data sets containing the "Output Statistics" and "Fit Summary" tables.
ods graphics on; proc loess data=Gas; ods output OutputStatistics = GasFit FitSummary=Summary; model NOx = E / degree=2 select=AICC(steps) smooth = 0.6 1.0 direct alpha=.01 all details; run; ods graphics off;
Fit Summary | |
---|---|
Fit Method | Direct |
Number of Observations | 22 |
Degree of Local Polynomials | 2 |
Smoothing Parameter | 0.60000 |
Points in Local Neighborhood | 13 |
Residual Sum of Squares | 1.71852 |
Trace[L] | 6.42184 |
GCV | 0.00708 |
AICC | -0.45637 |
AICC1 | -9.39715 |
Delta1 | 15.12582 |
Delta2 | 14.73089 |
Equivalent Number of Parameters | 5.96950 |
Lookup Degrees of Freedom | 15.53133 |
Residual Standard Error | 0.33707 |
The "Fit Summary" table for smoothing parameter value 0.6, shown in Output 50.1.2, records the fitting parameters specified and some overall fit statistics. See the section Smoothing Matrix for a definition of the smoothing matrix , and the sections Model Degrees of Freedom and Statistical Inference and Lookup Degrees of Freedom for definitions of the statistics that appear this table.
The "Output Statistics" table for smoothing parameter value is shown in Output 50.1.3. Note that, because the ALL option is specified in the MODEL statement, this table includes all the relevant optional columns. Furthermore, because the ALPHA=0.01 option is specified in the MODEL statement, the confidence limits in this table are 99% limits.
Output Statistics | ||||||||
---|---|---|---|---|---|---|---|---|
Obs | E | NOx | Predicted NOx | Estimated Prediction Std Deviation |
Residual | t Value | 99% Confidence Limits | |
1 | 0.8 | 4.8 | 4.87377 | 0.15528 | -0.05577 | -0.36 | 4.41841 | 5.32912 |
2 | 1.0 | 2.8 | 2.81984 | 0.15380 | 0.02916 | 0.19 | 2.36883 | 3.27085 |
3 | 1.0 | 3.3 | 3.48153 | 0.15187 | -0.20653 | -1.36 | 3.03617 | 3.92689 |
4 | 1.0 | 4.7 | 4.73249 | 0.13923 | -0.04149 | -0.30 | 4.32419 | 5.14079 |
5 | 0.8 | 4.3 | 4.82305 | 0.15278 | -0.56805 | -3.72 | 4.37503 | 5.27107 |
6 | 0.9 | 5.1 | 5.18561 | 0.19337 | -0.12161 | -0.63 | 4.61855 | 5.75266 |
7 | 0.7 | 2.1 | 2.51120 | 0.15528 | -0.39320 | -2.53 | 2.05585 | 2.96655 |
8 | 0.8 | 4.6 | 4.48267 | 0.15285 | 0.11933 | 0.78 | 4.03444 | 4.93089 |
9 | 1.1 | 2.3 | 2.12619 | 0.16683 | 0.15981 | 0.96 | 1.63697 | 2.61541 |
10 | 1.1 | 1.0 | 0.97120 | 0.18134 | -0.00120 | -0.01 | 0.43942 | 1.50298 |
11 | 1.0 | 4.0 | 4.09987 | 0.13477 | -0.13487 | -1.00 | 3.70467 | 4.49507 |
12 | 0.9 | 5.3 | 5.31258 | 0.17283 | 0.03142 | 0.18 | 4.80576 | 5.81940 |
13 | 0.8 | 3.8 | 3.84572 | 0.14929 | -0.01172 | -0.08 | 3.40794 | 4.28350 |
14 | 0.7 | 2.0 | 2.26578 | 0.16712 | -0.27578 | -1.65 | 1.77571 | 2.75584 |
15 | 0.8 | 5.2 | 4.58394 | 0.15363 | 0.61506 | 4.00 | 4.13342 | 5.03445 |
16 | 0.9 | 5.3 | 5.24741 | 0.19319 | 0.03559 | 0.18 | 4.68089 | 5.81393 |
17 | 1.0 | 3.8 | 4.16979 | 0.13478 | -0.41779 | -3.10 | 3.77457 | 4.56502 |
18 | 1.2 | 0.5 | 0.53059 | 0.32170 | 0.00641 | 0.02 | -0.41278 | 1.47397 |
19 | 1.1 | 1.6 | 1.83157 | 0.17127 | -0.19157 | -1.12 | 1.32933 | 2.33380 |
20 | 1.0 | 5.1 | 4.66733 | 0.13735 | 0.38767 | 2.82 | 4.26456 | 5.07010 |
21 | 1.0 | 4.9 | 4.52385 | 0.13556 | 0.41315 | 3.05 | 4.12632 | 4.92139 |
22 | 0.7 | 1.6 | 1.19888 | 0.26774 | 0.36212 | 1.35 | 0.41375 | 1.98401 |
The combination of the options SELECT=AICC and SMOOTH=0.6 1 in the MODEL statement specifies that PROC LOESS fit models with smoothing parameters of 0.6 and 1 and select the model that yields the smaller value of the AICC statistic. The "Smoothing Criterion" shown in Output 50.1.4 shows that PROC LOESS selects the model with smoothing parameter value 0.6 as it yields the smaller value of the AICC statistic.
With ODS Graphics enabled, PROC LOESS produces a panel of fit plots whenever you specify the SMOOTH= option in the MODEL statement. These fit plots include confidence limits if you additionally specify the CLM option in the MODEL statement.
Output 50.1.5 shows the "Fit Panel" that displays the fitted models with 99% confidence limits overlaid on scatter plots of the data.
Based on the AICC criterion, the model with smoothing parameter is preferred. You can address the question of whether the differences between these models are significant using analysis of variance. You do this by using the model with smoothing parameter value as the null model.
The statistic
has a distribution that is well approximated by an distribution with
numerator degrees of freedom and denominator degrees of freedom (Cleveland and Grosse; 1991). Here quantities with superscript refer to the null model, rss is the residual sum of squares, and , , and are defined in the section Statistical Inference and Lookup Degrees of Freedom.
The "Fit Summary" tables contain the information needed to carry out such an analysis. These tables have been captured in the output data set named Summary by using an ODS OUTPUT statement. The following statements extract the relevant information from this data set and carry out the analysis of variance:
data h0 h1; set Summary(keep=SmoothingParameter Label1 nValue1 where=(Label1 in ('Residual Sum of Squares','Delta1', 'Delta2','Lookup Degrees of Freedom'))); if SmoothingParameter = 1 then output h0; else output h1; run; proc transpose data=h0(drop=SmoothingParameter Label1) out=h0; data h0(drop=_NAME_); set h0; rename Col1 = RSSNull Col2 = delta1Null Col3 = delta2Null; proc transpose data=h1(drop=SmoothingParameter Label1) out=h1; data h1(drop=_NAME_); set h1; rename Col1 = RSS Col2 = delta1 Col3 = delta2 Col4 = rho; data ftest; merge h0 h1; nu = (delta1Null - delta1)**2 / (delta2Null - delta2); Numerator = (RSSNull - RSS)/(delta1Null - delta1); Denominator = RSS/delta1; FValue = Numerator / Denominator; PValue = 1 - ProbF(FValue, nu, rho); label nu = 'Num DF' rho = 'Den DF' FValue = 'F Value' PValue = 'Pr > F'; proc print data=ftest label; var nu rho Numerator Denominator FValue PValue; format nu rho FValue 7.2 PValue 6.4; run;
The results are shown in Output 50.1.6.
The small -value confirms that the fit with smoothing parameter value 0.6 is significantly different from the loess model with smoothing parameter value .
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.