The LOESS Procedure

Statistical Inference and Lookup Degrees of Freedom

If you denote the $i$th measurement of the response by $y_ i$ and the corresponding measurement of predictors by $x_ i$, then

$y_ i=g(x_ i) + \epsilon _ i$

where $g$ is the regression function and the $\epsilon _ i$ are independent random errors with mean zero. If the errors are normally distributed with constant variance, then you can obtain confidence intervals for the predictions from PROC LOESS. You can also obtain confidence limits when $\epsilon _ i$ is heteroscedastic but $a_ i \epsilon _ i$ has constant variance, where the $a_ i$ are a priori weights that you specify in the WEIGHT statement of PROC LOESS. When the error distribution is symmetric but not necessarily normal, you can do inference by using iterative reweighting. Formulas for statistical inference under these conditions can be found in Cleveland and Grosse (1991) and Cleveland, Grosse, and Shyu (1992). Cleveland and Grosse (1991) show that the standardized residuals of a loess model approximately follow a $t$ distribution with $\rho$ degrees of freedom. With $\bL$ denoting the smoothing matrix that maps the observed responses to the fitted values ($\hat{y} = \bL y$), these quantities are defined as follows:

$\begin{aligned}
\delta _1 & \equiv \mbox{Trace} (\bI -\bL )^\prime (\bI -\bL ) \\
\delta _2 & \equiv \mbox{Trace} \left((\bI -\bL )^\prime (\bI -\bL )\right)^2 \\
\rho & \equiv \mbox{Lookup Degrees of Freedom} \\
 & \equiv \delta _1^2/ \delta _2
\end{aligned}$
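To make these definitions concrete, here is a minimal Python sketch that computes $\delta _1$, $\delta _2$, and $\rho$ directly from a smoother matrix. The helper `local_linear_smoother_matrix` is hypothetical: it builds a simplified loess smoother (local linear fits with tricube weights over roughly the nearest span·$n$ points), which is not necessarily how PROC LOESS forms its fit. Forming $(\bI -\bL )^\prime (\bI -\bL )$ takes on the order of $n^3$ operations for $n$ observations, which gives a sense of why $\rho$ is not computed by default.

```python
import math


def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 for |u| < 1, otherwise 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_smoother_matrix(x, span):
    """Hypothetical helper: return the n x n matrix L with yhat = L y for a
    simplified loess fit (local linear, tricube weights, q nearest points).
    Assumes the x values are distinct."""
    n = len(x)
    q = min(n, max(3, math.ceil(span * n)))  # neighbourhood size
    L = []
    for xi in x:
        h = sorted(abs(xj - xi) for xj in x)[q - 1]  # distance to q-th neighbour
        w = [tricube((xj - xi) / h) for xj in x]
        d = [xj - xi for xj in x]
        s0 = sum(w)
        s1 = sum(wj * dj for wj, dj in zip(w, d))
        s2 = sum(wj * dj * dj for wj, dj in zip(w, d))
        det = s0 * s2 - s1 * s1
        if det <= 0:
            raise ValueError("neighbourhood too small for a local linear fit")
        # Weight of y_j in the local-linear fitted value at xi.
        L.append([wj * (s2 - s1 * dj) / det for wj, dj in zip(w, d)])
    return L


def lookup_df(L):
    """Return (delta1, delta2, rho) for a smoother matrix L."""
    n = len(L)
    # M = I - L
    M = [[(1.0 if i == j else 0.0) - L[i][j] for j in range(n)] for i in range(n)]
    # A = (I - L)'(I - L); this O(n^3) product dominates the cost.
    A = [[sum(M[k][i] * M[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    delta1 = sum(A[i][i] for i in range(n))
    # Trace(A^2) = sum over i, j of A_ij * A_ji
    delta2 = sum(A[i][j] * A[j][i] for i in range(n) for j in range(n))
    return delta1, delta2, delta1 ** 2 / delta2


x = [float(i) for i in range(10)]
L = local_linear_smoother_matrix(x, span=0.5)
delta1, delta2, rho = lookup_df(L)
print(f"delta1={delta1:.3f} delta2={delta2:.3f} rho={rho:.3f}")
```

A useful sanity check: for the global-mean smoother $\bL = \mathbf{J}/n$, the matrix $\bI -\bL$ is a projection, so $\delta _1 = \delta _2 = \rho = n-1$, the familiar degrees of freedom for a sample variance.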

The residual standard error that you find in the “Fit Summary” table is defined by

$\mbox{Residual Standard Error} \equiv \sqrt { \mbox{Residual SS} / \delta _1 }$
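The definition can be checked with a short sketch. It assumes a linear smoother $\hat{y} = \bL y$ and uses the fact that $\mbox{Trace} (\bI -\bL )^\prime (\bI -\bL )$ equals the sum of the squared entries of $\bI -\bL$, so $\delta _1$ is obtained without a matrix product. The function name `residual_standard_error` is illustrative only.

```python
import math


def residual_standard_error(y, L):
    """sqrt(Residual SS / delta1) for a linear smoother yhat = L y."""
    n = len(y)
    yhat = [sum(L[i][j] * y[j] for j in range(n)) for i in range(n)]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    # Trace((I - L)'(I - L)) equals the sum of squared entries of I - L.
    delta1 = sum(((1.0 if i == j else 0.0) - L[i][j]) ** 2
                 for i in range(n) for j in range(n))
    return math.sqrt(rss / delta1)


# With the global-mean smoother L = J/n, delta1 = n - 1 and the result is
# the ordinary sample standard deviation of y.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(y)
J = [[1.0 / n] * n for _ in range(n)]
print(residual_standard_error(y, J))
```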

The determination of $\rho$ is computationally expensive and is not done by default. It is computed if you specify the DFMETHOD=EXACT or DFMETHOD=APPROX option in the MODEL statement. It is also computed if you specify any of the options CLM, STD, or T in the MODEL statement. The values of $\delta _1$, $\delta _2$, and $\rho$ are reported in the “Fit Summary” table.

If you specify the CLM option in the MODEL statement, confidence limits for the predicted values are added to the OutputStatistics table. By default, 95% limits are computed; you can change the confidence level by using the ALPHA= option in the MODEL statement.
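The following is a rough sketch of how such pointwise limits can be formed. It assumes the usual linear-smoother result that the standard error of $\hat{y}_ i$ is $s\sqrt{\sum _ j L_{ij}^2}$, where $s$ is the residual standard error, with the critical value taken from a $t$ distribution with $\rho$ degrees of freedom. It is an illustration under these assumptions, not PROC LOESS's own implementation. To stay dependency-free, the caller passes the critical value `t_crit` (a hypothetical parameter); one way to obtain it is `scipy.stats.t.ppf(1 - alpha / 2, rho)`, which accepts non-integer degrees of freedom.

```python
import math


def confidence_limits(y, L, t_crit):
    """Pointwise limits yhat_i -/+ t_crit * s * sqrt(sum_j L_ij^2).

    t_crit is assumed to be the 1 - alpha/2 quantile of a t distribution
    with rho (lookup) degrees of freedom, supplied by the caller.
    """
    n = len(y)
    yhat = [sum(L[i][j] * y[j] for j in range(n)) for i in range(n)]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    delta1 = sum(((1.0 if i == j else 0.0) - L[i][j]) ** 2
                 for i in range(n) for j in range(n))
    s = math.sqrt(rss / delta1)  # residual standard error
    limits = []
    for i in range(n):
        # Standard error of yhat_i for a linear smoother with iid errors.
        se = s * math.sqrt(sum(L[i][j] ** 2 for j in range(n)))
        limits.append((yhat[i] - t_crit * se, yhat[i] + t_crit * se))
    return yhat, limits


# Global-mean smoother, where rho = n - 1 = 4; 2.776 is approximately the
# 0.975 quantile of t with 4 degrees of freedom (ALPHA=0.05).
y = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(y)
J = [[1.0 / n] * n for _ in range(n)]
yhat, limits = confidence_limits(y, J, t_crit=2.776)
for fit, (lo, hi) in zip(yhat, limits):
    print(f"{fit:.3f}  [{lo:.3f}, {hi:.3f}]")
```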