XCHART Statement: CUSUM Procedure

Methods for Estimating the Standard Deviation

It is recommended practice to provide a stable estimate or standard value for $\sigma$ with either the SIGMA0= option or the variable _STDDEV_ in a LIMITS= data set. However, if such a value is not available, you can compute an estimate $\hat{\sigma }$ from the data, as described in this section.

This section gives formulas for the methods used to estimate the standard deviation $\sigma$. One method applies to individual measurements, and three apply to subgrouped data. You request a method with the SMETHOD= option.

Method for Individual Measurements

When the cumulative sums are calculated from individual observations

$x_{1},x_{2},\ldots ,x_{N}$

rather than subgroup samples of two or more observations, the CUSUM procedure estimates $\sigma$ as $\sqrt {\hat{\sigma }^{2}}$, where

$\hat{\sigma }^{2}=\frac{1}{2(N-1)} \sum _{i=1}^{N-1}{(x_{i+1}-x_{i})^{2}}$

and N is the number of observations. Wetherill (1977) states that this estimate of the variance is biased if the measurements are autocorrelated.
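The successive-difference estimate above can be sketched in Python as follows. This is a minimal illustration of the formula, not the CUSUM procedure's own implementation; the function name `sigma_from_individuals` and the sample data are hypothetical.

```python
import math


def sigma_from_individuals(x):
    """Estimate sigma from squared successive differences (hypothetical helper)."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    # Sum of (x[i+1] - x[i])^2 over the N-1 adjacent pairs.
    ssd = sum((b - a) ** 2 for a, b in zip(x, x[1:]))
    # Variance estimate: ssd / (2 * (N - 1)); sigma is its square root.
    return math.sqrt(ssd / (2 * (n - 1)))


# Hypothetical data: the differences are 2, -1, 2, so the variance estimate is 9/6.
x = [10.0, 12.0, 11.0, 13.0]
sigma_hat = sigma_from_individuals(x)
```

Because only adjacent differences enter the sum, a slow drift in the process mean inflates this estimate less than it would inflate the ordinary sample standard deviation.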

Note that you can compute alternative estimates (for instance, robust estimates or estimates based on variance components models) by analyzing the data with SAS modeling procedures or your own DATA step program. Such estimates can be passed to the CUSUM procedure as values of the variable _STDDEV_ in a LIMITS= data set.

NOWEIGHT Method for Subgroup Samples

This method is the default for cusum charts for subgrouped data. The estimate is

$\hat{\sigma }=\frac{(s_{1}/c_{4}(n_{1}))+\cdots + (s_{N}/c_{4}(n_{N}))}{N}$

where $n_{i}$ is the sample size of the ith subgroup, N is the number of subgroups for which $n_{i}\geq 2$, and $s_{i}$ is the sample standard deviation of the observations $x_{i1},\ldots ,x_{in_{i}}$ in the ith subgroup. The quantities $s_{i}$ and $c_{4}(n_{i})$ are defined as

$s_{i}=\sqrt {(1/(n_{i}-1))\textstyle \sum _{j=1}^{n_{i}}{(x_{ij}-\bar{X}_{i})^{2} } }$

and

$c_{4}(n_{i})=\frac{\Gamma (n_{i}/2)\sqrt {2/(n_{i}-1)} }{\Gamma ((n_{i}-1)/2)}$

where $\Gamma (\cdot )$ denotes the gamma function, and $\bar{X}_{i}$ denotes the ith subgroup mean. A subgroup standard deviation $s_{i}$ is included in the calculation only if $n_{i}\geq 2$. If the observations are normally distributed, then the expected value of $s_{i}$ is

$\mbox{E}(s_{i})=c_{4}(n_{i})\sigma$

Thus, $\hat{\sigma }$ is the unweighted average of N unbiased estimates of $\sigma$. This method is described in the ASTM Manual on Presentation of Data and Control Chart Analysis.
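As a rough illustration of the NOWEIGHT formula, the following Python sketch computes $c_{4}(n)$ and the unweighted average of $s_{i}/c_{4}(n_{i})$. The function names `c4` and `sigma_noweight` and the sample data are hypothetical, and using the log-gamma function is a numerical convenience assumed here, not something the source specifies.

```python
import math
import statistics


def c4(n):
    """Unbiasing factor c4(n) = Gamma(n/2) * sqrt(2/(n-1)) / Gamma((n-1)/2)."""
    # Work with log-gamma so the ratio stays finite for large n.
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def sigma_noweight(subgroups):
    """Unweighted average of s_i / c4(n_i) over subgroups with n_i >= 2."""
    estimates = [
        statistics.stdev(g) / c4(len(g))  # stdev uses the n_i - 1 divisor
        for g in subgroups
        if len(g) >= 2
    ]
    if not estimates:
        raise ValueError("need at least one subgroup with two or more observations")
    return sum(estimates) / len(estimates)


# Hypothetical data; the single-observation subgroup is skipped.
subgroups = [[1.0, 3.0], [2.0, 2.0, 5.0], [7.0]]
sigma_hat = sigma_noweight(subgroups)
```

For $n=2$ the factor reduces to $\sqrt{2/\pi}\approx 0.798$, which gives a quick check on the `c4` helper.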

MVLUE Method for Subgroup Samples

If you specify SMETHOD=MVLUE, a minimum variance linear unbiased estimate (MVLUE) is computed, as introduced by Burr (1969, 1976). This estimate is a weighted average of unbiased estimates of $\sigma$ of the form

$s_{i}/c_{4}(n_{i})$

where

$s_{i}$	is the standard deviation of the ith subgroup.
$c_{4}(n_{i})$	is the unbiasing factor defined previously.
$n_{i}$	is the ith subgroup sample size, $i=1,2,\ldots ,N$.
N	is the number of subgroups for which $n_{i}\geq 2$.

The estimate is

$\hat{\sigma }=\frac{h_{1}s_{1}/c_{4}(n_{1})+\cdots + h_{N}s_{N}/c_{4}(n_{N})}{h_{1}+\cdots +h_{N}}$

where $h_{i}=c^{2}_{4}(n_{i})/(1-c^{2}_{4}(n_{i}))$. A subgroup standard deviation $s_{i}$ is included in the calculation only if $n_{i}\geq 2$.

The MVLUE assigns greater weight to estimates of $\sigma$ from subgroups with larger sample sizes and is intended for situations where the subgroup sample sizes vary. If the subgroup sample sizes are constant, the MVLUE reduces to the default estimate (NOWEIGHT).
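The weighting described above can be sketched in Python as follows. This is an illustrative reading of the MVLUE formula, not the procedure's implementation; the names `c4`, `mvlue_weight`, and `sigma_mvlue` and the sample data are hypothetical.

```python
import math
import statistics


def c4(n):
    """Unbiasing factor c4(n) = Gamma(n/2) * sqrt(2/(n-1)) / Gamma((n-1)/2)."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def mvlue_weight(n):
    """Weight h = c4(n)^2 / (1 - c4(n)^2); it grows with the sample size n."""
    c_sq = c4(n) ** 2
    return c_sq / (1.0 - c_sq)


def sigma_mvlue(subgroups):
    """Weighted average of s_i / c4(n_i) with weights h_i, using n_i >= 2 only."""
    numerator = 0.0
    denominator = 0.0
    for g in subgroups:
        n = len(g)
        if n < 2:
            continue  # subgroups with fewer than two observations are excluded
        h = mvlue_weight(n)
        numerator += h * statistics.stdev(g) / c4(n)
        denominator += h
    if denominator == 0.0:
        raise ValueError("need at least one subgroup with two or more observations")
    return numerator / denominator


# Hypothetical data with unequal subgroup sizes.
sigma_unequal = sigma_mvlue([[1.0, 3.0], [2.0, 2.0, 5.0]])
# With equal sizes the weights cancel, leaving the unweighted average.
sigma_equal = sigma_mvlue([[1.0, 3.0], [2.0, 6.0]])
```

In the unequal-size example the result is pulled toward the estimate from the three-observation subgroup, because its weight is larger.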

RMSDF Method for Subgroup Samples

If you specify SMETHOD=RMSDF, a weighted root-mean-square estimate is computed:

$\hat{\sigma }=\frac{ \sqrt {(n_{1}-1)s^{2}_{1}+\cdots +(n_{N}-1)s^{2}_{N}} }{c_{4}(n)\sqrt {n_{1}+\cdots +n_{N}-N} }$

where

$n_{i}$	is the sample size of the ith subgroup.
N	is the number of subgroups for which $n_{i}\geq 2$.
$s_{i}$	is the sample standard deviation of the ith subgroup.
$c_{4}(n)$	is the unbiasing factor defined previously, evaluated at n.
n	is equal to $(n_{1}+\cdots +n_{N})-(N-1)$.

The weights in the root-mean-square expression are the degrees of freedom $n_{i}-1$. A subgroup standard deviation $s_{i}$ is included in the calculation only if $n_{i}\geq 2$.

If the unknown standard deviation $\sigma$ is constant across subgroups, the root-mean-square estimate is more efficient than the minimum variance linear unbiased estimate. However, as noted by Burr (1969), "the constancy of $\sigma$ is the very thing under test," and if $\sigma$ varies across subgroups, the root-mean-square estimate tends to be more inflated than the MVLUE.
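The RMSDF formula can be sketched in Python as follows. This is a minimal illustration under the definitions given in this section, not the procedure's implementation; the names `c4` and `sigma_rmsdf` and the sample data are hypothetical.

```python
import math
import statistics


def c4(n):
    """Unbiasing factor c4(n) = Gamma(n/2) * sqrt(2/(n-1)) / Gamma((n-1)/2)."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def sigma_rmsdf(subgroups):
    """Pooled root-mean-square estimate, weighted by degrees of freedom n_i - 1."""
    used = [g for g in subgroups if len(g) >= 2]  # only n_i >= 2 contribute
    if not used:
        raise ValueError("need at least one subgroup with two or more observations")
    total_n = sum(len(g) for g in used)
    big_n = len(used)                  # N, the number of subgroups used
    df = total_n - big_n               # n_1 + ... + n_N - N
    n_arg = total_n - (big_n - 1)      # argument n at which c4 is evaluated
    # Sum of (n_i - 1) * s_i^2, with s_i^2 the sample variance of subgroup i.
    pooled_ss = sum((len(g) - 1) * statistics.variance(g) for g in used)
    return math.sqrt(pooled_ss) / (c4(n_arg) * math.sqrt(df))


# Hypothetical data: pooled sum of squares is 1*2 + 2*3 = 8, with df = 3 and n = 4.
sigma_hat = sigma_rmsdf([[1.0, 3.0], [2.0, 2.0, 5.0]])
```

Note that `n_arg` equals the pooled degrees of freedom plus one, so dividing by `c4(n_arg)` removes the bias of the pooled standard deviation in the same way that dividing by $c_{4}(n_{i})$ does for a single subgroup.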