Example 9.3 Working with Unequal Subgroup Sample Sizes

[See MACEW4 in the SAS/QC Sample Library] This example contains measurements from the metal clip manufacturing process (introduced in Creating EWMA Charts from Raw Data). The following statements create a SAS data set named Clips4, which contains additional clip gap measurements taken on a daily basis:


data Clips4;
   input Day @;
   length Dayc $2.;
   informat Day ddmmyy8.;
   format   Day date5.;
   Dayc=put(Day,date5.);
   Dayc=substr(Dayc,1,2);
   do i=1 to 5;
      input Gap @;
      output;
      end;
   drop i;
   label Dayc='April';
datalines;
 1/4/86  14.93  14.65  14.87  15.11  15.18
 2/4/86  15.06  14.95  14.91  15.14  15.41
 3/4/86  14.90  14.90  14.96  15.26  15.18
 4/4/86  15.25  14.57  15.33  15.38  14.89
 7/4/86  14.68  14.63  14.72  15.32  14.86
 8/4/86  14.48  14.88  14.98  14.74  15.48
 9/4/86  14.99  15.16  15.02  15.53  14.66
10/4/86  14.88  15.44  15.04  15.10  14.89
11/4/86  15.14  15.33  14.75  15.23  14.64
14/4/86  15.46  15.30  14.92  14.58  14.68
15/4/86  15.23  14.63    .      .      .
16/4/86  15.13  15.25    .      .      .
17/4/86  15.06  15.25  15.28  15.30  15.34
18/4/86  15.22  14.77  15.12  14.82  15.29
21/4/86  14.95  14.96  14.65  14.87  14.77
22/4/86  15.01  15.11  15.11  14.79  14.88
23/4/86  14.97  15.50  14.93  15.13  15.25
24/4/86  15.23  15.21  15.31  15.07  14.97
25/4/86  15.08  14.75  14.93  15.34  14.98
28/4/86  15.07  14.86  15.42  15.47  15.24
29/4/86  15.27  15.20  14.85  15.62  14.67
30/4/86  14.97  14.73  15.09  14.98  14.46
;

Note that only two gap measurements were recorded on April 15 and April 16.
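One way to confirm which subgroups are short is to count the nonmissing gap measurements for each day. The following is a minimal sketch, not part of the sample library program; the column name SampleSize is chosen here for illustration only.

```sas
/* List the days that have fewer than five nonmissing Gap values. */
/* COUNT(Gap) ignores missing values, so the padded rows for      */
/* April 15 and April 16 are not counted.                         */
proc sql;
   select Day format=date5.,
          count(Gap) as SampleSize   /* illustrative column name */
      from Clips4
      group by Day
      having count(Gap) < 5;
quit;
```

With the data shown above, this query should list only April 15 and April 16, each with a sample size of two.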

A partial listing of Clips4 is shown in Output 9.3.1. This data set contains three variables: Day is a numeric variable that contains the date (month, day, and year) that the measurement is taken, Dayc is a character variable that contains the day the measurement is taken, and Gap is a numeric variable that contains the measurement.

Output 9.3.1 The Data Set Clips4
The Data Set Clips4

Day    Dayc  Gap
01APR  01    14.93
01APR  01    14.65
01APR  01    14.87
01APR  01    15.11
01APR  01    15.18
02APR  02    15.06
02APR  02    14.95
02APR  02    14.91
02APR  02    15.14
02APR  02    15.41
03APR  03    14.90
03APR  03    14.90
03APR  03    14.96
03APR  03    15.26
03APR  03    15.18

The following statements request an EWMA chart, shown in Output 9.3.2, for these gap measurements:

ods graphics off;
symbol v=dot h=1.5;
title 'EWMA Chart for Gap Measurements';
proc macontrol data=Clips4;
   ewmachart Gap*Dayc / weight = 0.3;
run;

The character variable Dayc (rather than the numeric variable Day) is specified as the subgroup-variable in the preceding EWMACHART statement. If Day were the subgroup-variable, each day during April would appear on the horizontal axis, including the weekend days of April 5 and April 6, for which no measurements were taken. To avoid this problem, the subgroup-variable Dayc is created from Day by using the PUT and SUBSTR functions. Because Dayc is a character subgroup-variable, a discrete axis is used for the horizontal axis, and as a result, April 5 and April 6 do not appear on the horizontal axis in Output 9.3.2. A LABEL statement is used to specify the label April for the horizontal axis, indicating the month in which these measurements were taken.
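The PUT and SUBSTR conversion can be tried in isolation. The following sketch applies the same two steps to a single date and writes the intermediate and final values to the log; the variable name DayText is introduced here for illustration and does not appear in Clips4.

```sas
/* Convert one SAS date value to a two-character day-of-month string. */
data _null_;
   Day     = '07APR1986'd;            /* the Monday after the weekend gap    */
   DayText = put(Day, date5.);        /* DATE5. gives day and month: 07APR   */
   Dayc    = substr(DayText, 1, 2);   /* keep the first two characters: 07   */
   put DayText= Dayc=;                /* write both values to the SAS log    */
run;
```

In the Clips4 DATA step the intermediate variable is not needed, because Dayc is declared with a length of 2 before the assignment.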

Output 9.3.2 EWMA Chart with Varying Sample Sizes

Note that the control limits vary with the subgroup sample size. The sample size legend in the lower left corner displays the minimum and maximum subgroup sample sizes.
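To inspect the varying limits numerically rather than reading them off the chart, you can save the plotted values with the OUTTABLE= option. The following is a sketch; the data set name Cliptab and the title text are chosen here for illustration. The printed table is expected to include one row per subgroup with its sample size, EWMA, and control limits, so the wider limits for the two-measurement subgroups should be visible there.

```sas
/* Save the subgroup summary statistics and control limits */
/* instead of drawing the chart.                           */
proc macontrol data=Clips4;
   ewmachart Gap*Dayc / weight   = 0.3
                        outtable = Cliptab   /* illustrative data set name */
                        nochart;
run;

title 'Subgroup Summary and Control Limits';
proc print data=Cliptab noobs;
run;
```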

The EWMACHART statement provides various options for working with unequal subgroup sample sizes. For example, you can use the LIMITN= option to specify a fixed (nominal) sample size for computing control limits, as illustrated by the following statements:

title 'EWMA Chart for Gap Measurements';
proc macontrol data=Clips4;
   ewmachart Gap*Dayc / weight = 0.3
                        limitn = 5;
run;

The resulting chart is shown in Output 9.3.3.
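As a rough cross-check of the fixed-sample-size limits, the following sketch evaluates the usual constant-sample-size EWMA limits, mean ± k·sigma·sqrt( r(1 − (1−r)^(2i)) / (n(2−r)) ), for subgroups i = 1 to 22. It assumes that this is the form the procedure applies when LIMITN=5, and it plugs in the process mean and default standard deviation estimate reported in Output 9.3.5. The data set name FixedLimits and all variable names are illustrative.

```sas
/* Hand computation of EWMA limits for a nominal sample size of 5. */
data FixedLimits;
   mean  = 15.0354;   /* _MEAN_ from Output 9.3.5                    */
   sigma = 0.26503;   /* default _STDDEV_ from Output 9.3.5          */
   r     = 0.3;       /* EWMA weight (WEIGHT= option)                */
   k     = 3;         /* multiple of sigma (_SIGMAS_)                */
   n     = 5;         /* nominal sample size (LIMITN= option)        */
   do i = 1 to 22;    /* 22 subgroups in Clips4                      */
      halfwidth = k * sigma * sqrt( r * (1 - (1 - r)**(2*i)) / (n * (2 - r)) );
      LCL = mean - halfwidth;
      UCL = mean + halfwidth;
      output;
   end;
   keep i LCL UCL;
run;

proc print data=FixedLimits noobs;
run;
```

Under this formula the limits widen over the first several subgroups and then level off, because the term (1−r)^(2i) shrinks toward zero as i grows.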

Output 9.3.3 Control Limits Based on Fixed Sample Size

Note that the only points displayed are those corresponding to subgroups whose sample size matches the nominal sample size of five. Therefore, points are not displayed for April 15 and April 16. To plot points for all subgroups (regardless of subgroup sample size), you can specify the ALLN option, as follows:

title 'EWMA Chart for Gap Measurements';
proc macontrol data=Clips4;
   ewmachart Gap*Dayc / weight = 0.3
                        limitn = 5
                        alln
                        nmarkers;
run;

The chart is shown in Output 9.3.4. The NMARKERS option requests special symbols to identify points for which the subgroup sample size differs from the nominal sample size.

Output 9.3.4 Displaying All Subgroups Regardless of Sample Size

You can use the SMETHOD= option to specify how the process standard deviation σ is estimated when the subgroup sample sizes vary. The default method computes the estimate of σ as an unweighted average of subgroup estimates of σ. Specifying SMETHOD=MVLUE requests a minimum variance linear unbiased estimate (MVLUE), which assigns greater weight to estimates of σ from subgroups with larger sample sizes. Specifying SMETHOD=RMSDF requests a weighted root-mean-square estimate. If the unknown standard deviation is constant across subgroups, the root-mean-square estimate is more efficient than the MVLUE. For more information, see Methods for Estimating the Standard Deviation.

The following statements apply all three methods:

proc macontrol data=Clips4;
   ewmachart Gap*Dayc / outlimits = Cliplim1
                       outindex  = 'Default'
                       weight    = 0.3
                       nochart;
   ewmachart Gap*Dayc / smethod   = mvlue
                       outlimits = Cliplim2
                       outindex  = 'MVLUE'
                       weight    = 0.3
                       nochart;
   ewmachart Gap*Dayc / smethod   = rmsdf
                       outlimits = Cliplim3
                       outindex  = 'RMSDF'
                       weight    = 0.3
                       nochart;

run;

data Climits;
   set Cliplim1 Cliplim2 Cliplim3;
run;

The data set Climits is listed in Output 9.3.5.

Output 9.3.5 Listing of the Data Set Climits
Estimating the Process Standard Deviation

_VAR_  _SUBGRP_  _INDEX_  _TYPE_    _LIMITN_  _ALPHA_     _SIGMAS_  _MEAN_   _STDDEV_  _WEIGHT_
Gap    Dayc      Default  ESTIMATE  V         .002699796  3         15.0354  0.26503   0.3
Gap    Dayc      MVLUE    ESTIMATE  V         .002699796  3         15.0354  0.26096   0.3
Gap    Dayc      RMSDF    ESTIMATE  V         .002699796  3         15.0354  0.25959   0.3

Note that the estimate of the process standard deviation (stored in the variable _STDDEV_) is slightly different depending on the estimation method. The variable _LIMITN_ is assigned the special missing value V in the OUTLIMITS= data set, indicating that the subgroup sample sizes vary.
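The three estimates can also be approximated by hand from the subgroup sample sizes and standard deviations. The following sketch assumes the commonly stated forms of the estimators: the default is the plain average of s_i/c4(n_i); the MVLUE weights each s_i/c4(n_i) by h_i = c4(n_i)²/(1 − c4(n_i)²); and the RMSDF estimate is the pooled root-mean-square standard deviation divided by c4 evaluated at the pooled degrees of freedom plus one. These forms are an assumption here and should be checked against Methods for Estimating the Standard Deviation. The data set names SubStats and SigmaByHand and all variable names are illustrative.

```sas
/* Subgroup sample sizes and standard deviations (missing gaps ignored). */
proc means data=Clips4 noprint nway;
   class Dayc;
   var Gap;
   output out=SubStats n=n std=s;
run;

data SigmaByHand;
   set SubStats end=last;
   /* c4(n): unbiasing constant for the sample standard deviation. */
   /* LGAMMA is used to avoid overflow for large arguments.        */
   c4 = sqrt(2/(n-1)) * exp(lgamma(n/2) - lgamma((n-1)/2));
   h  = c4**2 / (1 - c4**2);       /* assumed MVLUE weight          */
   sumDefault + s/c4;              /* running sum of s_i/c4(n_i)    */
   sumH       + h;
   sumHS      + h*s/c4;
   sumSS      + (n-1)*s**2;        /* pooled sum of squares         */
   sumDF      + (n-1);             /* pooled degrees of freedom     */
   nSub       + 1;                 /* number of subgroups           */
   if last then do;
      Default = sumDefault / nSub;
      MVLUE   = sumHS / sumH;
      nPooled = sumDF + 1;
      c4Pool  = sqrt(2/(nPooled-1))
                * exp(lgamma(nPooled/2) - lgamma((nPooled-1)/2));
      RMSDF   = sqrt(sumSS/sumDF) / c4Pool;
      put Default= MVLUE= RMSDF=;  /* write the estimates to the log */
   end;
run;
```

If the assumed formulas match those used by the procedure, the values written to the log should be close to the _STDDEV_ values in Output 9.3.5.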