BOXCHART Statement: SHEWHART Procedure

Input Data Sets

DATA= Data Set

You can read raw data (process measurements) from a DATA= data set specified in the PROC SHEWHART statement. Each process specified in the BOXCHART statement must be a SAS variable in the data set. This variable provides measurements that must be grouped into subgroup samples indexed by the subgroup-variable. The subgroup-variable, specified in the BOXCHART statement, must also be a SAS variable in the DATA= data set. Each observation in a DATA= data set must contain a value for each process and a value for the subgroup-variable. If the $i$th subgroup contains $n_i$ measurements, there should be $n_i$ consecutive observations for which the value of the subgroup-variable is the index of the $i$th subgroup. For example, if each subgroup contains 20 items and there are 30 subgroup samples, the DATA= data set should contain 600 observations.

Other variables that can be read from a DATA= data set include

_PHASE_ (if READPHASES= is specified)
block-variables
symbol-variable
BY variables
ID variables
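
The required layout can be sketched with a small DATA step. The data set name Info and the variables Weight and Batch follow the example names used in this section; the measurement values are hypothetical and serve only to show three subgroups of four consecutive observations each (12 observations in all).

data Info;
   input Batch Weight @@;   /* @@ reads several observations from one line */
   datalines;
1 25.1  1 24.8  1 25.3  1 24.9
2 25.0  2 25.4  2 24.7  2 25.2
3 24.6  3 25.1  3 25.0  3 25.3
;

proc shewhart data=Info;
   boxchart Weight*Batch;
run;

Because the observations for each value of Batch are consecutive, each run of four observations is treated as one subgroup.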

By default, the SHEWHART procedure reads all of the observations in a DATA= data set. However, if the data set includes the variable _PHASE_, you can read selected groups of observations (referred to as phases) with the READPHASES= option (for an example, see Displaying Stratification in Phases).
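
As a minimal sketch of phase selection, the following statements assume that the data set Info also contains a character variable _PHASE_ and that its values include 'Startup' and 'Production'. Both phase names are hypothetical.

proc shewhart data=Info;
   /* read only the observations whose _PHASE_ value is listed */
   boxchart Weight*Batch / readphases=('Startup' 'Production');
run;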

For an example of a DATA= data set, see Creating Box Charts from Raw Data.

LIMITS= Data Set

You can read preestablished control limits (or parameters from which the control limits can be calculated) from a LIMITS= data set specified in the PROC SHEWHART statement. For example, the following statements read control limit information from the data set Conlims:¹

proc shewhart data=Info limits=Conlims;
   boxchart Weight*Batch;
run;

The LIMITS= data set can be an OUTLIMITS= data set that was created in a previous run of the SHEWHART procedure. Such data sets always contain the variables required for a LIMITS= data set; see Table 15.7. The LIMITS= data set can also be created directly using a DATA step. When you create a LIMITS= data set, you must provide one of the following:

the variables _LCLX_, _MEAN_, and _UCLX_ or (if you specify CONTROLSTAT=MEDIAN) the variables _LCLM_, _MEAN_, and _UCLM_, which specify the control limits directly
the variables _MEAN_ and _STDDEV_, which are used to calculate the control limits according to the equations in Table 15.5 and Table 15.6

In addition, note the following:

The variables _VAR_ and _SUBGRP_ are required. These must be character variables whose lengths are no greater than 32.
The variable _INDEX_ is required if you specify the READINDEX= option; this must be a character variable whose length is no greater than 48.
The variables _LIMITN_, _SIGMAS_ (or _ALPHA_), and _TYPE_ are optional, but they are recommended to maintain a complete set of control limit information. The variable _TYPE_ must be a character variable of length 8; valid values are 'ESTIMATE', 'STANDARD', 'STDMU', and 'STDSIGMA'.
BY variables are required if specified with a BY statement.
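
One way to build such a data set directly is sketched below. It supplies _MEAN_ and _STDDEV_ so that the procedure calculates the limits. The numeric values are hypothetical, and _LIMITN_=4 assumes subgroups of four measurements.

data Conlims;
   length _var_ _subgrp_ _type_ $8;   /* character variables; _TYPE_ has length 8 */
   _var_    = 'Weight';               /* process name                       */
   _subgrp_ = 'Batch';                /* subgroup-variable name             */
   _type_   = 'STANDARD';             /* limits come from standard values   */
   _limitn_ = 4;                      /* nominal subgroup sample size       */
   _sigmas_ = 3;                      /* optional: multiple of standard error */
   _mean_   = 25;                     /* hypothetical process mean          */
   _stddev_ = 0.25;                   /* hypothetical process standard deviation */
run;

proc shewhart data=Info limits=Conlims;
   boxchart Weight*Batch;
run;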

For an example, see Reading Preestablished Control Limits.

HISTORY= Data Set

You can read subgroup summary statistics from a HISTORY= data set specified in the PROC SHEWHART statement. This enables you to reuse OUTHISTORY= data sets that have been created in previous runs of the SHEWHART, CUSUM, or MACONTROL procedures or to read output data sets created with SAS summarization procedures, such as PROC UNIVARIATE.
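
A typical round trip might look like the following sketch, which assumes a raw data set Info with the variables Weight and Batch. The first step saves the subgroup summary statistics without drawing a chart, and the second step reads them back.

proc shewhart data=Info;
   /* save subgroup summary statistics; NOCHART suppresses the chart */
   boxchart Weight*Batch / outhistory=Summary nochart;
run;

proc shewhart history=Summary;
   boxchart Weight*Batch;
run;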

A HISTORY= data set used with the BOXCHART statement must contain the following:

the subgroup-variable
a subgroup minimum variable for each process
a subgroup first-quartile variable for each process
a subgroup median variable for each process
a subgroup mean variable for each process
a subgroup third-quartile variable for each process
a subgroup maximum variable for each process
a subgroup sample size variable for each process
either a subgroup range variable or a subgroup standard deviation variable for each process

If you specify the RANGES option, the subgroup range variable must be included; otherwise, the subgroup standard deviation variable must be included.

The names of the subgroup summary statistics variables must be the process name concatenated with the following special suffix characters:

Subgroup Summary Statistic	Suffix Character
subgroup minimum	L
subgroup first-quartile	1
subgroup median	M
subgroup mean	X
subgroup third-quartile	3
subgroup maximum	H
subgroup sample size	N
subgroup range	R
subgroup standard deviation	S

For example, consider the following statements:

proc shewhart history=Summary;
   boxchart (Weight Yieldstrength)*Batch;
run;

The data set Summary must include the variables Batch, WeightL, Weight1, WeightM, WeightX, Weight3, WeightH, WeightS, WeightN, YieldstrengthL, Yieldstrength1, YieldstrengthM, YieldstrengthX, Yieldstrength3, YieldstrengthH, YieldstrengthS, and YieldstrengthN.

If the RANGES option were specified in the preceding BOXCHART statement, it would be necessary for Summary to include the variables WeightR and YieldstrengthR rather than WeightS and YieldstrengthS.
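
The naming convention can be illustrated by building a HISTORY= data set with a summarization procedure. The sketch below uses PROC MEANS as one such procedure and assumes a raw data set Info, sorted by Batch, with a single process variable Weight. The quartiles it computes follow the PROC MEANS default percentile definition, which is assumed to be acceptable here.

proc means data=Info noprint;
   by Batch;                              /* one output observation per subgroup */
   var Weight;
   output out=Summary(drop=_type_ _freq_)
          min=WeightL  q1=Weight1  median=WeightM  mean=WeightX
          q3=Weight3   max=WeightH std=WeightS     n=WeightN;
run;

proc shewhart history=Summary;
   boxchart Weight*Batch;
run;

If the RANGES option were used instead, the OUTPUT statement would request range=WeightR in place of std=WeightS.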

Note that if you specify a process name that contains 32 characters, the names of the summary variables must be formed from the first 16 characters and the last 15 characters of the process name, suffixed with the appropriate character.
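
As a worked illustration of this rule, the following DATA step forms the name of the subgroup mean variable for a hypothetical 32-character process name, DiameterOfTheInnerBearingSurface.

data _null_;
   length process summary $32;
   process = 'DiameterOfTheInnerBearingSurface';   /* hypothetical 32-character name */
   /* first 16 characters, then the last 15 (positions 18-32), then the suffix X */
   summary = cats(substr(process, 1, 16), substr(process, 18, 15), 'X');
   put summary=;   /* writes summary=DiameterOfTheInnrBearingSurfaceX to the log */
run;

The character in position 17 (the "e" in "Inner") is dropped so that the suffixed name still fits within 32 characters.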

Other variables that can be read from a HISTORY= data set include

_PHASE_ (if READPHASES= is specified)
block-variables
symbol-variable
BY variables
ID variables

By default, the SHEWHART procedure reads all of the observations in a HISTORY= data set. However, if the data set includes the variable _PHASE_, you can read selected groups of observations (referred to as phases) with the READPHASES= option (see Displaying Stratification in Phases for an example).

For an example of a HISTORY= data set, see Creating Box Charts from Subgroup Summary Data.

TABLE= Data Set

You can read summary statistics and control limits from a TABLE= data set specified in the PROC SHEWHART statement. This enables you to reuse an OUTTABLE= data set created in a previous run of the SHEWHART procedure. Because the SHEWHART procedure simply displays the information in a TABLE= data set, you can use TABLE= data sets to create specialized control charts. Examples are provided in Specialized Control Charts: SHEWHART Procedure.
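
A minimal sketch of this reuse follows. It assumes a raw data set Info with the variables Weight and Batch; the data set name Wttable is hypothetical.

proc shewhart data=Info;
   /* save summary statistics and control limits without drawing a chart */
   boxchart Weight*Batch / outtable=Wttable nochart;
run;

proc shewhart table=Wttable;
   boxchart Weight*Batch;
run;

Between the two steps, a DATA step could modify Wttable (for example, to replace the limit variables) to produce a specialized chart.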

The following table lists the variables required in a TABLE= data set used with the BOXCHART statement:

Table 15.11 Variables Required in a TABLE= Data Set
Variable	Description
_LCLM_	lower control limit for median
_LCLX_	lower control limit for mean
_LIMITN_	nominal sample size associated with the control limits
_MEAN_	process mean
subgroup-variable	values of the subgroup-variable
_SUBMAX_	subgroup maximum
_SUBMIN_	subgroup minimum
_SUBMED_	subgroup median
_SUBN_	subgroup sample size
_SUBQ1_	subgroup first quartile (25th percentile)
_SUBQ3_	subgroup third quartile (75th percentile)
_SUBX_	subgroup mean
_UCLM_	upper control limit for median
_UCLX_	upper control limit for mean

Note that if you specify CONTROLSTAT=MEDIAN, the variables _LCLM_, _SUBMED_, and _UCLM_ are required; otherwise, the variables _LCLX_, _SUBX_, and _UCLX_ are required.

Other variables that can be read from a TABLE= data set include

block-variables
symbol-variable
BY variables
ID variables
_PHASE_ (if the READPHASES= option is specified). This variable must be a character variable whose length is no greater than 48.
_TESTS_ (if the TESTS= option is specified). This variable is used to flag tests for special causes and must be a character variable of length 8.
_VAR_. This variable is required if more than one process is specified or if the data set contains information for more than one process. This variable must be a character variable whose length is no greater than 32.

For an example of a TABLE= data set, see Saving Control Limits.

BOX= Data Set

You can read summary statistics, control limits, and outlier values from a BOX= data set specified in the PROC SHEWHART statement. This enables you to reuse an OUTBOX= data set created in a previous run of the SHEWHART procedure to display a box chart.
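
The following sketch shows one way to create and then reuse such a data set. It assumes a raw data set Info with the variables Weight and Batch; the data set name Wtbox is hypothetical.

proc shewhart data=Info;
   /* save box-and-whisker features, one feature per observation */
   boxchart Weight*Batch / outbox=Wtbox nochart;
run;

proc shewhart box=Wtbox;
   boxchart Weight*Batch;
run;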

A BOX= data set must contain the following variables:

the subgroup-variable
_VAR_, containing the process variable name
_TYPE_, identifying features of box-and-whisker plots
_VALUE_, containing values of those features

Each observation in a BOX= data set records the value of a single feature of one subgroup’s box-and-whisker plot, such as its mean. The _TYPE_ variable identifies the feature whose value is recorded in a given observation. The following table lists valid _TYPE_ variable values:

Table 15.12 Valid _TYPE_ Values in a BOX= Data Set
Value	Description
N	subgroup size
SIGMAS	multiple ($k$) of standard error of $\bar{X}_i$ or $M_i$
ALPHA	probability ($\alpha$) of exceeding limits
LIMITN	nominal sample size associated with control limits
LCLM	lower control limit for subgroup median
LCLX	lower control limit for subgroup mean
UCLM	upper control limit for subgroup median
UCLX	upper control limit for subgroup mean
PROCMED	process median
PROCMEAN	process mean
EXLIM	control limit exceeded on box chart
TREND	trend variable value
MIN	minimum subgroup value
Q1	subgroup first quartile
MEDIAN	subgroup median
MEAN	subgroup mean
Q3	subgroup third quartile
MAX	subgroup maximum value
LOW	low outlier value
HIGH	high outlier value
LOWHISKR	low whisker value, if different from MIN
HIWHISKR	high whisker value, if different from MAX
FARLOW	low far outlier value
FARHIGH	high far outlier value

The features identified by the _TYPE_ values N, LCLM or LCLX, UCLM or UCLX, PROCMED or PROCMEAN, MIN, Q1, MEDIAN, MEAN, Q3, and MAX are required for each subgroup.

Other variables that can be read from a BOX= data set include

the variable _ID_, containing labels for outliers
the variable _HTML_, containing links to be associated with features on box plots
block-variables
symbol-variable
BY variables
ID variables

When you specify one of the keywords SCHEMATICID or SCHEMATICIDFAR with the BOXSTYLE= option, values of _ID_ are used as outlier labels. If _ID_ does not exist in the BOX= data set, the values of the first variable listed in the ID statement are used.
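
For instance, the following sketch requests labeled outliers from a BOX= data set. The data set name Wtbox is hypothetical and is assumed to contain the required variables described earlier, optionally along with _ID_.

proc shewhart box=Wtbox;
   /* label outliers with _ID_ values if that variable is present */
   boxchart Weight*Batch / boxstyle=schematicid;
run;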

Footnotes

¹ In SAS 6.09 and in earlier releases, it is necessary to specify the READLIMITS option.