SAS/QC(R) 9.2 User's Guide


BOXCHART Statement

Input Data Sets

BOX= Data Set

You can read summary statistics, decision limits, and outlier values from a BOX= data set specified in the PROC ANOM statement. This enables you to reuse an OUTBOX= data set created in a previous run of the ANOM procedure to display a box chart.

A BOX= data set must contain the following variables:

  • the group variable
  • _VAR_, containing the analysis variable name
  • _TYPE_, identifying features of box-and-whisker plots
  • _VALUE_, containing values of those features

Each observation in a BOX= data set records the value of a single feature of one group's box-and-whisker plot, such as its mean. The _TYPE_ variable identifies the feature whose value is recorded in a given observation. The following table lists valid _TYPE_ variable values:

Table 5.23: Valid _TYPE_ Values in a BOX= Data Set

  _TYPE_ Value   Description
  N              group size
  ALPHA          significance level
  LIMITN         nominal sample size associated with decision limits
  LDLX           lower decision limit for group mean
  UDLX           upper decision limit for group mean
  RESPMEAN       overall response variable mean
  MIN            group minimum value
  Q1             group first quartile
  MEDIAN         group median
  MEAN           group mean
  Q3             group third quartile
  MAX            group maximum value
  LOW            low outlier value
  HIGH           high outlier value
  LOWHISKR       low whisker value, if different from MIN
  HIWHISKR       high whisker value, if different from MAX
  FARLOW         low far outlier value
  FARHIGH        high far outlier value

The features identified by _TYPE_ values N, LDLX, UDLX, RESPMEAN, MIN, Q1, MEDIAN, MEAN, Q3, and MAX are required for each group.

Other variables that can be read from a BOX= data set include:

  • the variable _ID_, containing labels for outliers
  • the variable _HTML_, containing URLs to be associated with features on box plots
  • block-variables
  • symbol-variable
  • BY variables
  • ID variables

When you specify one of the keywords SCHEMATICID or SCHEMATICIDFAR with the BOXSTYLE= option, values of _ID_ are used as outlier labels. If _ID_ does not exist in the BOX= data set, the values of the first variable listed in the ID statement are used.
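
As a minimal sketch of the reuse described above, the following statements save box chart features with the OUTBOX= option and then read them back with BOX=. The data set names Info and WeightBox and the variables Weight and Batch are hypothetical names chosen for illustration. The NOCHART option is included on the assumption that only the output data set is wanted from the first run.

```sas
/* Step 1: compute the box chart features and save them.      */
/* Info, Weight, Batch, and WeightBox are hypothetical names. */
proc anom data=Info;
   boxchart Weight*Batch / outbox=WeightBox nochart;
run;

/* Step 2: redisplay the box chart from the saved features    */
/* without rereading the raw response values.                 */
proc anom box=WeightBox;
   boxchart Weight*Batch;
run;
```

Printing WeightBox with PROC PRINT after the first step is one way to check that each group has the required _TYPE_ values before it is read back.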

DATA= Data Set

You can read raw data (response values) from a DATA= data set specified in the PROC ANOM statement. Each response specified in the BOXCHART statement must be a SAS variable in the DATA= data set. This variable provides measurements that must be grouped into group samples indexed by the group-variable. The group-variable, which is specified in the BOXCHART statement, must also be a SAS variable in the DATA= data set. Each observation in a DATA= data set must contain a value for each response and a value for the group-variable. If the ith group contains n_{i} items, there should be n_{i} consecutive observations for which the value of the group-variable is the index of the ith group. For example, if each group contains five items and there are 10 groups, the DATA= data set should contain 50 observations.
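
The layout described above can be sketched with a short DATA step. This example is illustrative only: the data set name Info, the variables Weight and Batch, the seed, and the simulated normal values are all assumptions, not data from this guide. It produces 10 groups of five consecutive observations, for 50 observations in total.

```sas
/* Simulated raw data: 10 groups (Batch) of 5 items each.  */
/* All names and distribution parameters are hypothetical. */
data Info;
   call streaminit(12345);
   do Batch = 1 to 10;
      do Item = 1 to 5;
         /* one response value per observation */
         Weight = rand('normal', 100, 2);
         output;
      end;
   end;
   drop Item;
run;

/* Observations for each Batch value are consecutive, as required */
proc anom data=Info;
   boxchart Weight*Batch;
run;
```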

Other variables that can be read from a DATA= data set include

  • _PHASE_ (if the READPHASES= option is specified)
  • block-variables
  • symbol-variable
  • BY variables
  • ID variables

By default, the ANOM procedure reads all of the observations in a DATA= data set. However, if the data set includes the variable _PHASE_, you can read selected groups of observations (referred to as phases) with the READPHASES= option.

For an example of a DATA= data set, see "Creating ANOM Boxcharts from Response Values".

LIMITS= Data Set

You can read preestablished decision limits (or parameters from which the decision limits can be calculated) from a LIMITS= data set specified in the PROC ANOM statement. For example, the following statements read decision limit information from the data set Conlims:

  
    proc anom data=Info limits=Conlims;
       boxchart Weight*Batch;
    run;
 

The LIMITS= data set can be an OUTLIMITS= data set that was created in a previous run of the ANOM procedure. Such data sets always contain the variables required for a LIMITS= data set; see Table 5.20. The LIMITS= data set can also be created directly using a DATA step. When you create a LIMITS= data set, you must provide one of the following:

  • the variables _LDLX_, _MEAN_, and _UDLX_, which specify the decision limits directly
  • the variables _MEAN_, _MSE_, and _DFE_, which are used to calculate the decision limits according to the equations in the section "Decision Limits".

In addition, note the following:

  • The variables _VAR_ and _GROUP_ are required. These must be character variables whose lengths are no greater than 32.
  • _DFE_ is optional. The default is \nu = N-k, where N is the total number of observations and k is the number of groups; in the case of equal group sizes n, this reduces to \nu = k(n-1).
  • _MSE_ is optional if _LDLX_ and _UDLX_ are specified; otherwise it is required.
  • _LDLX_ and _UDLX_ must be specified together; otherwise their values are computed.
  • _ALPHA_ is optional but is recommended in order to maintain a complete set of decision limit information. The default value is 0.05.
  • _LIMITK_ is optional. The default value is k, the number of groups. A group must have at least one nonmissing value (n_{i} \geq 1) and there must be at least one group with n_{i} \geq 2. If specified, _LIMITK_ overrides the value of k.
  • _LIMITN_ is optional. In the balanced case (n_i \equiv n), the default value is the common group size n. If specified, _LIMITN_ overrides the value of n.
  • The variable _TYPE_ is optional, but is recommended to maintain a complete set of decision limit information. The variable _TYPE_ must be a character variable of length 8. Valid values are 'ESTIMATE', 'STANDARD', 'STDMEAN', and 'STDRMS'. The default is 'ESTIMATE'.
  • The variable _INDEX_ is required if you specify the READINDEX= option; this must be a character variable whose length is no greater than 48.
  • BY variables are required if specified with a BY statement.
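
A LIMITS= data set that specifies the decision limits directly might be created with a DATA step along the following lines. This is a sketch under stated assumptions: the numeric values are placeholders rather than limits derived from real data, and the names Conlims, Info, Weight, and Batch are used only for illustration.

```sas
/* Hypothetical LIMITS= data set with directly specified limits. */
/* The numeric values below are placeholders for illustration.   */
data Conlims;
   length _VAR_ _GROUP_ $32 _TYPE_ $8;
   _VAR_   = 'Weight';     /* response name (required)           */
   _GROUP_ = 'Batch';      /* group-variable name (required)     */
   _TYPE_  = 'ESTIMATE';   /* the default, stated explicitly     */
   _ALPHA_ = 0.05;         /* optional but recommended           */
   _LDLX_  = 98.5;         /* lower decision limit               */
   _MEAN_  = 100;          /* central line                       */
   _UDLX_  = 101.5;        /* upper decision limit               */
run;

proc anom data=Info limits=Conlims;
   boxchart Weight*Batch;
run;
```

To have the procedure calculate the limits instead, the _LDLX_ and _UDLX_ assignments would be replaced by _MSE_ (and optionally _DFE_), as described in the list above.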

SUMMARY= Data Set

You can read group summary statistics from a SUMMARY= data set specified in the PROC ANOM statement. This enables you to reuse OUTSUMMARY= data sets that have been created in previous runs of the ANOM procedure or to read output data sets created with SAS summarization procedures, such as PROC MEANS.

A SUMMARY= data set used with the BOXCHART statement must contain the following:

  • the group-variable
  • a group minimum variable for each response
  • a group first-quartile variable for each response
  • a group mean variable for each response
  • a group median variable for each response
  • a group third-quartile variable for each response
  • a group maximum variable for each response
  • a group standard deviation variable for each response
  • a group sample size variable for each response

The names of the group summary statistics variables must be the response name concatenated with the following special suffix characters:

  Group Summary Statistic     Suffix Character
  group minimum               L
  group first-quartile        1
  group median                M
  group mean                  X
  group third-quartile        3
  group maximum               H
  group standard deviation    S
  group sample size           N

For example, consider the following statements:

  
    proc anom summary=Summary;
       boxchart (Weight Yieldstrength)*Batch;
    run;
 

The data set Summary must include the variables Batch, WeightL, Weight1, WeightX, WeightM, Weight3, WeightH, WeightS, WeightN, YieldstrengthL, Yieldstrength1, YieldstrengthX, YieldstrengthM, Yieldstrength3, YieldstrengthH, YieldstrengthS, and YieldstrengthN. Note that if you specify a response name that contains 32 characters, the names of the summary variables must be formed from the first 16 characters and the last 15 characters of the response name, suffixed with the appropriate character.
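
One way such a data set could be produced with PROC MEANS is sketched below for a single response. The raw data set Info is a hypothetical name; the output variable names follow the suffix convention listed above.

```sas
/* Summarize hypothetical raw data Info by Batch, naming each */
/* statistic with the suffix expected by the BOXCHART         */
/* statement: L, 1, M, X, 3, H, S, N.                         */
proc means data=Info noprint nway;
   class Batch;
   var Weight;
   output out=Summary(drop=_TYPE_ _FREQ_)
      min=WeightL  q1=Weight1  median=WeightM  mean=WeightX
      q3=Weight3   max=WeightH std=WeightS     n=WeightN;
run;

/* Read the group summary statistics instead of raw data */
proc anom summary=Summary;
   boxchart Weight*Batch;
run;
```

A second response such as Yieldstrength would need its own set of eight suffixed variables in the OUTPUT statement.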

Other variables that can be read from a SUMMARY= data set include

  • _PHASE_ (if the READPHASES= option is specified)
  • block-variables
  • symbol-variable
  • BY variables
  • ID variables

By default, the ANOM procedure reads all of the observations in a SUMMARY= data set. However, if the data set includes the variable _PHASE_, you can read selected groups of observations (referred to as phases) by specifying the READPHASES= option.

For an example of a SUMMARY= data set, see "Creating ANOM Boxcharts from Group Summary Data".

TABLE= Data Set

You can read summary statistics and decision limits from a TABLE= data set specified in the PROC ANOM statement. This enables you to reuse an OUTTABLE= data set created in a previous run of the ANOM procedure. Because the ANOM procedure simply displays the information in a TABLE= data set, you can use TABLE= data sets to create specialized ANOM charts.

The following table lists the variables required in a TABLE= data set used with the BOXCHART statement:

Table 5.24: Variables Required in a TABLE= Data Set

  Variable         Description
  group-variable   values of the group-variable
  _LDLX_           lower decision limit for mean
  _LIMITN_         nominal sample size associated with the decision limits
  _MEAN_           central line
  _SUBMAX_         group maximum
  _SUBMED_         group median
  _SUBMIN_         group minimum
  _SUBN_           group sample size
  _SUBQ1_          group first quartile
  _SUBQ3_          group third quartile
  _SUBX_           group mean
  _UDLX_           upper decision limit for mean

Other variables that can be read from a TABLE= data set include

  • block-variables
  • symbol-variable
  • BY variables
  • ID variables
  • _PHASE_ (if the READPHASES= option is specified). This variable must be a character variable whose length is no greater than 48.
  • _VAR_. This variable is required if more than one response is specified or if the data set contains information for more than one response. This variable must be a character variable whose length is no greater than 32.
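
The reuse of an OUTTABLE= data set can be sketched as follows, assuming a hypothetical raw data set Info with response Weight and group-variable Batch. The name WeightTab is likewise an assumption made for this example.

```sas
/* Step 1: save summary statistics and decision limits.  */
/* Info, Weight, Batch, and WeightTab are hypothetical.  */
proc anom data=Info;
   boxchart Weight*Batch / outtable=WeightTab nochart;
run;

/* Step 2: display the chart from the saved table. The       */
/* procedure displays the stored values, so any edits made   */
/* to WeightTab in a DATA step would appear in the chart.    */
proc anom table=WeightTab;
   boxchart Weight*Batch;
run;
```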

For an example of a TABLE= data set, see "Saving Decision Limits".