The UNIVARIATE Procedure

INSET Statement

  • INSET keywords </ options>;

An INSET statement places a box or table of summary statistics, called an inset, directly in a graph created with a CDFPLOT, HISTOGRAM, PPPLOT, PROBPLOT, or QQPLOT statement. The INSET statement must follow the plot statement that creates the plot that you want to augment. The inset appears in all the graphs that the preceding plot statement produces.

You can use multiple INSET statements after a plot statement to add more than one inset to a plot. See Example 4.17.
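As a rough illustration of multiple insets, the following sketch places two insets on one histogram. It assumes the score data set and the variable final that are used in the other examples in this section; the header text and the compass positions are illustrative choices, not requirements.

```sas
/* Sketch only: assumes a data set named score with a numeric variable final */
proc univariate data=score;
   histogram final / normal;
   /* First inset: sample statistics in the upper left corner */
   inset n mean std / position=nw header='Sample';
   /* Second inset: parameters of the fitted normal curve in the upper right corner */
   inset normal(mu sigma) / position=ne;
run;
```

Both INSET statements follow the HISTOGRAM statement, so both insets are attached to the histogram that it produces.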

In an INSET statement, you specify one or more keywords that identify the information to display in the inset. The information is displayed in the order that you request the keywords. Keywords can be any of the following:

  • statistical keywords

  • primary keywords

  • secondary keywords

Statistical Keywords

The available statistical keywords are listed in Table 4.10.

Table 4.10: Statistical Keywords

Keyword                                 Description

Descriptive Statistic Keywords

CSS                                     Corrected sum of squares
CV                                      Coefficient of variation
GEOMEAN                                 Geometric mean
KURTOSIS | KURT                         Kurtosis
MAX                                     Largest value
MEAN                                    Sample mean
MIN                                     Smallest value
MODE                                    Most frequent value
N                                       Sample size
NEXCL                                   Number of observations excluded by MAXNBIN= or MAXSIGMAS= option
NMISS                                   Number of missing values
NOBS                                    Number of observations
RANGE                                   Range
SKEWNESS | SKEW                         Skewness
STD | STDDEV                            Standard deviation
STDMEAN | STDERR                        Standard error of the mean
SUM                                     Sum of the observations
SUMWGT                                  Sum of the weights
USS                                     Uncorrected sum of squares
VAR                                     Variance

Percentile Statistic Keywords

P1                                      1st percentile
P5                                      5th percentile
P10                                     10th percentile
Q1 | P25                                Lower quartile (25th percentile)
MEDIAN | Q2 | P50                       Median (50th percentile)
Q3 | P75                                Upper quartile (75th percentile)
P90                                     90th percentile
P95                                     95th percentile
P99                                     99th percentile
QRANGE                                  Interquartile range (Q3–Q1)

Keywords for Distribution-Free Confidence Limits for Percentiles (CIPCTLDF Option)

P1_LCL_DF                               1st percentile lower confidence limit
P1_UCL_DF                               1st percentile upper confidence limit
P5_LCL_DF                               5th percentile lower confidence limit
P5_UCL_DF                               5th percentile upper confidence limit
P10_LCL_DF                              10th percentile lower confidence limit
P10_UCL_DF                              10th percentile upper confidence limit
Q1_LCL_DF | P25_LCL_DF                  Lower quartile (25th percentile) lower confidence limit
Q1_UCL_DF | P25_UCL_DF                  Lower quartile (25th percentile) upper confidence limit
MEDIAN_LCL_DF | Q2_LCL_DF | P50_LCL_DF  Median (50th percentile) lower confidence limit
MEDIAN_UCL_DF | Q2_UCL_DF | P50_UCL_DF  Median (50th percentile) upper confidence limit
Q3_LCL_DF | P75_LCL_DF                  Upper quartile (75th percentile) lower confidence limit
Q3_UCL_DF | P75_UCL_DF                  Upper quartile (75th percentile) upper confidence limit
P90_LCL_DF                              90th percentile lower confidence limit
P90_UCL_DF                              90th percentile upper confidence limit
P95_LCL_DF                              95th percentile lower confidence limit
P95_UCL_DF                              95th percentile upper confidence limit
P99_LCL_DF                              99th percentile lower confidence limit
P99_UCL_DF                              99th percentile upper confidence limit

Keywords for Percentile Confidence Limits Assuming Normality (CIPCTLNORMAL Option)

P1_LCL                                  1st percentile lower confidence limit
P1_UCL                                  1st percentile upper confidence limit
P5_LCL                                  5th percentile lower confidence limit
P5_UCL                                  5th percentile upper confidence limit
P10_LCL                                 10th percentile lower confidence limit
P10_UCL                                 10th percentile upper confidence limit
Q1_LCL | P25_LCL                        Lower quartile (25th percentile) lower confidence limit
Q1_UCL | P25_UCL                        Lower quartile (25th percentile) upper confidence limit
MEDIAN_LCL | Q2_LCL | P50_LCL           Median (50th percentile) lower confidence limit
MEDIAN_UCL | Q2_UCL | P50_UCL           Median (50th percentile) upper confidence limit
Q3_LCL | P75_LCL                        Upper quartile (75th percentile) lower confidence limit
Q3_UCL | P75_UCL                        Upper quartile (75th percentile) upper confidence limit
P90_LCL                                 90th percentile lower confidence limit
P90_UCL                                 90th percentile upper confidence limit
P95_LCL                                 95th percentile lower confidence limit
P95_UCL                                 95th percentile upper confidence limit
P99_LCL                                 99th percentile lower confidence limit
P99_UCL                                 99th percentile upper confidence limit

Robust Statistics Keywords

GINI                                    Gini’s mean difference
MAD                                     Median absolute deviation about the median
QN                                      Qn, alternative to MAD
SN                                      Sn, alternative to MAD
STD_GINI                                Gini’s standard deviation
STD_MAD                                 MAD standard deviation
STD_QN                                  Qn standard deviation
STD_QRANGE                              Interquartile range standard deviation
STD_SN                                  Sn standard deviation

Hypothesis Testing Keywords

MSIGN                                   Sign statistic
NORMALTEST                              Test statistic for normality
PNORMAL                                 Probability value for the test of normality
SIGNRANK                                Signed rank statistic
PROBM                                   Probability of greater absolute value for the sign statistic
PROBN                                   Probability value for the test of normality
PROBS                                   Probability value for the signed rank test
PROBT                                   Probability value for the Student’s t test
T                                       Statistic for Student’s t test

Keyword for Reading an Input Data Set

DATA=                                   (label, value) pairs from input data set
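
The confidence-limit keywords in Table 4.10 are grouped under the CIPCTLDF and CIPCTLNORMAL options. The following sketch assumes that these keywords become available when the corresponding option is specified in the PROC UNIVARIATE statement; it reuses the score data set and the variable final from the other examples in this section.

```sas
/* Sketch only: CIPCTLDF requests distribution-free confidence limits for percentiles */
proc univariate data=score cipctldf;
   histogram final;
   /* Median with its distribution-free lower and upper confidence limits */
   inset median median_lcl_df median_ucl_df / position=ne;
run;
```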


To create a completely customized inset, use a DATA= data set.

DATA=SAS-data-set

requests that PROC UNIVARIATE display customized statistics from a SAS data set in the inset table. The data set must contain two variables:

_LABEL_

is a character variable whose values provide labels for inset entries.

_VALUE_

is a variable that is either character or numeric and whose values provide values for inset entries.

The label and value from each observation in the data set occupy one line in the inset. The position of the DATA= keyword in the keyword list determines the position of its lines in the inset.
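
A DATA= inset might be built as in the following sketch. The data set name location and its labels and values are hypothetical and serve only to illustrate the required _LABEL_ and _VALUE_ variables; the score data set and the variable final are taken from the other examples in this section.

```sas
/* Hypothetical data set: one (label, value) pair per observation */
data location;
   length _LABEL_ $ 16 _VALUE_ $ 16;
   input _LABEL_ $ _VALUE_ $;
   datalines;
Plant Raleigh
Line 3
Shift Second
;

proc univariate data=score;
   histogram final;
   /* Lines from location appear first because DATA= is listed first */
   inset data=location n mean / position=ne;
run;
```

Here _VALUE_ is a character variable; according to the description above, a numeric _VALUE_ variable would also be accepted.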

Primary and Secondary Keywords

A primary keyword specifies a fitted distribution, which is one of the parametric distributions or a kernel density estimate. You specify secondary keywords in parentheses after the primary keyword to request particular statistics associated with that distribution.

Note: When producing traditional graphics output, you can specify a primary keyword without secondary keywords to display a colored line and the distribution name as a key for the density curve.

In the HISTOGRAM statement, you can request more than one fitted distribution from the same family (for example, two normal distributions). To display inset statistics for individual curves, specify the curve indices in square brackets immediately after the primary keyword.

The following statements produce a histogram with three fitted normal curves and an inset that contains goodness-of-fit statistics for the second curve only:

proc univariate data=score;
   histogram final / normal(sigma=1 2 3);
   inset normal[2](ad adpval);
run;

Table 4.11 lists the primary keywords and the plot statements with which they can be specified.

Table 4.11: Primary Keywords

Keyword       Distribution              Plot Statement Availability

BETA          Beta                      All plot statements
EXPONENTIAL   Exponential               All plot statements
GAMMA         Gamma                     All plot statements
GUMBEL        Gumbel                    All plot statements
IGAUSS        Inverse Gaussian          CDFPLOT, HISTOGRAM, PPPLOT
KERNEL        Kernel density estimate   HISTOGRAM
LOGNORMAL     Lognormal                 All plot statements
NORMAL        Normal                    All plot statements
PARETO        Pareto                    All plot statements
POWER         Power function            All plot statements
RAYLEIGH      Rayleigh                  All plot statements
SB            Johnson SB                HISTOGRAM
SU            Johnson SU                HISTOGRAM
WEIBULL       Weibull (3-parameter)     All plot statements
WEIBULL2      Weibull (2-parameter)     PROBPLOT, QQPLOT


Table 4.12 lists the secondary keywords available with the primary keywords listed in Table 4.11.

Table 4.12: Secondary Keywords

Secondary Keyword   Alias       Description

BETA Secondary Keywords

ALPHA               SHAPE1      First shape parameter
BETA                SHAPE2      Second shape parameter
MEAN                            Mean of the fitted distribution
SIGMA               SCALE       Scale parameter
STD                             Standard deviation of the fitted distribution
THETA               THRESHOLD   Lower threshold parameter

EXPONENTIAL Secondary Keywords

MEAN                            Mean of the fitted distribution
SIGMA               SCALE       Scale parameter
STD                             Standard deviation of the fitted distribution
THETA               THRESHOLD   Threshold parameter

GAMMA Secondary Keywords

ALPHA               SHAPE       Shape parameter
MEAN                            Mean of the fitted distribution
SIGMA               SCALE       Scale parameter
STD                             Standard deviation of the fitted distribution
THETA               THRESHOLD   Threshold parameter

GUMBEL Secondary Keywords

MEAN                            Mean of the fitted distribution
MU                              Location parameter
SIGMA               SCALE       Scale parameter
STD                             Standard deviation of the fitted distribution

IGAUSS Secondary Keywords

LAMBDA                          Shape parameter
MEAN                            Mean of the fitted distribution
MU                              Mean parameter
STD                             Standard deviation of the fitted distribution

KERNEL Secondary Keywords

AMISE                           Approximate mean integrated square error (MISE) for the kernel density
BANDWIDTH           BWIDTH      Bandwidth for the density estimate
C                               Standardized bandwidth for the density estimate
TYPE                            Kernel type: normal, quadratic, or triangular

LOGNORMAL Secondary Keywords

MEAN                            Mean of the fitted distribution
SIGMA               SHAPE       Shape parameter
STD                             Standard deviation of the fitted distribution
THETA               THRESHOLD   Threshold parameter
ZETA                SCALE       Scale parameter

NORMAL Secondary Keywords

MU                  MEAN        Mean parameter
SIGMA               STD         Scale parameter

PARETO Secondary Keywords

ALPHA                           Shape parameter
MEAN                            Mean of the fitted distribution
SIGMA               SCALE       Scale parameter
STD                             Standard deviation of the fitted distribution
THETA               THRESHOLD   Threshold parameter

POWER Secondary Keywords

ALPHA                           Shape parameter
MEAN                            Mean of the fitted distribution
SIGMA               SCALE       Scale parameter
STD                             Standard deviation of the fitted distribution
THETA               THRESHOLD   Threshold parameter

RAYLEIGH Secondary Keywords

MEAN                            Mean of the fitted distribution
SIGMA               SCALE       Scale parameter
STD                             Standard deviation of the fitted distribution
THETA               THRESHOLD   Threshold parameter

SB and SU Secondary Keywords

DELTA               SHAPE1      First shape parameter
GAMMA               SHAPE2      Second shape parameter
MEAN                            Mean of the fitted distribution
SIGMA               SCALE       Scale parameter
STD                             Standard deviation of the fitted distribution
THETA               THRESHOLD   Lower threshold parameter

WEIBULL Secondary Keywords

C                   SHAPE       Shape parameter c
MEAN                            Mean of the fitted distribution
SIGMA               SCALE       Scale parameter
STD                             Standard deviation of the fitted distribution
THETA               THRESHOLD   Threshold parameter

WEIBULL2 Secondary Keywords

C                   SHAPE       Shape parameter c
MEAN                            Mean of the fitted distribution
SIGMA               SCALE       Scale parameter
STD                             Standard deviation of the fitted distribution
THETA               THRESHOLD   Known lower threshold

Keywords Available for All Parametric (non-KERNEL) Distributions

AD                              Anderson-Darling EDF test statistic
ADPVAL                          Anderson-Darling EDF test p-value
CVM                             Cramér–von Mises EDF test statistic
CVMPVAL                         Cramér–von Mises EDF test p-value
KSD                             Kolmogorov-Smirnov EDF test statistic
KSDPVAL                         Kolmogorov-Smirnov EDF test p-value


The inset statistics listed in Table 4.12 are available only if you specify a plot statement and options that compute them. For example, consider the following statements:

proc univariate data=score;
   histogram final / normal;
   inset mean std normal(ad adpval);
run;

The MEAN and STD keywords display the sample mean and standard deviation, respectively, of final. The NORMAL keyword with the secondary keywords AD and ADPVAL displays the Anderson-Darling goodness-of-fit test statistic and p-value, respectively. The statistics that are specified with the NORMAL keyword are available only because the NORMAL option is requested in the HISTOGRAM statement.

The KERNEL keyword is available only if you request a kernel density estimate in a HISTOGRAM statement. The WEIBULL2 keyword is available only if you request a two-parameter Weibull distribution in the PROBPLOT or QQPLOT statement.
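
The same dependency applies to kernel density estimates. The following sketch, which again assumes the score data set and the variable final, requests a kernel density estimate with default settings in the HISTOGRAM statement so that the KERNEL secondary keywords from Table 4.12 have values to display.

```sas
/* Sketch only: the KERNEL option in the HISTOGRAM statement makes the KERNEL inset keyword available */
proc univariate data=score;
   histogram final / kernel;
   /* Kernel type, bandwidth, and approximate MISE of the density estimate */
   inset kernel(type bandwidth amise) / position=ne;
run;
```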

INSET Statistic Labels and Formats

By default, PROC UNIVARIATE identifies inset statistics with appropriate labels and prints numeric values with appropriate formats. To customize the label, specify the keyword followed by an equal sign (=) and the desired label in quotes. To customize the format, specify a numeric format in parentheses after the keyword. Labels can have up to 24 characters. If you specify both a label and a format for a statistic, the label must appear before the format. For example, the following statement requests customized labels for two statistics and displays the standard deviation with a field width of 5 and two decimal places:

inset n='Sample Size' std='Std Dev' (5.2);
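
In context, such a statement follows a plot statement. The following sketch assumes the score data set and the variable final from the earlier examples; the label 'Average' and the 6.2 format for the mean are illustrative additions.

```sas
/* Sketch only: custom labels come first, then the optional format in parentheses */
proc univariate data=score;
   histogram final;
   inset n='Sample Size' mean='Average' (6.2) std='Std Dev' (5.2);
run;
```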

Summary of Options

Table 4.13 lists INSET statement options, which are specified after the slash (/) in the INSET statement. For complete descriptions, see the section Dictionary of Options.

Table 4.13: INSET Options

Option                      Description

CFILL=color | BLANK         specifies color of inset background
CFILLH=color                specifies color of header background
CFRAME=color                specifies color of frame
CHEADER=color               specifies color of header text
CSHADOW=color               specifies color of drop shadow
CTEXT=color                 specifies color of inset text
DATA                        specifies data units for POSITION= coordinates
FONT=font                   specifies font of text
FORMAT=format               specifies format of values in inset
GUTTER=value                specifies gutter width for inset in top or bottom margin
HEADER='string'             specifies header text
HEIGHT=value                specifies height of inset text
NCOLS=n                     specifies number of columns for inset in top or bottom margin
NOFRAME                     suppresses frame around inset
POSITION=position           specifies position of inset
REFPOINT=BR | BL | TR | TL  specifies reference point of inset positioned with POSITION= coordinates


Dictionary of Options

The following entries provide detailed descriptions of options for the INSET statement. Options marked with † are applicable only when traditional graphics are produced.

† CFILL=color | BLANK

specifies the color of the background for traditional graphics. If you omit the CFILLH= option, the header background is included. By default, the background is empty, which causes items that overlap the inset (such as curves or histogram bars) to show through the inset.

If you specify a value for the CFILL= option, then overlapping items no longer show through the inset. Use CFILL=BLANK to leave the background uncolored and to prevent items from showing through the inset.

† CFILLH=color

specifies the color of the header background for traditional graphics. The default value is the CFILL= color.

† CFRAME=color

specifies the color of the frame for traditional graphics. The default value is the same color as the axis of the plot.

† CHEADER=color

specifies the color of the header text for traditional graphics. The default value is the CTEXT= color.

† CSHADOW=color

specifies the color of the drop shadow for traditional graphics. By default, if a CSHADOW= option is not specified, a drop shadow is not displayed.

† CTEXT=color

specifies the color of the text for traditional graphics. The default value is the same color as the other text on the plot.

DATA

specifies that data coordinates are to be used in positioning the inset with the POSITION= option. The DATA option is available only when you specify POSITION=(x,y). You must place DATA immediately after the coordinates (x,y). Note: Positioning insets with coordinates is not supported for ODS Graphics output.

† FONT=font

specifies the font of the text for traditional graphics. By default, if you locate the inset in the interior of the plot, then the font is SIMPLEX. If you locate the inset in the exterior of the plot, then the font is the same as the other text on the plot.

FORMAT=format

specifies a format for all the values in the inset. If you specify a format for a particular statistic, then that format overrides the one specified with the FORMAT= option. For more information about SAS formats, see SAS Formats and Informats: Reference.
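
The override rule can be sketched as follows, assuming the score data set and the variable final from the earlier examples. The formats themselves are illustrative.

```sas
/* Sketch only: FORMAT= sets the default; a format given with a keyword takes precedence */
proc univariate data=score;
   histogram final;
   /* N uses 4.0; MEAN, STD, MIN, and MAX use the 6.2 format from FORMAT= */
   inset n (4.0) mean std min max / format=6.2;
run;
```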

GUTTER=value

specifies the gutter width in percent screen units for an inset located in the top or bottom margin. The gutter is the space between columns of (label, value) pairs in an inset. The default value is four. Note: The GUTTER= option applies only when ODS Graphics is enabled.

HEADER='string'

specifies the header text. The string cannot exceed 40 characters. By default, no header line appears in the inset. If all the keywords that you list in the INSET statement are secondary keywords that correspond to a fitted curve on a histogram, PROC UNIVARIATE displays a default header that indicates the distribution and identifies the curve.

† HEIGHT=value

specifies the height of the text for traditional graphics.

NCOLS=n

specifies the number of columns of (label, value) pairs displayed in an inset located in the top or bottom margin. The default value is three. Note: The NCOLS= option applies only when ODS Graphics is enabled.
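
A margin inset with a custom column count might look like the following sketch. It assumes the score data set and the variable final from the earlier examples, and it uses TM as the top-margin keyword; the margin keywords are described in the section Positioning Insets, which is not reproduced here.

```sas
/* Sketch only: five (label, value) pairs laid out in a single row in the top margin */
ods graphics on;

proc univariate data=score;
   histogram final;
   inset n mean std min max / position=tm ncols=5;
run;
```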

NOFRAME

suppresses the frame drawn around the text.

POSITION=position
POS=position

determines the position of the inset. The position is a compass point keyword, a margin keyword, or a pair of coordinates (x,y). You can specify coordinates in axis percent units or axis data units. The default value is NW, which positions the inset in the upper left (northwest) corner of the display. See the section Positioning Insets.

Note: Positioning insets with coordinates is not supported for ODS Graphics output.

† REFPOINT=BR | BL | TR | TL

specifies the reference point for an inset that PROC UNIVARIATE positions by a pair of coordinates with the POSITION= option. The REFPOINT= option specifies which corner of the inset frame is placed at the coordinates (x,y). The keywords BL, BR, TL, and TR correspond to bottom left, bottom right, top left, and top right, respectively. The default value is BL. You must use REFPOINT= together with POSITION=(x,y) coordinates. The option does not apply to ODS Graphics output.
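
The coordinate-based options can be combined as in the following sketch. It assumes that traditional graphics are available at your site, and it reuses the score data set and the variable final from the earlier examples; the coordinates are arbitrary illustrative values in axis percent units.

```sas
/* Sketch only: coordinate positioning is not supported for ODS Graphics output */
ods graphics off;

proc univariate data=score;
   histogram final;
   /* Place the top right corner of the inset at 95% of each axis */
   inset n mean std / position=(95,95) refpoint=tr cfill=blank;
run;
```

To interpret the coordinates in data units instead, the DATA option would be placed immediately after the coordinates, as in POSITION=(x,y) DATA.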