PROC KDE: UNIVAR Statement :: SAS/STAT(R) 9.2 User's Guide, Second Edition

The KDE Procedure

UNIVAR Statement

UNIVAR variable <(v-options)> <...variable <(v-options)>> < / options > ;

The UNIVAR statement computes univariate kernel density estimates. You can specify various v-options for each variable by enclosing them in parentheses after the variable name. You can also specify global options among the UNIVAR statement options following a slash (/). Global options apply to all the variables specified in the UNIVAR statement. However, individual variable v-options override the global options.

Note: The VAR statement supported by PROC KDE in SAS 8 and earlier releases is now obsolete. The VAR statement has been replaced by the UNIVAR and BIVAR statements, which enable you to produce multiple kernel density estimates with a single invocation of the procedure.

You can specify the following options in the UNIVAR statement. As noted, some options can be used as v-options.

BWM=number

specifies a bandwidth multiplier used for each kernel density estimate. The default value is 1. Larger multipliers produce a smoother estimate, and smaller ones produce a rougher estimate. To specify different bandwidth multipliers for different variables, specify BWM= as a v-option.

GRIDL=number

specifies a lower grid limit used for each kernel density estimate. The default value for a given variable is the minimum observed value of that variable. To specify different lower grid limits for different variables, specify GRIDL= as a v-option.

GRIDU=number

specifies an upper grid limit used for each kernel density estimate. The default value for a given variable is the maximum observed value of that variable. To specify different upper grid limits for different variables, specify GRIDU= as a v-option.
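
As a minimal sketch of how global grid limits and v-options interact, the following statements set common limits after the slash and override the lower limit for one variable. The data set MyData, the variables x1 and x2, and the limit values are placeholders chosen for illustration:

   proc kde data=MyData;
      /* gridl=0 is a v-option for x1; the limits after the slash are global */
      univar x1 (gridl=0)
             x2 / gridl=-5 gridu=5;
   run;

Under these assumptions, the estimate for x1 is computed on a grid from 0 to 5, and the estimate for x2 on a grid from -5 to 5.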

METHOD=SJPI | SNR | SNRQ | SROT | OS

specifies the method used to compute the bandwidth. Available methods are Sheather-Jones plug-in (SJPI), simple normal reference (SNR), simple normal reference that uses the interquartile range (SNRQ), Silverman’s rule of thumb (SROT), and oversmoothed (OS). See the section Bandwidth Selection and refer to Jones, Marron, and Sheather (1996) for a description of these methods. SJPI is the default method.
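
For illustration, the following sketch replaces the default SJPI method with the simple normal reference that uses the interquartile range. The data set MyData and the variable x1 are assumed names. The UNISTATS option is added so that the resulting bandwidth appears in the output:

   proc kde data=MyData;
      /* SNRQ replaces the default SJPI bandwidth; UNISTATS reports the bandwidth used */
      univar x1 / method=snrq unistats;
   run;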

NGRID=number

NG=number

specifies a number of grid points used for each kernel density estimate. The default value is 401. To specify different numbers of grid points for different variables, specify NGRID= as a v-option.

NOPRINT

suppresses output tables produced by the UNIVAR statement. You can use the NOPRINT option when you want to produce graphical output only.
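
A possible use of NOPRINT, assuming ODS Graphics is available in your session and that MyData and x1 are placeholder names, is to keep only the default plot:

   ods graphics on;

   proc kde data=MyData;
      /* no tables are displayed; with no PLOTS= option, the default plot is produced */
      univar x1 / noprint;
   run;

   ods graphics off;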

OUT=SAS-data-set

specifies the output SAS data set containing the kernel density estimate. This output data set contains the following variables:

var, whose value is the name of the variable in the kernel density estimate
value, with values corresponding to grid coordinates for the variable
density, with values equal to kernel density estimates at the associated grid point
count, containing the number of original observations contained in the bin corresponding to a grid point
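
The following sketch saves the estimate to a data set and lists its first few grid points. The names MyData, x1, and DensOut are assumptions made for this illustration:

   proc kde data=MyData;
      /* write var, value, density, and count to the data set DensOut */
      univar x1 / out=DensOut;
   run;

   /* display the first five grid points of the saved estimate */
   proc print data=DensOut(obs=5);
   run;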

PERCENTILES

PERCENTILES=numlist

requests that a table of percentiles be computed for each UNIVAR variable. You can specify a list of percentiles to be computed. The default percentiles are 0.5, 1, 2.5, 5, 10, 25, 50, 75, 90, 95, 97.5, 99, and 99.5.

PLOTS=plot-request | ALL | NONE

PLOTS=(plot-request <... plot-request>)

requests plots of the univariate kernel density estimate. The following table shows the available plot-requests.

Keyword	Description
ALL	produces all plots
DENSITY	univariate kernel density estimate curve
DENSITYOVERLAY	overlaid univariate kernel density estimate curves
HISTDENSITY	univariate histogram of data overlaid with kernel density estimate curve
HISTOGRAM	univariate histogram of data
NONE	suppresses all plots

By default, if you enable ODS Graphics and you do not specify the PLOTS= option, then the UNIVAR statement creates a histogram overlaid with a kernel density estimate. If you specify the PLOTS= option, you get only the requested plots.

If you specify more than one variable in the UNIVAR statement, the DENSITYOVERLAY keyword overlays the density curves for all the variables on a single plot. The other keywords each produce a separate plot for every variable listed in the UNIVAR statement.
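
As a sketch of how the plot-requests combine, the following statements ask for one overlaid density plot and a separate histogram for each variable. The data set MyData and the variables x1 and x2 are placeholder names:

   ods graphics on;

   proc kde data=MyData;
      /* one overlay plot of both density curves, plus one histogram per variable */
      univar x1 x2 / plots=(densityoverlay histogram);
   run;

   ods graphics off;

Because the PLOTS= option is specified, the default histogram with overlaid density curve is not produced.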

SJPIMAX=number

specifies the maximum grid value in determining the Sheather-Jones plug-in bandwidth. The default value is two times the oversmoothed estimate.

SJPIMIN=number

specifies the minimum grid value in determining the Sheather-Jones plug-in bandwidth. The default value is the maximum grid value (see the SJPIMAX= option) divided by 18.

SJPINUM=number

specifies the number of grid values used in determining the Sheather-Jones plug-in bandwidth. The default is 21.

SJPITOL=number

specifies the tolerance for termination of the bisection algorithm used in computing the Sheather-Jones plug-in bandwidth. The default value is 0.001.
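
The Sheather-Jones tuning options can be combined in one statement. The following sketch uses a finer grid and a tighter tolerance than the defaults. The values 41 and 0.0001 are arbitrary illustrations rather than recommendations, and MyData and x1 are assumed names:

   proc kde data=MyData;
      /* SJPINUM= and SJPITOL= tune the search for the plug-in bandwidth */
      univar x1 / method=sjpi sjpinum=41 sjpitol=0.0001 unistats;
   run;

The UNISTATS option is included so that the resulting bandwidth can be compared with the one obtained under the default settings.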

UNISTATS

produces a table for each variable containing standard univariate statistics and the bandwidth used to compute its kernel density estimate. The statistics listed are the mean, variance, standard deviation, range, and interquartile range.
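
For example, the following sketch requests both summary tables for two variables. The data set MyData and the variables x1 and x2 are placeholder names, and PERCENTILES is given without a list so that the default percentiles are used:

   proc kde data=MyData;
      /* one statistics table and one percentiles table per variable */
      univar x1 x2 / unistats percentiles;
   run;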

Examples

Suppose you have the variables x1, x2, x3, and x4 in the SAS data set MyData. You can request a univariate kernel density estimate for each of these variables with the following statements:

   proc kde data=MyData; 
      univar x1 x2 x3 x4;
   run;

You can also specify different bandwidths and other options for each variable. For example, the following statements request kernel density estimates that use Silverman’s rule of thumb (SROT) method for all variables:

   proc kde data=MyData; 
      univar x1 (bwm=2) 
             x2 (bwm=0.5 ngrid=100)
             x3 x4 / ngrid=200 method=srot;
   run;

The option NGRID=200 applies to the variables x1, x3, and x4, but the v-option NGRID=100 is applied to x2. Bandwidth multipliers of 2 and 0.5 are specified for the variables x1 and x2, respectively.
