IMSTAT Procedure (Analytics)

KDE Statement

The KDE statement calculates kernel density estimates of the distribution of one or more numeric variables in an in-memory table. You can choose among normal, tricube, and quadratic kernel functions. The default is the normal kernel function. The number of points returned is determined by the center region of a multi-threaded, inverse finite Fourier transform.

Syntax

KDE variable-list </ options>;

Required Argument

variable-list

specifies one or more numeric variables.

KDE Statement Options

BANDWIDTH=b

specifies the standardized bandwidth of the kernel function. The default bandwidths are optimal values that minimize the asymptotic mean integrated squared error of the kernel function. The actual bandwidth for the kernel estimator is the product of the standardized bandwidth, the interquartile range of the data, and n^(-1/5). Larger bandwidth values result in smoother density estimates. However, a bandwidth that is too large can produce density estimates that omit finer-grained features of the distribution.
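The product described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the procedure's implementation. The function name `actual_bandwidth` is hypothetical, and the sketch assumes that n is the number of observations used in the estimate and that the interquartile range has already been computed.

```python
def actual_bandwidth(b: float, iqr: float, n: int) -> float:
    """Scale a standardized bandwidth b to the data.

    Assumptions (not stated in the source): n is the number of
    observations, and iqr is the interquartile range computed by
    whatever quantile definition the server uses.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # actual bandwidth = b * IQR * n^(-1/5)
    return b * iqr * n ** (-1.0 / 5.0)


if __name__ == "__main__":
    # With b=1, IQR=2, and n=32, n^(-1/5) is 1/2, so the product is about 1.
    print(actual_bandwidth(1.0, 2.0, 32))
```

Because of the n^(-1/5) factor, the actual bandwidth shrinks slowly as the number of observations grows, so a fixed standardized bandwidth yields a finer estimate on larger tables.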

KERNEL= NORMAL | TRICUBE | QUADRATIC

specifies the kernel function.

Default NORMAL
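For orientation, the following Python sketch evaluates the usual textbook forms of the three kernels. It assumes that QUADRATIC refers to the Epanechnikov-style kernel and that each kernel is normalized to integrate to 1. The exact scaling that the server applies is not documented here, so treat the constants as assumptions. All function names are hypothetical.

```python
import math


def normal_kernel(u: float) -> float:
    # Standard normal density.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def tricube_kernel(u: float) -> float:
    # (70/81) * (1 - |u|^3)^3 on [-1, 1], zero elsewhere.
    a = abs(u)
    return (70.0 / 81.0) * (1.0 - a ** 3) ** 3 if a <= 1.0 else 0.0


def quadratic_kernel(u: float) -> float:
    # Assumed Epanechnikov form: (3/4) * (1 - u^2) on [-1, 1], zero elsewhere.
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def integrate(kernel, lo: float = -8.0, hi: float = 8.0, steps: int = 16000) -> float:
    # Midpoint rule, used only to check that a kernel integrates to about 1.
    width = (hi - lo) / steps
    return sum(kernel(lo + (i + 0.5) * width) for i in range(steps)) * width


if __name__ == "__main__":
    for k in (normal_kernel, tricube_kernel, quadratic_kernel):
        print(k.__name__, round(integrate(k), 4))
```

The tricube and quadratic kernels have bounded support, so each observation contributes only to nearby density points. The normal kernel gives every observation some weight everywhere.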

MAX=number

specifies the largest value to consider in the density calculation. If a value is not specified, then the largest value in the data range is used, subject to the WHERE clause.

Alias UPPER=

MIN=number

specifies the smallest value to consider in the density calculation. If a value is not specified, the smallest value in the data range is used, subject to the WHERE clause.

Alias LOWER=

MULTIPLER=number

specifies a scaling factor for the calculated density.

Default 1

NPOINTS=n

specifies the number of points from which to calculate the center region of the inverse finite Fourier transform. The value of the NPOINTS= option is adjusted to the largest power of 2 that is less than or equal to n. For example, NPOINTS=40 is adjusted to 32. The number of density points returned depends on the distribution of the data.

Default 512
Range 16 to 512
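The power-of-2 adjustment described for NPOINTS= can be illustrated with a short Python sketch. The function name `adjust_npoints` is hypothetical, and the sketch shows only the rounding rule. It does not enforce the documented range of 16 to 512, because the source does not say how out-of-range values are handled.

```python
def adjust_npoints(n: int) -> int:
    """Return the largest power of 2 that is less than or equal to n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # bit_length() is the position of the highest set bit of n, so
    # shifting 1 left by (bit_length - 1) keeps only that bit.
    return 1 << (n.bit_length() - 1)


if __name__ == "__main__":
    for requested in (16, 40, 100, 512):
        print(requested, "->", adjust_npoints(requested))
```

A value that is already a power of 2, such as 16 or 512, is left unchanged.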

SAVE=table-name

saves the result table so that you can use it in other IMSTAT procedure statements such as STORE, REPLAY, and FREE. The value for table-name must be unique within the scope of the procedure execution. The name of a table that has been freed with the FREE statement can be used again in subsequent SAVE= options.

SCALE= PERCENT | COUNT | PROPORTION

specifies the units in which the density is calculated.

TEMPEXPRESS="SAS-expressions"

TEMPEXPRESS=file-reference

specifies either a quoted string that contains the SAS expressions that define the temporary variables or a file reference to an external file that contains the SAS statements.

Alias TE=

TEMPNAMES=variable-name

TEMPNAMES=(variable-list)

specifies the list of temporary variables for the request. Each temporary variable must be defined through SAS statements that you supply with the TEMPEXPRESS= option.

Alias TN=

Details

ODS Table Names

The KDE statement generates the following ODS table for each analysis variable.
ODS Table Name    Description                          Option
KDE               Kernel density estimation results    Default
For information about using the ODS table with the SAVE= option, see the Details section of the STORE statement.