The KDE Procedure
BIVAR Statement
The BIVAR statement computes bivariate kernel density estimates. The basic syntax for the BIVAR statement specifies two variables:
BIVAR v1 <(v-options)> v2 <(v-options)> </ options> ;
This statement requests a bivariate kernel density estimate for the variables v1 and v2. The v-options optionally specified in parentheses after a variable name apply only to that variable, and override corresponding global options specified following a slash (/).
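As a minimal sketch of how a v-option overrides a global option, the following step uses the SASHELP.CLASS sample data set and arbitrary option values, both of which are illustrative assumptions rather than part of the original text. The global BWM=0.5 and NGRID=40 apply to both variables, except that the v-option BWM=2 replaces the bandwidth multiplier for Height only.

```sas
/* Illustrative only: SASHELP.CLASS and the option values are assumptions */
proc kde data=sashelp.class;
   /* Height uses BWM=2 (v-option); Weight uses the global BWM=0.5. */
   /* NGRID=40 is a global option and applies to both variables.    */
   bivar height (bwm=2) weight / bwm=0.5 ngrid=40;
run;
```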
You can specify a list of more than two variables:
BIVAR variables </ options> ;
This statement requests a bivariate kernel density estimate for each distinct pair of variables in the list. For example, if you specify
bivar x y z;
then a bivariate kernel density estimate is computed for each of the variable pairs (x, y), (x, z), and (y, z).
Alternatively, you can specify an explicit list of variable pairs, with each pair enclosed in parentheses:
BIVAR (v1 v2) (v3 v4) ... </ options> ;
(You can also specify v-options following a variable name appearing in an explicit pair, but they are omitted here for clarity.) This statement requests a bivariate kernel density estimate for each pair of variables. For example, if you specify
bivar (x y) (y z);
then bivariate kernel density estimates are computed for (x, y) and (y, z).
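The two list forms can be compared side by side. The sketch below assumes the SASHELP.CLASS sample data set and its numeric variables Age, Height, and Weight; these names are chosen for illustration and do not come from the original text. Each form is placed in its own PROC KDE step to keep the comparison unambiguous.

```sas
/* Illustrative only: SASHELP.CLASS is an assumed sample data set */

/* Variable list: one estimate for each distinct pair, that is,   */
/* (age, height), (age, weight), and (height, weight).            */
proc kde data=sashelp.class;
   bivar age height weight;
run;

/* Explicit pairs: estimates only for (age, height) and           */
/* (height, weight); the pair (age, weight) is not computed.      */
proc kde data=sashelp.class;
   bivar (age height) (height weight);
run;
```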
Note: The VAR statement supported by PROC KDE in SAS 8 and earlier releases is now obsolete. The VAR statement has been replaced by the UNIVAR and the BIVAR statements, which enable you to produce multiple kernel density estimates with a single invocation of the procedure.
You can specify the following options in the BIVAR statement. As noted, some options can be used as v-options.
BIVSTATS
produces a table for each density estimate containing the covariance and correlation between the two variables.
BWM=number
specifies the bandwidth multiplier applied to each variable in each kernel density estimate. The default value is 1. Larger multipliers produce a smoother estimate, and smaller ones produce a rougher estimate. To specify different bandwidth multipliers for different variables, specify BWM= as a v-option.
GRIDL=number
specifies the lower grid limit applied to each variable in each kernel density estimate. The default value for a given variable is the minimum observed value of that variable. To specify different lower grid limits for different variables, specify GRIDL= as a v-option.
GRIDU=number
specifies the upper grid limit applied to each variable in each kernel density estimate. The default value for a given variable is the maximum observed value of that variable. To specify different upper grid limits for different variables, specify GRIDU= as a v-option.
LEVELS
LEVELS=numlist
requests a table of levels for contours of the bivariate density. The contours are defined so that the density has a constant level along each contour, and the volume enclosed by each contour corresponds to a specified percent. In other words, the contours correspond to slices or levels of the density surface taken along the density axis. You can specify the percents used to define the contours. The default values are 1, 5, 10, 50, 90, 95, 99, and 100. The "Levels" table also provides the minimum and maximum values for each contour along the directions of the two data variables.
NGRID=number
specifies the number of grid points associated with each variable in each kernel density estimate. The default value is 60. To specify different numbers of grid points for different variables, specify NGRID= as a v-option.
NOPRINT
suppresses output tables produced by the BIVAR statement. You can use the NOPRINT option when you want to produce graphical output only.
OUT=SAS-data-set
specifies the name of the output data set in which kernel density estimates are saved. This output data set contains the following variables:
var1, whose value is the name of the first variable in a bivariate kernel density estimate
var2, whose value is the name of the second variable in a bivariate kernel density estimate
value1, with values corresponding to grid coordinates for the first variable
value2, with values corresponding to grid coordinates for the second variable
density, with values equal to kernel density estimates at the associated grid point
count, containing the number of original observations contained in the bin corresponding to a grid point
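A short sketch of saving and inspecting the estimates follows. It assumes the option is spelled OUT=, as in the SAS/STAT documentation, and uses the SASHELP.CLASS sample data set and the hypothetical data set name kde_grid for illustration. With NGRID=30 for each variable, the output data set should hold one row per grid point.

```sas
/* Illustrative only: SASHELP.CLASS and the name kde_grid are assumptions */
proc kde data=sashelp.class;
   /* Save the gridded density estimate and suppress the printed tables */
   bivar height weight / out=kde_grid ngrid=30 noprint;
run;

/* Look at the first few grid points: var1, var2, value1, value2, */
/* density, and count are the variables described above.          */
proc print data=kde_grid(obs=5);
run;
```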
PERCENTILES
PERCENTILES=numlist
requests that a table of percentiles be computed for each BIVAR variable. You can specify a list of percentiles to be computed. The default percentiles are 0.5, 1, 2.5, 5, 10, 25, 50, 75, 90, 95, 97.5, 99, and 99.5.
PLOTS=plot-request<(view-options)>
PLOTS=(plot-request<(view-options)> <... plot-request<(view-options)>>)
requests one or more plots of the bivariate data and kernel density estimate. By default, if you enable ODS Graphics and you do not specify the PLOTS= option, then the BIVAR statement creates a contour plot. If you specify the PLOTS= option, you get only the requested plots.
The following plot-requests are available.
ALL
produces all bivariate plots.
CONTOUR
produces a contour plot of the bivariate density estimate.
CONTOURSCATTER
produces a contour plot of the bivariate density estimate overlaid with a scatter plot of the data.
HISTOGRAM<(view-options)>
produces a bivariate histogram of the data. The following view-options can be specified:
ROTATE=angle
rotates the histogram angle degrees, where –180 < angle < 180. By default, angle = 54.
TILT=angle
tilts the histogram angle degrees, where –180 < angle < 180. By default, angle = 20.
HISTSURFACE<(view-options)>
produces a bivariate histogram of the data overlaid with a surface plot of the bivariate kernel density estimate. The following view-options can be specified:
ROTATE=angle
rotates the histogram and kernel density surface angle degrees, where –180 < angle < 180. By default, angle = 54.
TILT=angle
tilts the histogram and kernel density surface angle degrees, where –180 < angle < 180. By default, angle = 20.
NONE
suppresses all plots, including the contour plot that is produced by default when ODS Graphics is enabled and the PLOTS= option is not specified.
SCATTER
produces a scatter plot of the data.
SURFACE<(view-options)>
produces a surface plot of the bivariate kernel density estimate. The following view-options can be specified:
ROTATE=angle
rotates the kernel density surface angle degrees, where –180 < angle < 180. By default, angle = 54.
TILT=angle
tilts the kernel density surface angle degrees, where –180 < angle < 180. By default, angle = 20.
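The following sketch requests two specific plots instead of the default contour plot. It assumes the plot-requests are spelled CONTOURSCATTER and SURFACE and the view-options ROTATE= and TILT=, as in the SAS/STAT documentation; the SASHELP.CLASS data set and the angle values are illustrative choices only.

```sas
/* Illustrative only: data set and angle values are assumptions */
ods graphics on;

proc kde data=sashelp.class;
   /* Only the requested plots are produced: a contour plot overlaid */
   /* with the data, and a surface plot viewed from a custom angle.  */
   bivar height weight /
      plots=(contourscatter surface(rotate=30 tilt=40));
run;

ods graphics off;
```

Because PLOTS= is specified, the default contour plot is not produced in addition to these two.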
UNISTATS
produces a table for each density estimate containing standard univariate statistics for each of the two variables and the bandwidths used to compute the kernel density estimate. The statistics listed are the mean, variance, standard deviation, range, and interquartile range.
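The table-producing options can be combined in a single statement. The sketch below assumes they are spelled BIVSTATS, LEVELS, PERCENTILES, and UNISTATS, as in the SAS/STAT documentation, and uses the SASHELP.CLASS sample data set for illustration. No lists are given, so the default contour percents and percentiles described above apply.

```sas
/* Illustrative only: SASHELP.CLASS is an assumed sample data set */
proc kde data=sashelp.class;
   /* Request the covariance/correlation table, the contour "Levels"  */
   /* table, the percentiles table, and the univariate statistics,    */
   /* all with their default settings.                                */
   bivar height weight / bivstats levels percentiles unistats;
run;
```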
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.