The SURVEYREG Procedure

PROC SURVEYREG Statement

  • PROC SURVEYREG <options>;

The PROC SURVEYREG statement invokes the SURVEYREG procedure. It optionally names the input data sets and specifies the variance estimation method.

Table 101.2 summarizes the options available in the PROC SURVEYREG statement.

Table 101.2: PROC SURVEYREG Statement Options

Option        Description
ALPHA=        Sets the confidence level for confidence limits
DATA=         Specifies the SAS data set to be analyzed
MISSING       Treats missing values as a valid nonmissing level
NAMELEN=      Specifies the length of effect names
NOMCAR        Treats missing values as not missing completely at random
ORDER=        Specifies the sort order for classification levels
PLOTS=        Requests plots from ODS Graphics
RATE=         Specifies the sampling rate
TOTAL=        Specifies the total number of primary sampling units
TRUNCATE      Determines class levels by using no more than the first 16 characters of the formatted values
VARMETHOD=    Specifies the variance estimation method


You can specify the following options in the PROC SURVEYREG statement:

ALPHA=$\alpha$

sets the confidence level for confidence limits. The value of the ALPHA= option must be between 0 and 1, and the default value is 0.05. An ALPHA= value of $\alpha$ produces $100(1 - \alpha)$% confidence limits. The default of ALPHA=0.05 produces 95% confidence limits.
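As a minimal sketch, assuming a small hypothetical data set named SAMPLE with a weight variable W, the following step requests 90% instead of 95% confidence limits. The CLPARM option in the MODEL statement, which is documented outside this section, asks for confidence limits on the parameter estimates; the ODS OUTPUT statement is included only to keep the estimates as a data set.

```sas
/* Hypothetical toy data: weight w, regressor x, response y */
data sample;
   input w x y;
   datalines;
10 2.1 5.0
12 3.4 6.9
 8 1.8 4.4
 9 2.9 6.1
15 4.0 8.8
11 3.1 7.0
;

/* ALPHA=0.1 produces 100(1 - 0.1)% = 90% confidence limits */
proc surveyreg data=sample alpha=0.1;
   weight w;
   model y = x / clparm;               /* CLPARM: confidence limits for the estimates */
   ods output ParameterEstimates=pe;   /* save the estimates table as data set PE */
run;
```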

DATA=SAS-data-set

specifies the SAS data set to be analyzed by PROC SURVEYREG. If you omit the DATA= option, the procedure uses the most recently created SAS data set.

MISSING

treats missing values as a valid (nonmissing) category for all categorical variables, which include CLASS, STRATA, CLUSTER, and DOMAIN variables.

By default, if you do not specify the MISSING option, an observation is excluded from the analysis if it has a missing value. For more information, see the section Missing Values.
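The sketch below contrasts the default behavior with the MISSING option, using hypothetical data in which the classification variable GROUP is missing for two observations. The SOLUTION option in the MODEL statement, documented elsewhere, is assumed here to display the parameter estimates for the CLASS effect.

```sas
/* Hypothetical toy data: GROUP is missing for two of the eight observations */
data sample;
   input group w y;
   datalines;
1 10 5.0
1 12 6.9
2  8 4.4
2  9 6.1
. 15 8.8
. 11 7.0
1 10 5.9
2  9 8.1
;

/* Default: the two observations with a missing GROUP are excluded */
proc surveyreg data=sample;
   class group;
   weight w;
   model y = group / solution;
run;

/* MISSING: the missing value becomes a valid GROUP level, so all eight
   observations are used */
proc surveyreg data=sample missing;
   class group;
   weight w;
   model y = group / solution;
   ods output ParameterEstimates=pe;
run;
```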

NAMELEN=n

specifies the length of effect names in tables and output data sets to be n characters, where n is a value between 40 and 200. The default length is 40 characters.

NOMCAR

requests that the procedure treat missing values in the variance computation as not missing completely at random (NOMCAR) for Taylor series variance estimation. When you specify the NOMCAR option, PROC SURVEYREG computes variance estimates by analyzing the nonmissing values as a domain or subpopulation, where the entire population includes both nonmissing and missing domains. See the section Missing Values for more details.

By default, PROC SURVEYREG completely excludes an observation from analysis if that observation has a missing value, unless you specify the MISSING option. Note that the NOMCAR option has no effect on a classification variable when you specify the MISSING option, which treats missing values as a valid nonmissing level.

The NOMCAR option applies only to Taylor series variance estimation. The replication methods, which you request with the VARMETHOD=BRR and VARMETHOD=JACKKNIFE options, do not use the NOMCAR option.
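A minimal sketch, assuming a hypothetical stratified cluster design in which the response Y is missing for two observations. Because no VARMETHOD= option is given, Taylor series variance estimation is used, which is the only method that the NOMCAR option affects.

```sas
/* Hypothetical design: 3 strata, 2 PSUs per stratum, Y missing twice */
data design;
   input stratum psu w x y;
   datalines;
1 1 10 2.1 5.0
1 1 10 3.4 .
1 2 12 1.8 4.4
1 2 12 2.9 6.1
2 3  8 4.0 8.8
2 3  8 3.1 7.0
2 4  9 2.6 .
2 4  9 3.8 8.1
3 5 15 1.2 3.1
3 5 15 2.2 4.9
3 6 11 3.3 7.2
3 6 11 4.1 8.6
;

/* NOMCAR: the nonmissing observations are analyzed as a domain of the full
   population for Taylor series variance estimation, instead of the
   observations with missing Y simply being dropped */
proc surveyreg data=design nomcar;
   strata stratum;
   cluster psu;
   weight w;
   model y = x;
   ods output ParameterEstimates=pe;
run;
```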

ORDER=DATA | FORMATTED | FREQ | INTERNAL

specifies the sort order for the levels of the classification variables (which are specified in the CLASS statement).

This option also determines the sort order for the levels of DOMAIN variables.

This option applies to the levels for all classification variables, except when you use the (default) ORDER=FORMATTED option with numeric classification variables that have no explicit format. In that case, the levels of such variables are ordered by their internal value.

The ORDER= option can take the following values:

Value of ORDER=   Levels Sorted By
DATA              Order of appearance in the input data set
FORMATTED         External formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value
FREQ              Descending frequency count; levels with the most observations come first in the order
INTERNAL          Unformatted value

By default, ORDER=FORMATTED. For ORDER=FORMATTED and ORDER=INTERNAL, the sort order is machine-dependent.

For more information about sort order, see the chapter on the SORT procedure in the Base SAS Procedures Guide and the discussion of BY-group processing in SAS Language Reference: Concepts.
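As an illustration, assuming a hypothetical character classification variable REGION, the following step orders the REGION levels by descending frequency. The SOLUTION option in the MODEL statement is assumed here only to display the level-by-level estimates in that order.

```sas
/* Hypothetical toy data: East appears 3 times, West 2 times, North once */
data sample;
   input region $ w y;
   datalines;
East  10 5.0
West  12 6.9
East   8 4.4
North  9 6.1
East  15 8.8
West  11 7.0
;

/* ORDER=FREQ sorts REGION levels by descending frequency:
   East (3 obs), West (2 obs), North (1 obs) */
proc surveyreg data=sample order=freq;
   class region;
   weight w;
   model y = region / solution;
   ods output ParameterEstimates=pe;
run;
```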

PLOTS < ( global-plot-options ) > < = plot-request < (plot-option) > >
PLOTS < ( global-plot-options ) > < = ( plot-request < (plot-option) > <…plot-request < (plot-option) >> )>

controls the plots that are produced through ODS Graphics.

When ODS Graphics is enabled and when the regression model depends on at most one continuous variable as a regressor, excluding the intercept, the PLOTS= option in the PROC SURVEYREG statement controls fit plots for the regression.

A plot-request identifies the plot, and a plot-option controls the appearance and content of the plot. You can specify plot-options in parentheses after a plot-request. A global-plot-option applies to all plots for which it is available unless it is altered by a specific plot-option. You can specify global-plot-options in parentheses after the PLOTS option.

When you specify only one plot-request, you can omit the parentheses around it. Here are a few examples of requesting plots:

plots=all
plots(weight=heatmap)=fit

When the regression model depends on at most one continuous variable as a regressor, excluding the intercept, PROC SURVEYREG provides a bubble plot or a heat map of the model fit. In a bubble plot, the bubble area is proportional to the weight of an observation. In a heat map, the heat color represents the sum of the weights at the corresponding location. The default plot depends on the number of observations in your data: a bubble plot for a data set that contains 100 or fewer observations, and a heat map for a data set that contains more than 100 observations.

ODS Graphics must be enabled before you can request a plot. For example:

ods graphics on;
proc surveyreg plots=fit;
   model height=weight;
run;
ods graphics off;

For more information about enabling and disabling ODS Graphics, see the section Enabling and Disabling ODS Graphics in Chapter 21: Statistical Graphics Using ODS.

When ODS Graphics is enabled, the ESTIMATE, LSMEANS, LSMESTIMATE, and SLICE statements can produce plots that are associated with their analyses. For information about these plots, see the corresponding sections of Chapter 19: Shared Concepts and Topics.

For general information about ODS Graphics, see Chapter 21: Statistical Graphics Using ODS.

Global Plot Options

A global-plot-option applies to all plots for which the option is available unless it is altered by a specific plot-option. You can specify the following global-plot-options:

ONLY

suppresses the default plots and requests only the plots that are specified as plot-requests.

NBINS=nbin1 <nbin2>

specifies the number of bins for the heat map of the observation weights in the fit plot. Because bins apply only to heat maps, specifying this option makes WEIGHT=HEATMAP the default. If you specify only one number, nbin1, it is used for both the horizontal and vertical axes; if you specify two numbers, nbin1 and nbin2, then nbin1 is used for the horizontal axis and nbin2 is used for the vertical axis. If you do not specify this option, the number of bins is determined by first applying the algorithm that is discussed in the section ODS Graphics in Chapter 54: The KDE Procedure, and then multiplying the resulting numbers of bins by 3. If you request hexagonal bins by specifying SHAPE=HEXAGONAL, the hexagonal bins have approximately the same area as the same number of rectangular bins would have.

WEIGHT=BUBBLE
WEIGHT=HEATMAP | HEAT

requests either a bubble plot or a heat map of the data as an overlay on the regression line and the confidence band of the prediction in a fit plot. In a bubble plot, the bubble area is proportional to the weight of an observation. In a heat map, the heat color represents the sum of the weights at the corresponding location.

If you do not specify this option, the default plot depends on the number of observations in your data: for a data set that contains 100 or fewer observations, the default is a bubble plot; for a data set that contains more than 100 observations, the default is a heat map. If you specify the NBINS= option, the default is WEIGHT=HEATMAP.

Plot Requests

You can specify the following plot-requests:

ALL

requests all appropriate plots.

FIT < (plot-options) >

requests a plot that displays the model fitting for a model that depends on at most one regressor, excluding the intercept. The plot is either a bubble plot or a heat map that is overlaid with the regression line and confidence band of the prediction.

The FIT plot request has the following plot-options:

NBINS=nbin1 <nbin2>

specifies the number of bins for the heat map of the observation weights in the fit plot. Because bins apply only to heat maps, specifying this option makes WEIGHT=HEATMAP the default. If you specify only one number, nbin1, it is used for both the horizontal and vertical axes; if you specify two numbers, nbin1 and nbin2, then nbin1 is used for the horizontal axis and nbin2 is used for the vertical axis. If you do not specify this option, the number of bins is determined by first applying the algorithm that is discussed in the section ODS Graphics in Chapter 54: The KDE Procedure, and then multiplying the resulting numbers of bins by 3. If you request hexagonal bins by specifying SHAPE=HEXAGONAL, the hexagonal bins have approximately the same area as the same number of rectangular bins would have.

WEIGHT=BUBBLE
WEIGHT=HEATMAP | HEAT

requests either a bubble plot or a heat map of the data as an overlay on the regression line and the confidence band of the prediction in a fit plot. In a bubble plot, the bubble area is proportional to the weight of an observation. In a heat map, the heat color represents the sum of the weights at the corresponding location.

If you do not specify this option, the default plot depends on the number of observations in your data: for a data set that contains 100 or fewer observations, the default is a bubble plot; for a data set that contains more than 100 observations, the default is a heat map. If you specify either the NBINS= or the SHAPE= option, the default is WEIGHT=HEATMAP.

SHAPE=RECTANGULAR | REC
SHAPE=HEXAGONAL | HEX

requests either rectangular or hexagonal bins for a heat map of the data. Because bins apply only to heat maps, specifying this option makes WEIGHT=HEATMAP the default.

NONE

suppresses all plots.
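A minimal sketch of the plot options above, assuming a hypothetical data set SAMPLE with a single continuous regressor. It combines the ONLY global-plot-option with the SHAPE= and NBINS= plot-options of the FIT request.

```sas
/* Hypothetical toy data: weight w, regressor x, response y */
data sample;
   input w x y;
   datalines;
10 2.1 5.0
12 3.4 6.9
 8 1.8 4.4
 9 2.9 6.1
15 4.0 8.8
11 3.1 7.0
;

ods graphics on;

/* ONLY: suppress the default plots and produce just the FIT plot.
   SHAPE=HEX requests hexagonal bins and NBINS=10 uses 10 bins on each axis;
   either option makes WEIGHT=HEATMAP the default, so a heat map is drawn
   even though this data set has only 6 observations */
proc surveyreg data=sample plots(only)=fit(shape=hex nbins=10);
   weight w;
   model y = x;
run;

ods graphics off;
```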

RATE=value | SAS-data-set
R=value | SAS-data-set

specifies the sampling rate as a nonnegative value, or specifies an input data set that contains the stratum sampling rates. The procedure uses this information to compute a finite population correction for Taylor series variance estimation. The procedure does not use the RATE= option for BRR or jackknife variance estimation, which you request with the VARMETHOD=BRR or VARMETHOD=JACKKNIFE option.

If your sample design has multiple stages, you should specify the first-stage sampling rate, which is the ratio of the number of PSUs selected to the total number of PSUs in the population.

For a nonstratified sample design, or for a stratified sample design with the same sampling rate in all strata, you should specify a nonnegative value for the RATE= option. If your design is stratified with different sampling rates in the strata, then you should name a SAS data set that contains the stratification variables and the sampling rates. See the section Specification of Population Totals and Sampling Rates for more details.

The value in the RATE= option and the values of _RATE_ in the secondary data set must be nonnegative numbers. You can specify value either as a proportion between 0 and 1 or in percentage form as a number between 1 and 100, which PROC SURVEYREG converts to a proportion. The procedure treats the value 1 as 100%, not as the percentage 1%.
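The conversion rule can be restated as a DATA step. This is a hypothetical illustration of the documented rule, not the procedure's own code; the data set name RATE_FORMS and the variables VALUE and RATE are made up for the example.

```sas
/* Hypothetical restatement of the RATE= rule, not PROC SURVEYREG code:
   a value from 0 through 1 is a proportion (so 1 means 100%), and a
   value greater than 1, up to 100, is a percentage */
data rate_forms;
   input value @@;
   if value > 1 then rate = value / 100;   /* percentage form */
   else rate = value;                      /* proportion form */
   datalines;
0.05 5 1 25 100
;

proc print data=rate_forms;
run;
```

Under this rule, 0.05 and 5 both describe a 5% sampling rate, while 1 and 100 both describe a 100% rate.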

If you do not specify the TOTAL= or RATE= option, then the Taylor series variance estimation does not include a finite population correction. You cannot specify both the TOTAL= and RATE= options.
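A minimal sketch of stratum-specific sampling rates, assuming a hypothetical design with three strata and two PSUs per stratum. The secondary data set holds the stratification variable and _RATE_, the first-stage (PSU) sampling rate in each stratum; the rates themselves are made up.

```sas
/* Hypothetical design: 3 strata, 2 PSUs per stratum */
data design;
   input stratum psu w x y;
   datalines;
1 1 10 2.1 5.0
1 1 10 3.4 6.9
1 2 12 1.8 4.4
1 2 12 2.9 6.1
2 3  8 4.0 8.8
2 3  8 3.1 7.0
2 4  9 2.6 5.9
2 4  9 3.8 8.1
3 5 15 1.2 3.1
3 5 15 2.2 4.9
3 6 11 3.3 7.2
3 6 11 4.1 8.6
;

/* Secondary data set: stratification variable plus _RATE_ per stratum */
data strata_rates;
   input stratum _rate_;
   datalines;
1 0.10
2 0.05
3 0.20
;

/* Taylor series (the default method) uses these rates for the
   finite population correction */
proc surveyreg data=design rate=strata_rates;
   strata stratum;
   cluster psu;
   weight w;
   model y = x;
   ods output ParameterEstimates=pe;
run;
```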

TOTAL=value | SAS-data-set
N=value | SAS-data-set

specifies the total number of primary sampling units in the study population as a positive value, or specifies an input data set that contains the stratum population totals. The procedure uses this information to compute a finite population correction for Taylor series variance estimation. The procedure does not use the TOTAL= option for BRR or jackknife variance estimation, which you request with the VARMETHOD=BRR or VARMETHOD=JACKKNIFE option.

For a nonstratified sample design, or for a stratified sample design with the same population total in all strata, you should specify a positive value for the TOTAL= option. If your sample design is stratified with different population totals in the strata, then you should name a SAS data set that contains the stratification variables and the population totals. See the section Specification of Population Totals and Sampling Rates for more details.

If you do not specify the TOTAL= or RATE= option, then the Taylor series variance estimation does not include a finite population correction. You cannot specify both the TOTAL= and RATE= options.
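A minimal sketch of stratum-specific population totals, assuming the same kind of hypothetical three-stratum design. The variable name _TOTAL_ in the secondary data set is an assumption made here by analogy with _RATE_; see the section Specification of Population Totals and Sampling Rates for the exact requirements.

```sas
/* Hypothetical design: 3 strata, 2 PSUs per stratum */
data design;
   input stratum psu w x y;
   datalines;
1 1 10 2.1 5.0
1 1 10 3.4 6.9
1 2 12 1.8 4.4
1 2 12 2.9 6.1
2 3  8 4.0 8.8
2 3  8 3.1 7.0
2 4  9 2.6 5.9
2 4  9 3.8 8.1
3 5 15 1.2 3.1
3 5 15 2.2 4.9
3 6 11 3.3 7.2
3 6 11 4.1 8.6
;

/* Secondary data set: stratification variable plus the number of PSUs in
   the population of each stratum (variable name _TOTAL_ is assumed) */
data strata_totals;
   input stratum _total_;
   datalines;
1 20
2 40
3 10
;

/* With the same total in every stratum, a single positive value such as
   TOTAL=20 could be given instead of a data set */
proc surveyreg data=design total=strata_totals;
   strata stratum;
   cluster psu;
   weight w;
   model y = x;
   ods output ParameterEstimates=pe;
run;
```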

TRUNCATE

specifies that class levels be determined by using no more than the first 16 characters of the formatted values of the CLASS, STRATA, and CLUSTER variables. When formatted values are longer than 16 characters, you can use this option to revert to the levels as determined in releases before SAS 9.

VARMETHOD=BRR <(method-options)>
VARMETHOD=JACKKNIFE | JK <(method-options)>
VARMETHOD=TAYLOR

specifies the variance estimation method. VARMETHOD=TAYLOR requests the Taylor series method, which is the default if you do not specify the VARMETHOD= option or the REPWEIGHTS statement. VARMETHOD=BRR requests variance estimation by balanced repeated replication (BRR), and VARMETHOD=JACKKNIFE requests variance estimation by the delete-1 jackknife method.

For VARMETHOD=BRR and VARMETHOD=JACKKNIFE, you can specify method-options in parentheses. Table 101.3 summarizes the available method-options.

Table 101.3: Variance Estimation Options

VARMETHOD=   Variance Estimation Method      Method-Options
BRR          Balanced repeated replication   FAY <=value>
                                             HADAMARD=SAS-data-set
                                             OUTWEIGHTS=SAS-data-set
                                             PRINTH
                                             REPS=number
JACKKNIFE    Jackknife                       OUTJKCOEFS=SAS-data-set
                                             OUTWEIGHTS=SAS-data-set
TAYLOR       Taylor series linearization     None


Method-options must be enclosed in parentheses following the method keyword. For example:

varmethod=BRR(reps=60 outweights=myReplicateWeights)

The following values are available for the VARMETHOD= option:

BRR <(method-options)>

requests balanced repeated replication (BRR) variance estimation. The BRR method requires a stratified sample design with two primary sampling units (PSUs) per stratum. See the section Balanced Repeated Replication (BRR) Method for more information.

You can specify the following method-options in parentheses following VARMETHOD=BRR:

FAY <=value>

requests Fay’s method, a modification of the BRR method, for variance estimation. See the section Fay’s BRR Method for more information.

You can specify the value of the Fay coefficient, which is used in converting the original sampling weights to replicate weights. The Fay coefficient must be a nonnegative number less than 1. By default, the value of the Fay coefficient equals 0.5.
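A minimal sketch of Fay's method, assuming a hypothetical design with three strata and two PSUs per stratum (BRR requires two PSUs per stratum). The Fay coefficient 0.3 is an arbitrary choice for illustration.

```sas
/* Hypothetical design: 3 strata, 2 PSUs per stratum */
data design;
   input stratum psu w x y;
   datalines;
1 1 10 2.1 5.0
1 1 10 3.4 6.9
1 2 12 1.8 4.4
1 2 12 2.9 6.1
2 3  8 4.0 8.8
2 3  8 3.1 7.0
2 4  9 2.6 5.9
2 4  9 3.8 8.1
3 5 15 1.2 3.1
3 5 15 2.2 4.9
3 6 11 3.3 7.2
3 6 11 4.1 8.6
;

/* FAY=0.3 sets the Fay coefficient (nonnegative and less than 1);
   FAY with no value would use the default 0.5. With 3 strata and no REPS=
   or HADAMARD=, the number of replicates defaults to 4, the smallest
   multiple of 4 that is greater than 3 */
proc surveyreg data=design varmethod=brr(fay=0.3);
   strata stratum;
   cluster psu;
   weight w;
   model y = x;
   ods output ParameterEstimates=pe;
run;
```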

HADAMARD=SAS-data-set
H=SAS-data-set

names a SAS data set that contains the Hadamard matrix for BRR replicate construction. If you do not provide a Hadamard matrix with the HADAMARD= method-option, PROC SURVEYREG generates an appropriate Hadamard matrix for replicate construction. See the sections Balanced Repeated Replication (BRR) Method and Hadamard Matrix for details.

If a Hadamard matrix of a given dimension exists, it is not necessarily unique. Therefore, if you want to use a specific Hadamard matrix, you must provide the matrix as a SAS data set in the HADAMARD= method-option.

In the HADAMARD= input data set, each variable corresponds to a column of the Hadamard matrix, and each observation corresponds to a row of the matrix. You can use any variable names in the HADAMARD= data set. All values in the data set must equal either 1 or –1. You must ensure that the matrix you provide is indeed a Hadamard matrix, that is, $\mathbf{A}'\mathbf{A} = R\mathbf{I}$, where $\mathbf{A}$ is the Hadamard matrix of dimension $R$ and $\mathbf{I}$ is the $R \times R$ identity matrix. PROC SURVEYREG does not check the validity of the Hadamard matrix that you provide.
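Because the procedure does not validate the matrix, you might check it yourself before the analysis. The following sketch assumes that SAS/IML is available and uses a hypothetical 4 x 4 Hadamard matrix stored in a data set named HAD; it tests squareness, the ±1 entries, and the condition $\mathbf{A}'\mathbf{A} = R\mathbf{I}$.

```sas
/* Hypothetical 4 x 4 Hadamard matrix; the all-ones column is placed last */
data had;
   input c1-c4;
   datalines;
 1  1  1  1
-1  1 -1  1
 1 -1 -1  1
-1 -1  1  1
;

proc iml;
   use had;
   read all var _num_ into A;    /* rows = observations, columns = variables */
   close had;
   R = nrow(A);
   isHadamard = 0;
   /* Valid only if A is square, every entry is 1 or -1, and t(A)*A = R*I */
   if ncol(A) = R then
      isHadamard = all(abs(A) = 1) & all(t(A) * A = R * I(R));
   print R isHadamard;
   create hadcheck var {isHadamard};  /* save the flag as a data set */
   append;
   close hadcheck;
quit;
```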

The HADAMARD= input data set must contain at least H variables, where H denotes the number of first-stage strata in your design. If the data set contains more than H variables, the procedure uses only the first H variables. Similarly, the HADAMARD= input data set must contain at least H observations.

If you do not specify the REPS= method-option, the number of replicates is taken to be the number of observations in the HADAMARD= input data set. If you specify the number of replicates as REPS=nreps, the first nreps observations in the HADAMARD= data set are used to construct the replicates.

You can specify the PRINTH option to display the Hadamard matrix that the procedure uses to construct replicates for BRR.
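A minimal sketch of supplying your own matrix, assuming a hypothetical three-stratum design and a hypothetical 4 x 4 Hadamard matrix in a data set named HAD. With three strata, only the first three columns are used; placing the all-ones column last means each of those three columns contains two 1s and two –1s.

```sas
/* Hypothetical 4 x 4 Hadamard matrix; the all-ones column is placed last */
data had;
   input c1-c4;
   datalines;
 1  1  1  1
-1  1 -1  1
 1 -1 -1  1
-1 -1  1  1
;

/* Hypothetical design: 3 strata, 2 PSUs per stratum */
data design;
   input stratum psu w x y;
   datalines;
1 1 10 2.1 5.0
1 1 10 3.4 6.9
1 2 12 1.8 4.4
1 2 12 2.9 6.1
2 3  8 4.0 8.8
2 3  8 3.1 7.0
2 4  9 2.6 5.9
2 4  9 3.8 8.1
3 5 15 1.2 3.1
3 5 15 2.2 4.9
3 6 11 3.3 7.2
3 6 11 4.1 8.6
;

/* No REPS=, so the number of replicates equals the 4 rows of HAD.
   PRINTH displays the rows and columns of HAD that the procedure uses */
proc surveyreg data=design varmethod=brr(hadamard=had printh);
   strata stratum;
   cluster psu;
   weight w;
   model y = x;
   ods output ParameterEstimates=pe;
run;
```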

OUTWEIGHTS=SAS-data-set

names a SAS data set that contains replicate weights. See the section Balanced Repeated Replication (BRR) Method for information about replicate weights. See the section Replicate Weights Output Data Set for more details about the contents of the OUTWEIGHTS= data set.

The OUTWEIGHTS= method-option is not available when you provide replicate weights with the REPWEIGHTS statement.

PRINTH

displays the Hadamard matrix.

When you provide your own Hadamard matrix with the HADAMARD= method-option, only the rows and columns of the Hadamard matrix that are used by the procedure are displayed. See the sections Balanced Repeated Replication (BRR) Method and Hadamard Matrix for details.

The PRINTH method-option is not available when you provide replicate weights with the REPWEIGHTS statement because the procedure does not use a Hadamard matrix in this case.

REPS=number

specifies the number of replicates for BRR variance estimation. The value of number must be an integer greater than 1.

If you do not provide a Hadamard matrix with the HADAMARD= method-option, the number of replicates should be greater than the number of strata and should be a multiple of 4. See the section Balanced Repeated Replication (BRR) Method for more information. If a Hadamard matrix cannot be constructed for the REPS= value that you specify, the value is increased until a Hadamard matrix of that dimension can be constructed. Therefore, it is possible for the actual number of replicates used to be larger than the REPS= value that you specify.

If you provide a Hadamard matrix with the HADAMARD= method-option, the value of REPS= must not be greater than the number of rows in the Hadamard matrix. If you provide a Hadamard matrix and do not specify the REPS= method-option, the number of replicates equals the number of rows in the Hadamard matrix.

If you do not specify the REPS= or HADAMARD= method-option and do not include a REPWEIGHTS statement, the number of replicates equals the smallest multiple of 4 that is greater than the number of strata.
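The default replicate count can be computed directly. This hypothetical DATA step only restates the rule "smallest multiple of 4 that is greater than the number of strata" for a few stratum counts; it does not model the case in which the procedure must increase the value because it cannot construct a Hadamard matrix of that dimension.

```sas
/* Hypothetical illustration: smallest multiple of 4 strictly greater
   than the number of strata */
data reps_rule;
   do nstrata = 1, 3, 4, 15, 16;
      nreps = 4 * (floor(nstrata / 4) + 1);
      output;
   end;
run;

proc print data=reps_rule;
run;
```

Because the multiple must be strictly greater than the number of strata, 4 strata give 8 replicates rather than 4, and 16 strata give 20.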

If you provide replicate weights with the REPWEIGHTS statement, the procedure does not use the REPS= method-option. With a REPWEIGHTS statement, the number of replicates equals the number of REPWEIGHTS variables.

JACKKNIFE | JK <(method-options)>

requests variance estimation by the delete-1 jackknife method. See the section Jackknife Method for details. If you provide replicate weights with a REPWEIGHTS statement, VARMETHOD=JACKKNIFE is the default variance estimation method.

You can specify the following method-options in parentheses following VARMETHOD=JACKKNIFE:

OUTJKCOEFS=SAS-data-set

names a SAS data set that contains jackknife coefficients. See the section Jackknife Method for information about jackknife coefficients. See the section Jackknife Coefficients Output Data Set for more details about the contents of the OUTJKCOEFS= data set.

OUTWEIGHTS=SAS-data-set

names a SAS data set that contains replicate weights. See the section Jackknife Method for information about replicate weights. See the section Replicate Weights Output Data Set for more details about the contents of the OUTWEIGHTS= data set.

The OUTWEIGHTS= method-option is not available when you provide replicate weights with the REPWEIGHTS statement.
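A minimal sketch of the jackknife method-options, assuming a hypothetical design with three strata and two PSUs per stratum. The output data set names JKCOEFS and JKWEIGHTS are arbitrary.

```sas
/* Hypothetical design: 3 strata, 2 PSUs per stratum (6 PSUs in total) */
data design;
   input stratum psu w x y;
   datalines;
1 1 10 2.1 5.0
1 1 10 3.4 6.9
1 2 12 1.8 4.4
1 2 12 2.9 6.1
2 3  8 4.0 8.8
2 3  8 3.1 7.0
2 4  9 2.6 5.9
2 4  9 3.8 8.1
3 5 15 1.2 3.1
3 5 15 2.2 4.9
3 6 11 3.3 7.2
3 6 11 4.1 8.6
;

/* Delete-1 jackknife: one replicate per PSU, so 6 replicates here.
   OUTJKCOEFS= saves the jackknife coefficients and OUTWEIGHTS= saves
   the replicate weights */
proc surveyreg data=design varmethod=jk(outjkcoefs=jkcoefs outweights=jkweights);
   strata stratum;
   cluster psu;
   weight w;
   model y = x;
run;

proc print data=jkcoefs;
run;
```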

TAYLOR

requests Taylor series variance estimation. This is the default method if you do not specify the VARMETHOD= option or a REPWEIGHTS statement. See the section Taylor Series (Linearization) for more information.