The SURVEYFREQ Procedure

PROC SURVEYFREQ Statement

  • PROC SURVEYFREQ < options > ;

The PROC SURVEYFREQ statement invokes the SURVEYFREQ procedure. It also identifies the data set to be analyzed, specifies the variance estimation method to use, and provides sample design information. The DATA= option names the input data set to be analyzed. The VARMETHOD= option specifies the variance estimation method, which is the Taylor series method by default. For Taylor series variance estimation, you can include a finite population correction factor in the analysis by providing either the sampling rate or population total in the RATE= or TOTAL= option, respectively. If your design is stratified with different sampling rates or totals for different strata, you can input these stratum rates or totals in a SAS data set that contains the stratification variables.
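As a minimal sketch of these statement options, the following step assumes a hypothetical input data set SchoolSurvey. In it, Region is the stratification variable, School identifies the primary sampling units, SamplingWeight holds the sampling weights, and Response is a categorical analysis variable. None of these names come from the procedure itself. A single 5% first-stage sampling rate supplies the finite population correction.

```sas
/* Hypothetical data set and variable names (SchoolSurvey, Region,   */
/* School, SamplingWeight, Response); substitute your own.           */
proc surveyfreq data=SchoolSurvey   /* input data set                 */
                varmethod=taylor    /* default method, stated explicitly */
                rate=0.05;          /* single first-stage sampling rate */
   tables Response;
   strata Region;
   cluster School;
   weight SamplingWeight;
run;
```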

Table 109.1 summarizes the options available in the PROC SURVEYFREQ statement.

Table 109.1: PROC SURVEYFREQ Statement Options

Option         Description
DATA=          Names the input SAS data set
MISSING        Treats missing values as a valid level
NOMCAR         Treats missing values as not missing completely at random
NOSUMMARY      Suppresses the display of the "Data Summary" table
ORDER=         Specifies the order of variable levels
PAGE           Displays only one table per page
RATE=          Specifies the first-stage sampling rate
TOTAL=         Specifies the total number of primary sampling units
VARHEADER=     Specifies the variable identification to display
VARMETHOD=     Specifies the variance estimation method


You can specify the following options in the PROC SURVEYFREQ statement:

DATA=SAS-data-set

names the SAS-data-set to be analyzed by PROC SURVEYFREQ. If you omit the DATA= option, the procedure uses the most recently created SAS data set.

MISSING

treats missing values as a valid (nonmissing) category for all categorical variables, which include TABLES, STRATA, and CLUSTER variables.

By default (if you do not specify the MISSING option), PROC SURVEYFREQ excludes an observation from the analysis if the observation contains a missing value for any STRATA or CLUSTER variable. By default, PROC SURVEYFREQ also excludes an observation from the analysis if the observation contains a missing value for any variable in the table request. For more information, see the section Missing Values.
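The following sketch shows the MISSING option. It uses the same hypothetical data set and variable names (SchoolSurvey, Region, School, SamplingWeight, Response), which are illustrative only. With MISSING, an observation that has a missing Response, Region, or School value is kept, and the missing value is counted as its own level instead of causing the observation to be excluded.

```sas
/* Hypothetical names. MISSING keeps observations with missing       */
/* TABLES, STRATA, or CLUSTER values and treats "missing" as a level. */
proc surveyfreq data=SchoolSurvey missing;
   tables Response;
   strata Region;
   cluster School;
   weight SamplingWeight;
run;
```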

NOMCAR

includes observations with missing values of TABLES variables in the variance computation as not missing completely at random (NOMCAR) for Taylor series variance estimation. When you specify the NOMCAR option, PROC SURVEYFREQ computes variance estimates by analyzing the nonmissing values as a domain (subpopulation), where the entire population includes both nonmissing and missing domains. For more information, see the section Missing Values.

By default, PROC SURVEYFREQ completely excludes an observation from a frequency or crosstabulation table (and the corresponding variance computations) if that observation has a missing value for any of the variables in the table request, unless you specify the MISSING option. The NOMCAR option has no effect when you specify the MISSING option, which treats missing values as a valid nonmissing level.

The NOMCAR option applies only to Taylor series variance estimation. The replication methods, which you can request by specifying the VARMETHOD=BRR or VARMETHOD=JACKKNIFE option, do not use the NOMCAR option.
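A minimal sketch of NOMCAR, again with hypothetical names (SchoolSurvey, Region, School, SamplingWeight, Response): observations with a missing Response still contribute to the Taylor series variance computation, because the nonmissing values are analyzed as a domain of the full population. The step relies on the default Taylor method, since NOMCAR is not used by BRR or jackknife estimation.

```sas
/* Hypothetical names. NOMCAR: nonmissing Response values are treated */
/* as a domain for Taylor series variance estimation (the default).   */
proc surveyfreq data=SchoolSurvey nomcar;
   tables Response;
   strata Region;
   cluster School;
   weight SamplingWeight;
run;
```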

NOSUMMARY

suppresses the display of the "Data Summary" table, which PROC SURVEYFREQ produces by default. For information about this table, see the section Data Summary Table.

ORDER=DATA | FORMATTED | FREQ | INTERNAL

specifies the order of the variable levels in the frequency and crosstabulation tables, which you request in the TABLES statement. The ORDER= option also controls the order of the STRATA variable levels in the "Stratum Information" table.

The ORDER= option can take the following values:

ORDER=         Levels Ordered By
DATA           Order of appearance in the input data set
FORMATTED      External formatted value, except for numeric variables with no
               explicit format, which are sorted by their unformatted (internal) value
FREQ           Descending frequency count; levels with the most observations come
               first in the order
INTERNAL       Unformatted value

By default, ORDER=INTERNAL. The FORMATTED and INTERNAL orders are machine-dependent. The frequency count used by ORDER=FREQ is the nonweighted frequency (sample size), rather than the weighted frequency.

For more information about sort order, see the chapter on the SORT procedure in the Base SAS Procedures Guide and the discussion of BY-group processing in SAS Language Reference: Concepts.
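As an illustration of ORDER=FREQ, the following sketch uses hypothetical names (SchoolSurvey, Region, Response, SamplingWeight). The levels of Region and Response are listed in descending order of their unweighted sample counts, not their weighted frequencies, and the same ORDER= setting governs the order of Region levels in the "Stratum Information" table.

```sas
/* Hypothetical names. Levels are ordered by descending unweighted   */
/* frequency (sample size), not by weighted frequency.               */
proc surveyfreq data=SchoolSurvey order=freq;
   tables Region*Response;
   strata Region;
   weight SamplingWeight;
run;
```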

PAGE

displays only one table per page. Otherwise, PROC SURVEYFREQ displays multiple tables per page as space permits.

RATE=value | SAS-data-set
R=value | SAS-data-set

specifies the sampling rate, which PROC SURVEYFREQ uses to compute a finite population correction for Taylor series variance estimation. You can provide a single sampling rate value, or you can provide stratum sampling rates by specifying a SAS-data-set.

If your sample design has multiple stages, you should specify the first-stage sampling rate, which is the ratio of the number of primary sampling units (PSUs) in the sample to the total number of PSUs in the population.

For a nonstratified sample design, or for a stratified sample design that uses the same sampling rate in all strata, you should specify a single sampling rate value. If your design is stratified and uses different sampling rates in different strata, you should name a SAS-data-set that contains the stratification variables and the stratum sampling rates. You should provide the stratum sampling rates in the data set variable named _RATE_. For more information, see the section Population Totals and Sampling Rates.
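The following sketch supplies stratum sampling rates in a secondary data set. The data set name StratumRates, the Region values, and the rates are hypothetical. The sketch assumes that Region is a character STRATA variable in the hypothetical SchoolSurvey data set. The rates data set carries the stratification variable plus the required _RATE_ variable, with one observation per stratum, listed in sorted Region order.

```sas
/* Hypothetical stratum rates: one observation per stratum, with the */
/* STRATA variable (Region) and the required variable _RATE_.        */
data StratumRates;
   input Region $ _RATE_;
   datalines;
East  0.10
North 0.05
South 0.08
West  0.12
;

/* Stratum-specific finite population corrections for Taylor series */
proc surveyfreq data=SchoolSurvey rate=StratumRates;
   tables Response;
   strata Region;
   cluster School;
   weight SamplingWeight;
run;
```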

The sampling rate values must be nonnegative numbers. You can specify sampling rates as numbers between 0 and 1. Or you can specify sampling rates in percentage form as numbers between 1 and 100, which PROC SURVEYFREQ converts to proportions. The procedure treats the value 1 as 100% instead of 1%.

If you do not specify the RATE= or the TOTAL= option, the Taylor series variance estimation does not include a finite population correction. You cannot specify both the RATE= and the TOTAL= option in the same PROC SURVEYFREQ statement.

PROC SURVEYFREQ does not use the RATE= or the TOTAL= option for BRR or jackknife variance estimation (which you can request by specifying the VARMETHOD=BRR or VARMETHOD=JACKKNIFE option, respectively).

TOTAL=value | SAS-data-set
N=value | SAS-data-set

specifies the total number of primary sampling units (PSUs), which PROC SURVEYFREQ uses to compute a finite population correction for Taylor series variance estimation. You can provide a single total value, or you can provide stratum totals by specifying a SAS-data-set. The totals must be positive numbers.

If your sample design has multiple stages, you should specify the total number of primary sampling units (PSUs).

For a nonstratified sample design, you should specify a single total value, which refers to the total number of PSUs in the population. For a stratified sample design that has the same population total in each stratum, you can specify a single total value, which refers to the total number of PSUs in each stratum. If your design is stratified and has different totals in different strata, you should name a SAS-data-set that contains the stratification variables and the stratum totals. You should provide the stratum totals in the data set variable named _TOTAL_. For more information, see the section Population Totals and Sampling Rates.
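A parallel sketch for stratum totals follows. The data set name StratumTotals, the Region values, and the PSU counts are hypothetical, and the STRATA variable Region is assumed to be character in the hypothetical SchoolSurvey data set. Each total is the number of PSUs in that stratum of the population, stored in the required variable _TOTAL_.

```sas
/* Hypothetical population PSU counts per stratum, stored in the     */
/* required variable _TOTAL_; totals must be positive.               */
data StratumTotals;
   input Region $ _TOTAL_;
   datalines;
East  120
North 200
South 150
West   90
;

/* TOTAL= and RATE= cannot both appear in the same statement */
proc surveyfreq data=SchoolSurvey total=StratumTotals;
   tables Response;
   strata Region;
   cluster School;
   weight SamplingWeight;
run;
```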

If you do not specify the RATE= or the TOTAL= option, the Taylor series variance estimation does not include a finite population correction. You cannot specify both the RATE= and the TOTAL= option in the same PROC SURVEYFREQ statement.

PROC SURVEYFREQ does not use the RATE= or the TOTAL= option for BRR or jackknife variance estimation (which you can request by specifying the VARMETHOD=BRR or VARMETHOD=JACKKNIFE option, respectively).

VARHEADER=LABEL | NAME | NAMELABEL

specifies the variable identification to use in the displayed output. This option controls the headings of the variable columns in one-way frequency tables, crosstabulation tables, and the "Stratum Information" table. This option also controls the variable identification in table titles. By default, VARHEADER=NAME.

The VARHEADER= option can take the following values:

VARHEADER=     Variable Identification Displayed
LABEL          Variable label
NAME           Variable name
NAMELABEL      Variable name and label, as Name (Label)
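To see VARHEADER=NAMELABEL in action, a variable needs a label. The following sketch attaches one in a DATA step. The data set names SchoolSurvey and SchoolSurveyLabeled, the variable names, and the label text are all hypothetical. With NAMELABEL, column headings and table titles identify Response in the Name (Label) form described above.

```sas
/* Hypothetical: add a label to Response before the analysis */
data SchoolSurveyLabeled;
   set SchoolSurvey;
   label Response = 'Satisfaction with program';
run;

/* Headings show the variable as Name (Label) */
proc surveyfreq data=SchoolSurveyLabeled varheader=namelabel;
   tables Response;
   strata Region;
   cluster School;
   weight SamplingWeight;
run;
```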

VARMETHOD=BRR < (method-options)>
VARMETHOD=JACKKNIFE | JK < (method-options)>
VARMETHOD=TAYLOR

specifies the variance estimation method. VARMETHOD=TAYLOR requests the Taylor series method, which is the default if you do not specify the VARMETHOD= option or the REPWEIGHTS statement. VARMETHOD=BRR requests variance estimation by balanced repeated replication (BRR), and VARMETHOD=JACKKNIFE requests variance estimation by the delete-1 jackknife method.

For VARMETHOD=BRR and VARMETHOD=JACKKNIFE, you can specify method-options in parentheses after the variance method name. For example:

varmethod=BRR(reps=60 outweights=myReplicateWeights)

Table 109.2 summarizes the available method-options.

Table 109.2: Variance Estimation Options

VARMETHOD=       Variance Estimation Method       Method Options
BRR              Balanced repeated replication    DFADJ
                                                  FAY <=value>
                                                  HADAMARD=SAS-data-set
                                                  OUTWEIGHTS=SAS-data-set
                                                  PRINTH
                                                  REPS=number
JACKKNIFE | JK   Jackknife                        DFADJ
                                                  OUTJKCOEFS=SAS-data-set
                                                  OUTWEIGHTS=SAS-data-set
TAYLOR           Taylor series linearization      None


You can specify the following values for the VARMETHOD= option:

BRR < (method-options)>

requests variance estimation by balanced repeated replication (BRR). The BRR method requires a stratified sample design that has two primary sampling units (PSUs) in each stratum. If you specify this option, you must also specify a STRATA statement unless you use a REPWEIGHTS statement to provide replicate weights. For more information, see the section Balanced Repeated Replication (BRR).

You can specify the following method-options:

DFADJ

computes the degrees of freedom as the number of nonmissing strata for the individual table request. If you specify this option, PROC SURVEYFREQ does not count any empty strata that occur when observations that have missing values of the TABLES variables are removed from the analysis of the table. By default, PROC SURVEYFREQ computes the degrees of freedom by counting the number of nonmissing strata for all valid observations in the input data set.

For more information, see the section Degrees of Freedom. For information about valid observations, see the section Data Summary Table.

This method-option has no effect when you specify the MISSING option, which treats missing values as a valid nonmissing level.

This method-option is not used when you specify the degrees of freedom in the DF= option in the TABLES statement or when you specify a REPWEIGHTS statement to provide replicate weights. When you specify a REPWEIGHTS statement, the degrees of freedom are the number of REPWEIGHTS variables (replicates) unless you specify the DF= option in the REPWEIGHTS or the TABLES statement.

FAY < =value >

requests Fay’s method, which is a modification of the BRR method. For more information, see the section Fay’s BRR Method.

You can specify the value of the Fay coefficient, which is used in converting the original sampling weights to replicate weights. The Fay coefficient must be a nonnegative number less than 1. By default, the Fay coefficient is 0.5.
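The following sketch requests Fay's method with a coefficient of 0.3 instead of the default 0.5. In Fay's method, the weights of one PSU in each stratum are multiplied by the Fay coefficient and those of the other PSU by 2 minus the coefficient, rather than being set to 0 and doubled. The names (SchoolSurvey, Region, School, SamplingWeight, Response) are hypothetical, and the design is assumed to have exactly two PSUs per stratum, as BRR requires.

```sas
/* Hypothetical names; assumes two PSUs (School) in each Region.     */
/* FAY=0.3 sets the Fay coefficient (nonnegative and less than 1).   */
proc surveyfreq data=SchoolSurvey varmethod=brr(fay=0.3);
   tables Response;
   strata Region;
   cluster School;
   weight SamplingWeight;
run;
```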

HADAMARD=SAS-data-set
H=SAS-data-set

names a SAS-data-set that contains the Hadamard matrix for BRR replicate construction. If you do not specify this method-option, PROC SURVEYFREQ generates an appropriate Hadamard matrix for replicate construction. For more information, see the sections Balanced Repeated Replication (BRR) and Hadamard Matrix.

If a Hadamard matrix of a particular dimension exists, it is not necessarily unique. Therefore, if you want to use a specific Hadamard matrix, you must provide the matrix as a SAS-data-set in this method-option.

In the HADAMARD= input data set, each variable corresponds to a column and each observation corresponds to a row of the Hadamard matrix. You can use any variable names in the HADAMARD= data set. All values in the data set must equal either 1 or −1. You must ensure that the matrix you provide is indeed a Hadamard matrix; that is, $\mathbf{A}'\mathbf{A} = R\,\mathbf{I}$, where $\mathbf{A}$ is the Hadamard matrix of dimension $R$ and $\mathbf{I}$ is the $R \times R$ identity matrix. PROC SURVEYFREQ does not check the validity of the Hadamard matrix that you provide.

The HADAMARD= input data set must contain at least H variables, where H denotes the number of first-stage strata in your design. If the data set contains more than H variables, PROC SURVEYFREQ uses only the first H variables. Similarly, the HADAMARD= input data set must contain at least H observations.

If you do not specify the REPS= method-option, the number of replicates is assumed to be the number of observations in the HADAMARD= input data set. If you specify the number of replicates—for example, REPS=nreps—the first nreps observations in the HADAMARD= data set are used to construct the replicates.
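As a sketch, the following step supplies a 4 × 4 Hadamard matrix (a column reordering of the Sylvester matrix of order 4) for a hypothetical design with three strata and two PSUs per stratum. The procedure then uses the first three variables, c1 to c3, each of which sums to zero. Because PROC SURVEYFREQ does not validate the matrix, an optional SAS/IML step (if that product is available to you) prints $\mathbf{A}'\mathbf{A}$, which should equal $4\mathbf{I}$. The names Hadamard4, c1 to c4, and the survey variables are hypothetical.

```sas
/* Hypothetical 4 x 4 Hadamard matrix: one observation per row,      */
/* one variable per column, all values 1 or -1.                      */
data Hadamard4;
   input c1-c4;
   datalines;
 1  1  1  1
-1  1 -1  1
 1 -1 -1  1
-1 -1  1  1
;

/* Optional check (requires SAS/IML): A`*A should equal 4*I */
proc iml;
   use Hadamard4;
   read all var _num_ into A;
   close Hadamard4;
   print (A`*A);
quit;

/* REPS= omitted, so the number of replicates is the number of rows (4) */
proc surveyfreq data=SchoolSurvey varmethod=brr(hadamard=Hadamard4 printh);
   tables Response;
   strata Region;    /* assumes 3 strata with 2 PSUs each */
   cluster School;
   weight SamplingWeight;
run;
```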

You can specify the PRINTH method-option to display the Hadamard matrix that PROC SURVEYFREQ uses to construct replicates for BRR.

OUTWEIGHTS=SAS-data-set

names a SAS-data-set to store the replicate weights that PROC SURVEYFREQ creates for BRR variance estimation. For information about replicate weights, see the section Balanced Repeated Replication (BRR). For information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weight Output Data Set.

The OUTWEIGHTS= method-option is not available when you provide replicate weights in a REPWEIGHTS statement.

PRINTH

displays the Hadamard matrix that PROC SURVEYFREQ uses to construct replicates for BRR variance estimation. When you provide the Hadamard matrix in the HADAMARD= method-option, PROC SURVEYFREQ displays only the rows and columns that are actually used to construct replicates. For more information, see the sections Balanced Repeated Replication (BRR) and Hadamard Matrix.

The PRINTH method-option is not available when you provide replicate weights in a REPWEIGHTS statement because the procedure does not use a Hadamard matrix in this case.

REPS=number

specifies the number of replicates for BRR variance estimation. The value of number must be an integer greater than 1.

If you do not use the HADAMARD= method-option to provide a Hadamard matrix, the number of replicates should be greater than the number of strata and should be a multiple of 4. For more information, see the section Balanced Repeated Replication (BRR). If PROC SURVEYFREQ cannot construct a Hadamard matrix for the REPS= value that you specify, the value is increased until a Hadamard matrix of that dimension can be constructed. Therefore, the actual number of replicates that PROC SURVEYFREQ uses might be larger than number.

If you use the HADAMARD= method-option to provide a Hadamard matrix, the value of number must not be greater than the number of rows in the Hadamard matrix. If you provide a Hadamard matrix and do not specify the REPS= method-option, the number of replicates is the number of rows in the Hadamard matrix.

If you do not specify the REPS= or the HADAMARD= method-option and do not use a REPWEIGHTS statement, the number of replicates is the smallest multiple of 4 that is greater than the number of strata.

If you use a REPWEIGHTS statement to provide replicate weights, PROC SURVEYFREQ does not use the REPS= method-option; the number of replicates is the number of REPWEIGHTS variables.
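For a hypothetical design with four strata and two PSUs per stratum, the default number of replicates would be 8, the smallest multiple of 4 greater than 4. The following sketch states that value explicitly with REPS=8 and saves the generated replicate weights with OUTWEIGHTS= so that you can inspect them. The data set names SchoolSurvey and BRRWeights and the variable names are hypothetical.

```sas
/* Hypothetical names; assumes 4 strata with 2 PSUs each.            */
proc surveyfreq data=SchoolSurvey varmethod=brr(reps=8 outweights=BRRWeights);
   tables Response;
   strata Region;
   cluster School;
   weight SamplingWeight;
run;

/* Look at the first few observations of the replicate weights */
proc print data=BRRWeights(obs=10);
run;
```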

JACKKNIFE < (method-options)>
JK < (method-options)>

requests variance estimation by the delete-1 jackknife method. For more information, see the section The Jackknife Method. If you use a REPWEIGHTS statement to provide replicate weights, VARMETHOD=JACKKNIFE is the default variance estimation method.

The delete-1 jackknife method requires at least two primary sampling units (PSUs) in each stratum for stratified designs unless you use a REPWEIGHTS statement to provide replicate weights.

You can specify the following method-options:

DFADJ

computes the degrees of freedom by using the number of nonmissing strata and clusters for the individual table request. If you specify this method-option, PROC SURVEYFREQ does not count any empty strata or clusters that occur when observations that have missing values of the TABLES variables are removed from the analysis of the table. By default, PROC SURVEYFREQ computes the degrees of freedom by counting the number of nonmissing strata and clusters for all valid observations in the input data set. The degrees of freedom for VARMETHOD=JACKKNIFE equal the number of clusters minus the number of strata.

For more information, see the section Degrees of Freedom. For information about valid observations, see the section Data Summary Table.

This method-option has no effect when you specify the MISSING option, which treats missing values as a valid nonmissing level.

This method-option is not used when you specify the degrees of freedom in the DF= option in the TABLES statement or when you specify a REPWEIGHTS statement to provide replicate weights. When you specify a REPWEIGHTS statement, the degrees of freedom are the number of REPWEIGHTS variables (replicates) unless you specify the DF= option in the REPWEIGHTS or the TABLES statement.

OUTJKCOEFS=SAS-data-set

names a SAS-data-set to store the jackknife coefficients. For information about jackknife coefficients, see the section The Jackknife Method. For information about the contents of the OUTJKCOEFS= data set, see the section Jackknife Coefficient Output Data Set.

OUTWEIGHTS=SAS-data-set

names a SAS-data-set to store the replicate weights that PROC SURVEYFREQ creates for jackknife variance estimation. For information about replicate weights, see the section The Jackknife Method. For information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weight Output Data Set.

This method-option is not available when you use a REPWEIGHTS statement to provide replicate weights.
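The following sketch requests delete-1 jackknife estimation and saves both the jackknife coefficients and the replicate weights. By the rule stated for DFADJ above, a hypothetical design with 4 strata and 10 clusters would have 10 − 4 = 6 degrees of freedom. The data set names SchoolSurvey, JKCoefs, and JKWeights and the variable names are hypothetical, and every stratum is assumed to contain at least two PSUs.

```sas
/* Hypothetical names; each Region must contain at least two PSUs.   */
proc surveyfreq data=SchoolSurvey
                varmethod=jackknife(outjkcoefs=JKCoefs outweights=JKWeights);
   tables Response;
   strata Region;
   cluster School;
   weight SamplingWeight;
run;

/* Inspect the jackknife coefficients that were used */
proc print data=JKCoefs;
run;
```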

TAYLOR

requests Taylor series variance estimation. This is the default method if you do not specify the VARMETHOD= option or a REPWEIGHTS statement. For more information, see the section Taylor Series Variance Estimation.