The SURVEYSELECT Procedure

STRATA Statement

STRATA variables < / options > ;

You can specify a STRATA statement to partition the input data set into nonoverlapping groups defined by the STRATA variables. PROC SURVEYSELECT then selects independent samples from these strata, according to the selection method and design parameters specified in the PROC SURVEYSELECT statement. For information about the use of stratification in sample design, see Lohr (1999), Kalton (1983), Kish (1965, 1987), and Cochran (1977).

The variables are one or more variables in the input data set. The STRATA variables function much like BY variables, and PROC SURVEYSELECT expects the input data set to be sorted in order of the STRATA variables.

If you specify a CONTROL statement or METHOD=PPS, the input data set must be sorted in ascending order of the STRATA variables. This means that you cannot specify the NOTSORTED or DESCENDING option in the STRATA statement when you specify a CONTROL statement or METHOD=PPS.

If your input data set is not sorted by the STRATA variables in ascending order, use one of the following alternatives:

  • Sort the data by using the SORT procedure with the STRATA variables in a BY statement.

  • Specify the option NOTSORTED or DESCENDING in the STRATA statement for the SURVEYSELECT procedure (when you do not specify a CONTROL statement or METHOD=PPS). The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the STRATA variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

  • Create an index on the STRATA variables by using the DATASETS procedure.
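To make the difference between ascending order and NOTSORTED grouping concrete, here is a minimal Python sketch (an illustration, not SAS code). It assumes the data arrive as a plain list of stratum values, and the helper names `is_ascending` and `contiguous_groups` are hypothetical. It treats each run of consecutive equal values as one group, which is presumably how NOTSORTED data are grouped, given that the STRATA variables function much like BY variables.

```python
from itertools import groupby


def is_ascending(keys):
    """True if the stratum values never decrease, as after PROC SORT with a BY statement."""
    return all(a <= b for a, b in zip(keys, keys[1:]))


def contiguous_groups(keys):
    """Split values into runs of equal consecutive values (NOTSORTED-style grouping)."""
    return [(key, len(list(run))) for key, run in groupby(keys)]


# Hypothetical stratum values: grouped, but not in ascending order.
regions = ["East", "East", "West", "West", "North"]
print(is_ascending(regions))       # False: "North" sorts before "West"
print(contiguous_groups(regions))  # [('East', 2), ('West', 2), ('North', 1)]
```

Under this grouping, `contiguous_groups(["A", "B", "A"])` returns three groups rather than two, which is why data processed with NOTSORTED must still keep each stratum's observations together.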

For more information about the BY statement, see SAS Language Reference: Concepts. For more information about the DATASETS procedure, see the Base SAS Procedures Guide.

Allocation Options

The allocation options in the STRATA statement request allocation of the total sample size among the strata. You can specify the total sample size with the SAMPSIZE= option in the PROC SURVEYSELECT statement. When you request allocation with the ALLOC= option in the STRATA statement, PROC SURVEYSELECT allocates the total sample size among the strata according to the allocation method that you name. You can request proportional allocation (ALLOC=PROP), optimal allocation (ALLOC=OPTIMAL), or Neyman allocation (ALLOC=NEYMAN). See the section Sample Size Allocation for details about these methods.

Instead of requesting that PROC SURVEYSELECT compute the sample allocation, you can directly specify the allocation proportions by using the ALLOC=(values) option or the ALLOC=SAS-data-set option. Then PROC SURVEYSELECT allocates the total sample size among the strata according to the proportions you specify.

By default, PROC SURVEYSELECT computes the allocation of the total sample size among the strata and then selects the sample by using the allocated sample sizes. If you specify the NOSAMPLE option, PROC SURVEYSELECT computes the allocation but does not select the sample. In this case, the OUT= output data set contains the stratum sample sizes computed according to the specified allocation method. See the section Allocation Output Data Set for details.

You can specify the following options in the STRATA statement.

ALLOC=name

specifies the method for allocating the total sample size among the strata. The following values for name are available:

PROPORTIONAL
PROP

requests proportional allocation, which allocates the total sample size in proportion to the stratum sizes, where the stratum size is the number of sampling units in the stratum. See the section Sample Size Allocation for details.

OPTIMAL
OPT

requests optimal allocation, which allocates the total sample size among the strata in proportion to the stratum size multiplied by the stratum standard deviation and divided by the square root of the per-unit stratum cost. The stratum standard deviation is the square root of the stratum variance that you provide. See the section Sample Size Allocation for more information. If you specify ALLOC=OPTIMAL, you must provide the stratum variances with the VAR, VAR=(values), or VAR=SAS-data-set option, and you must provide the stratum costs with the COST, COST=(values), or COST=SAS-data-set option.

NEYMAN

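requests Neyman allocation, which allocates the total sample size among the strata in proportion to the stratum size multiplied by the stratum standard deviation (the square root of the stratum variance). Neyman allocation is the special case of optimal allocation in which the per-unit costs are equal across strata. See the section Sample Size Allocation for more information. If you specify ALLOC=NEYMAN, you must provide the stratum variances with the VAR, VAR=(values), or VAR=SAS-data-set option.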
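The three allocation methods can be compared with a short Python sketch (an illustration, not the procedure's implementation). It assumes the standard textbook formulas, with n the total sample size, N_h the stratum size, S_h the square root of the stratum variance, and C_h the per-unit stratum cost: proportional allocation weights each stratum by N_h, Neyman allocation by N_h S_h, and optimal allocation by N_h S_h / sqrt(C_h), and each stratum receives n times its share of the total weight. The function names and example strata are hypothetical, and any rounding to whole-number sample sizes that the procedure performs is not reproduced.

```python
import math


def allocate(total_n, weights):
    """Give each stratum total_n times its share of the total weight."""
    if any(w <= 0 for w in weights):
        raise ValueError("stratum weights must be positive")
    total_weight = sum(weights)
    return [total_n * w / total_weight for w in weights]


def proportional(total_n, sizes):
    # Weight = N_h
    return allocate(total_n, [float(size) for size in sizes])


def neyman(total_n, sizes, variances):
    # Weight = N_h * S_h, with S_h = sqrt(stratum variance)
    return allocate(total_n, [size * math.sqrt(v) for size, v in zip(sizes, variances)])


def optimal(total_n, sizes, variances, costs):
    # Weight = N_h * S_h / sqrt(C_h), with C_h the per-unit stratum cost
    return allocate(total_n, [size * math.sqrt(v) / math.sqrt(c)
                              for size, v, c in zip(sizes, variances, costs)])


# Hypothetical strata: 400 units with variance 1, and 100 units with variance 16.
sizes, variances, costs = [400, 100], [1.0, 16.0], [1.0, 4.0]
print(proportional(120, sizes))               # [96.0, 24.0]
print(neyman(120, sizes, variances))          # [60.0, 60.0]
print(optimal(120, sizes, variances, costs))  # [80.0, 40.0]
```

Moving from proportional to Neyman allocation shifts sample toward the smaller but more variable stratum. Making that stratum four times as expensive per unit halves its weight (the square root of 4 is 2), so optimal allocation moves part of the sample back.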

ALLOC=(values)

lists stratum allocation proportions. You can separate values with blanks or commas.

Each allocation value specifies the proportion of the total sample size to allocate to the corresponding stratum. The number of ALLOC= values must equal the number of strata in the input data set, and the allocation proportions must sum to 1.

Each allocation proportion must be a positive number. You can specify each value as a number between 0 and 1, or in percentage form as a number between 1 and 100, which PROC SURVEYSELECT converts to a proportion. The procedure treats the value 1 as 100%, not as 1%.

List the allocation proportions in the order in which the strata appear in the input data set. If you use the ALLOC=(values) option, the input data set must be sorted by the STRATA variables in ascending order. You cannot use the DESCENDING or NOTSORTED option in the STRATA statement.
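The conversion and consistency rules for ALLOC=(values) can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure's code: each value is converted independently (values up to and including 1 are proportions, larger values up to 100 are percentages), the sum is checked against 1 with a small tolerance chosen for this sketch, and the hypothetical helper `alloc_from_values` pairs the values with strata listed in their ascending input order.

```python
def alloc_from_values(values, strata_in_order, tol=1e-9):
    """Map ALLOC=(values)-style numbers onto strata listed in ascending input order."""
    if len(values) != len(strata_in_order):
        raise ValueError("the number of values must equal the number of strata")
    proportions = []
    for v in values:
        if v <= 0 or v > 100:
            raise ValueError(f"allocation value {v} must be positive and at most 100")
        # The value 1 means 100%, not 1%; values above 1 are read as percentages.
        proportions.append(v if v <= 1 else v / 100)
    if abs(sum(proportions) - 1) > tol:  # tolerance is an assumption of this sketch
        raise ValueError("allocation proportions must sum to 1")
    return dict(zip(strata_in_order, proportions))


print(alloc_from_values([30, 50, 20], ["East", "North", "West"]))
# {'East': 0.3, 'North': 0.5, 'West': 0.2}
```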

ALLOC=SAS-data-set

names a SAS data set that contains stratum allocation proportions. You provide the stratum allocation proportions in the ALLOC= data set variable _ALLOC_.

Each allocation value specifies the proportion of the total sample size to allocate to the corresponding stratum. The allocation proportions must sum to 1.

Each allocation proportion must be a positive number. You can specify each value as a number between 0 and 1, or in percentage form as a number between 1 and 100, which PROC SURVEYSELECT converts to a proportion. The procedure treats the value 1 as 100%, not as 1%.

The ALLOC= data set should contain all the STRATA variables, with the same type and length as in the DATA= input data set. The STRATA groups should appear in the same order in the ALLOC= data set as in the DATA= data set. The ALLOC= data set is a secondary input data set. See the section Secondary Input Data Set for details. You can name only one secondary input data set in each invocation of the procedure.
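The ordering requirement for a secondary input data set can be checked as in the following Python sketch. The data sets are assumed to be lists of dictionaries, and the helper name `align_secondary` and the STRATA variable `Region` are hypothetical. The sketch requires an exact one-to-one match of STRATA values in the same order as the strata in the DATA= data set, which may be stricter than what the procedure itself checks. The same check would apply to a COST= or VAR= data set by passing `_COST_` or `_VAR_` as `value_var`.

```python
def align_secondary(data_strata, secondary_rows, strata_vars, value_var):
    """Return {stratum key: value} after checking that strata match the DATA= order."""
    keys = [tuple(row[var] for var in strata_vars) for row in secondary_rows]
    if keys != list(data_strata):
        raise ValueError("secondary data set strata must match the DATA= strata, in order")
    values = [row[value_var] for row in secondary_rows]
    if any(v <= 0 for v in values):
        raise ValueError(f"every {value_var} value must be positive")
    return dict(zip(keys, values))


# Strata in the order they appear in a hypothetical DATA= data set.
data_strata = [("East",), ("West",)]
alloc_rows = [{"Region": "East", "_ALLOC_": 0.4},
              {"Region": "West", "_ALLOC_": 0.6}]
print(align_secondary(data_strata, alloc_rows, ["Region"], "_ALLOC_"))
# {('East',): 0.4, ('West',): 0.6}
```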

COST

indicates that stratum costs are included in the secondary input data set. Use the COST option when you have already named the secondary input data set in another option, such as the VAR=SAS-data-set option. You provide the stratum costs in the secondary input data set variable _COST_.

A stratum cost represents the per-unit cost, or the survey cost of a single unit in the stratum. Each stratum cost must be a positive number. Cost values are required if you specify the ALLOC=OPTIMAL option.
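Because each stratum cost is a per-unit cost, the variable survey cost of an allocation is the sum over strata of the stratum cost times the stratum sample size, that is, the sum of C_h n_h (ignoring any fixed overhead). A small Python illustration with hypothetical sample sizes and costs; the function name is illustrative only:

```python
def total_variable_cost(sample_sizes, unit_costs):
    """Sum of per-unit stratum cost times stratum sample size."""
    if len(sample_sizes) != len(unit_costs):
        raise ValueError("need one cost per stratum")
    if any(c <= 0 for c in unit_costs):
        raise ValueError("stratum costs must be positive")
    return sum(n * c for n, c in zip(sample_sizes, unit_costs))


print(total_variable_cost([60, 30], [1.0, 4.0]))  # 180.0
```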

COST=(values)

specifies stratum costs, which are required if you specify the ALLOC=OPTIMAL option. You can separate values with blanks or commas.

A stratum cost represents the per-unit cost, or the survey cost of a single unit in the stratum. Each stratum cost must be a positive number.

The number of COST= values must equal the number of strata in the input data set. List the stratum costs in the order in which the strata appear in the input data set. If you use the COST=(values) option, the input data set must be sorted by the STRATA variables in ascending order. You cannot use the DESCENDING or NOTSORTED option in the STRATA statement.

COST=SAS-data-set

names a SAS data set that contains the stratum costs. You provide the stratum costs in the COST= data set variable _COST_.

A stratum cost represents the per-unit cost, or the survey cost of a single unit in the stratum. Each stratum cost must be a positive number. Stratum costs are required if you specify the ALLOC=OPTIMAL option.

The COST= data set should contain all the STRATA variables, with the same type and length as in the DATA= input data set. The STRATA groups should appear in the same order in the COST= data set as in the DATA= data set. The COST= data set is a secondary input data set. See the section Secondary Input Data Set for details. You can name only one secondary input data set in each invocation of the procedure.

NOSAMPLE

requests that PROC SURVEYSELECT allocate the total sample size among the strata but not select the sample. When you specify the NOSAMPLE option, the OUT= output data set contains the stratum sample sizes that PROC SURVEYSELECT computes. See the section Allocation Output Data Set for details.

VAR

indicates that stratum variances are included in the secondary input data set. Use the VAR option when you have already named the secondary input data set in another option, such as the COST=SAS-data-set option. You provide the stratum variances in the secondary input data set variable _VAR_.

Each stratum variance must be a positive number. Stratum variances are required if you specify ALLOC=OPTIMAL or ALLOC=NEYMAN.

VAR=(values)

lists stratum variances, which are required if you specify ALLOC=OPTIMAL or ALLOC=NEYMAN. You can separate values with blanks or commas.

Each stratum variance must be a positive number. The number of VAR= values must equal the number of strata in the input data set. List the stratum variances in the order in which the strata appear in the input data set. If you use the VAR=(values) option, the input data set must be sorted by the STRATA variables in ascending order. You cannot use the DESCENDING or NOTSORTED option in the STRATA statement.

VAR=SAS-data-set

names a SAS data set that contains the stratum variances. You provide the stratum variances in the VAR= data set variable _VAR_.

Each stratum variance must be a positive number. Stratum variances are required if you specify ALLOC=OPTIMAL or ALLOC=NEYMAN.

The VAR= data set should contain all the STRATA variables, with the same type and length as in the DATA= input data set. The STRATA groups should appear in the same order in the VAR= data set as in the DATA= data set. The VAR= data set is a secondary input data set. See the section Secondary Input Data Set for details. You can name only one secondary input data set in each invocation of the procedure.