The SURVEYSELECT Procedure

Secondary Input Data Set

The primary input data set for PROC SURVEYSELECT is the DATA= data set, which contains the list of units from which the sample is selected. You can use a secondary input data set to provide stratum-level design and selection information, such as sample sizes or rates, certainty size values, or stratum costs. This secondary input data set is sometimes called the SAMPSIZE= input data set. For example, you can provide stratum sample sizes in the _NSIZE_ (or SampleSize) variable in the SAMPSIZE= data set.

The secondary input data set must contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the secondary data set as in the DATA= data set. You can name only one secondary data set in each invocation of PROC SURVEYSELECT.
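As a minimal sketch of these requirements, the following statements supply stratum sample sizes through a secondary data set. The data set names Frame, SampleSizes, and Sample, the variables CustomerID and Region, and the seed value are hypothetical and chosen only for illustration. Region is read as a character variable of the same length in both data sets, and the strata appear in the same order in both.

```sas
/* Hypothetical sampling frame: one row per unit, stratified by Region */
data Frame;
   input CustomerID Region $;
   datalines;
1 East
2 East
3 East
4 East
5 West
6 West
7 West
8 West
9 West
;

/* Sort the frame so that the STRATA groups are in a known order */
proc sort data=Frame;
   by Region;
run;

/* Secondary data set: one row per stratum, in the same order as the frame */
data SampleSizes;
   input Region $ _NSIZE_;
   datalines;
East 2
West 3
;

/* SAMPSIZE= names the secondary data set; _NSIZE_ holds the stratum sizes */
proc surveyselect data=Frame method=srs sampsize=SampleSizes
                  seed=20240 out=Sample;
   strata Region;
run;
```

With these inputs, the request is for two units from the East stratum and three from the West stratum.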

You must name the secondary input data set in the appropriate PROC SURVEYSELECT or STRATA option, and use the designated variable name to provide the stratum-level values. For example, if you want to provide stratum-level costs for sample allocation, you name the secondary data set in the COST=SAS-data-set option in the STRATA statement. The data set must include the stratum costs in a variable named _COST_. You can use the secondary input data set for more than one option if it is appropriate for your design. For example, the secondary data set can include both stratum costs and stratum variances, which are required for optimal allocation (ALLOC=OPTIMAL).
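The following sketch shows one way a single secondary data set might serve two options at once, here the VAR= and COST= options for optimal allocation. The data set names Frame, StrataInfo, and Sample, the variables Region and Unit, the variance and cost values, and the total sample size of 10 are all assumptions made for illustration.

```sas
/* Hypothetical frame: 50 units in each of two strata, already in Region order */
data Frame;
   length Region $ 8;
   do Region = 'East', 'West';
      do Unit = 1 to 50;
         output;
      end;
   end;
run;

/* One secondary data set that carries both _VAR_ and _COST_ for each stratum */
data StrataInfo;
   length Region $ 8;
   input Region $ _VAR_ _COST_;
   datalines;
East 4.0 10
West 9.0 25
;

/* The same data set is named in both STRATA options; SAMPSIZE= gives the total */
proc surveyselect data=Frame method=srs sampsize=10
                  seed=20240 out=Sample;
   strata Region / alloc=optimal var=StrataInfo cost=StrataInfo;
run;
```

Naming the same data set in both options is consistent with the rule that only one secondary data set can be named in each invocation.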

Instead of using a separate secondary input data set, you can include secondary information in the DATA= data set along with the sampling frame. When you include secondary information in the DATA= data set, name the DATA= data set in the appropriate options, and include the required variables in the DATA= data set.
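As a rough illustration of this alternative, the sketch below stores the stratum sample sizes in the frame itself and names the DATA= data set in the SAMPSIZE= option. The data set names Frame and Sample, the variables Region and Unit, and the sample sizes are hypothetical. The sketch assumes that the _NSIZE_ value is constant within each stratum.

```sas
/* Hypothetical frame that also carries the stratum sample size in _NSIZE_ */
data Frame;
   length Region $ 8;
   do Region = 'East', 'West';
      /* Assumed stratum-level sample sizes, repeated on every row of the stratum */
      if Region = 'East' then _NSIZE_ = 2;
      else _NSIZE_ = 3;
      do Unit = 1 to 20;
         output;
      end;
   end;
run;

/* The DATA= data set is also named in SAMPSIZE=, so no separate data set is needed */
proc surveyselect data=Frame method=srs sampsize=Frame
                  seed=20240 out=Sample;
   strata Region;
run;
```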

Table 115.3 lists the available secondary data set variables, together with their descriptions and the corresponding options.

Table 115.3: PROC SURVEYSELECT Secondary Data Set Variables

Variable      Description            Statement   Option

_ALLOC_       Allocation proportion  STRATA      ALLOC=
_CERTP_       Certainty proportion   PROC        CERTSIZE=P=
_CERTSIZE_    Certainty size         PROC        CERTSIZE=
_COST_        Cost                   STRATA      COST=
_MAXSIZE_     Maximum size           PROC        MAXSIZE=
_MINSIZE_     Minimum size           PROC        MINSIZE=
_NSIZE_       Sample size            PROC        SAMPSIZE=
_RATE_        Sampling rate          PROC        SAMPRATE=
_SEED_        Random number seed     PROC        SEED=
_VAR_         Variance               STRATA      VAR=