
The SURVEYSELECT Procedure

Sample Output Data Set

Unless you specify the NOSAMPLE option in the STRATA statement, PROC SURVEYSELECT selects a sample and creates a SAS data set that contains the sample of selected units. If you specify the NOSAMPLE option, PROC SURVEYSELECT allocates the total sample size among the strata but does not select the sample. When you specify the NOSAMPLE option, the output data set contains the allocated sample sizes. See the section Allocation Output Data Set for details.

You can specify the name of the sample output data set in the OUT= option in the PROC SURVEYSELECT statement. If you omit the OUT= option, the data set is named DATAn, where n is the smallest integer that makes the name unique.
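The default naming rule can be sketched in a few lines of Python. This is an illustration, not SAS code. The function name default_output_name is hypothetical, and the sketch assumes that numbering starts at 1 and that names are compared without regard to case, as SAS data set names are.

```python
def default_output_name(existing_names):
    """Return DATAn, where n is the smallest positive integer not already in use.

    Assumptions (illustrative only): n starts at 1, and names are compared
    case-insensitively, as SAS data set names are.
    """
    taken = {name.upper() for name in existing_names}
    n = 1
    while f"DATA{n}" in taken:
        n += 1
    return f"DATA{n}"


print(default_output_name(["data1", "DATA2", "Sample"]))  # DATA3
```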

By default, the output data set contains one observation for each unit selected for the sample. If you specify the OUTALL option, however, the output data set includes all observations from the input data set, along with the variable Selected, which indicates each observation’s selection status: Selected equals 1 for an observation selected for the sample and 0 for an observation not selected. The OUTALL option is available only for equal probability selection methods.
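The following Python sketch mimics the OUTALL layout for an equal probability method: every input observation is kept, and Selected marks whether it was drawn. The helper outall, the dictionary-per-observation representation, and the use of random.Random.sample for simple random sampling are illustrative assumptions, not PROC SURVEYSELECT internals.

```python
import random


def outall(population, selected_indices):
    """Return every input observation with a Selected flag (1 = sampled, 0 = not)."""
    chosen = set(selected_indices)
    return [dict(obs, Selected=1 if i in chosen else 0)
            for i, obs in enumerate(population)]


population = [{"id": k} for k in range(1, 6)]      # five hypothetical units
rng = random.Random(12345)                          # fixed seed for repeatability
sample_idx = rng.sample(range(len(population)), 2)  # equal probability, without replacement
rows = outall(population, sample_idx)
print(sum(r["Selected"] for r in rows), len(rows))  # 2 5
```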

By default, the output data set contains one observation for each selected unit, even if the unit is selected more than once, and the variable NumberHits contains the number of hits or selections for that unit. A unit might be selected more than once if you use a with-replacement or with-minimum-replacement selection method (METHOD=URS, METHOD=PPS_WR, METHOD=PPS_SYS, or METHOD=PPS_SEQ). If you specify the OUTHITS option, the output data set contains a separate observation for each hit or selection.
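To contrast the default layout with the OUTHITS layout, the following Python sketch takes a list of with-replacement draws (the unit labels are hypothetical) and builds both forms: one row per distinct unit carrying NumberHits, and one row per hit. The function names are illustrative only.

```python
from collections import Counter


def collapse_hits(draws):
    """Default layout: one row per distinct selected unit, with its NumberHits."""
    counts = Counter(draws)
    return [{"unit": u, "NumberHits": counts[u]} for u in sorted(counts)]


def expand_hits(draws):
    """OUTHITS-style layout: one row per hit, so a unit hit twice appears twice."""
    return [{"unit": u} for u in sorted(draws)]


draws = ["B", "A", "B", "D"]  # unit B was selected twice
print(collapse_hits(draws))
# [{'unit': 'A', 'NumberHits': 1}, {'unit': 'B', 'NumberHits': 2}, {'unit': 'D', 'NumberHits': 1}]
print(len(expand_hits(draws)))  # 4
```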

The output data set contains design information and selection statistics, depending on the selection method and output options you specify. The output data set can include the following variables:

  • Selected, which indicates whether or not the observation is selected for the sample. This variable is included if you specify the OUTALL option. Selected equals 1 for an observation selected for the sample or 0 for an observation not selected.

  • STRATA variables, which you specify in the STRATA statement

  • Replicate, which is the sample replicate number. This variable is included when you request replicated sampling with the REPS= option.

  • ID variables, which you name in the ID statement

  • CONTROL variables, which you specify in the CONTROL statement

  • Zone, which is the selection zone. This variable is included for METHOD=PPS_SEQ.

  • SIZE variable, which you specify in the SIZE statement

  • AdjustedSize, which is the adjusted size measure. This variable is included if you request adjusted sizes with the MINSIZE= or MAXSIZE= option.

  • Certain, which indicates certainty selection. This variable is included if you specify the CERTSIZE= or CERTSIZE=P= option. Certain equals 1 for units included with certainty because their size measures exceed the certainty size value or the certainty proportion; otherwise, Certain equals 0.

  • NumberHits, which is the number of hits or selections. This variable is included for selection methods that are with replacement or with minimum replacement (METHOD=URS, METHOD=PPS_WR, METHOD=PPS_SYS, and METHOD=PPS_SEQ).
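The following Python sketch shows how AdjustedSize and Certain relate to the raw size measure. It assumes that MINSIZE= raises smaller sizes to the minimum, that MAXSIZE= lowers larger sizes to the maximum, and that "exceed" in the certainty rule means strictly greater than a fixed CERTSIZE= value. The function names and the sample sizes are hypothetical.

```python
def adjusted_size(size, minsize=None, maxsize=None):
    """Clamp a size measure into [minsize, maxsize]; either bound may be omitted."""
    if minsize is not None and size < minsize:
        return minsize
    if maxsize is not None and size > maxsize:
        return maxsize
    return size


def certainty_flag(size, certsize):
    """Certain = 1 when the size measure exceeds the certainty size, else 0."""
    return 1 if size > certsize else 0


sizes = [2, 15, 40, 120]
print([adjusted_size(s, minsize=5, maxsize=100) for s in sizes])  # [5, 15, 40, 100]
print([certainty_flag(s, certsize=100) for s in sizes])           # [0, 0, 0, 1]
```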

The output data set includes the following variables if you request a PPS selection method or if you specify the STATS option for other methods:

  • ExpectedHits, which is the expected number of hits or selections. This variable is included for selection methods that are with replacement or with minimum replacement, and so might select the same unit more than once (METHOD=URS, METHOD=PPS_WR, METHOD=PPS_SYS, and METHOD=PPS_SEQ).

  • SelectionProb, which is the probability of selection. This variable is included for selection methods that are without replacement.

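  • SamplingWeight, which is the sampling weight. This variable equals the inverse of SelectionProb for methods without replacement, or the inverse of ExpectedHits for methods with replacement or with minimum replacement.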
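As a worked illustration of the SamplingWeight relationship, the following Python sketch computes SelectionProb for simple random sampling (n/N) and ExpectedHits for PPS sampling with replacement (n times the unit's share of the total size), then inverts each. These are the standard formulas for those two designs; the function names and input values are hypothetical.

```python
def srs_selection_prob(n, N):
    """Equal probability selection without replacement: each unit has probability n/N."""
    return n / N


def pps_wr_expected_hits(n, size, total_size):
    """PPS with replacement: n draws, each picking the unit with probability size/total_size."""
    return n * size / total_size


def sampling_weight(prob_or_hits):
    """SamplingWeight is the inverse of SelectionProb or ExpectedHits."""
    return 1.0 / prob_or_hits


w_srs = sampling_weight(srs_selection_prob(n=10, N=200))
w_pps = sampling_weight(pps_wr_expected_hits(n=4, size=25, total_size=1000))
print(w_srs, w_pps)  # 20.0 10.0
```

For simple random sampling the weight reduces to N/n, so each selected unit represents 20 population units in this example.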

For METHOD=PPS_BREWER and METHOD=PPS_MURTHY, which select two units from each stratum with probability proportional to size, the output data set contains the following variable:

  • JtSelectionProb, which is the joint probability of selection for the two units selected from the stratum
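For two-unit PPS designs, a joint selection probability can be written in closed form. The following Python sketch uses the textbook formula for Brewer's (1963) two-unit method, where p_k is each unit's share of the total size and every p_k must be below 1/2; it is offered as an illustration of what JtSelectionProb measures, not as the computation PROC SURVEYSELECT performs internally. The function name and sizes are hypothetical. The check at the end uses the fact that, for a fixed sample size of 2, summing a unit's joint probabilities over all partners gives its inclusion probability 2p_i.

```python
def brewer_joint_prob(sizes, i, j):
    """Joint inclusion probability of units i and j under Brewer's two-unit method.

    p_k = size_k / total is each unit's relative size; requires every p_k < 1/2.
    """
    total = sum(sizes)
    p = [s / total for s in sizes]
    if any(pk >= 0.5 for pk in p):
        raise ValueError("Brewer's method needs every relative size below 1/2")
    d = 1 + sum(pk / (1 - 2 * pk) for pk in p)
    return 2 * p[i] * p[j] / d * (1 / (1 - 2 * p[i]) + 1 / (1 - 2 * p[j]))


sizes = [10, 20, 30, 40]
for i in range(len(sizes)):
    partner_sum = sum(brewer_joint_prob(sizes, i, j)
                      for j in range(len(sizes)) if j != i)
    # partner_sum should equal the inclusion probability 2 * p_i
    print(i, round(partner_sum, 6), round(2 * sizes[i] / sum(sizes), 6))
```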

If you specify the JTPROBS option to compute joint probabilities of selection for METHOD=PPS or METHOD=PPS_SAMPFORD, then the output data set contains the following variables:

  • Unit, which is an identification variable that numbers the selected units sequentially within each stratum

  • JtProb_1, JtProb_2, JtProb_3, ..., where the variable JtProb_1 contains the joint probability of selection for the current unit and unit 1. Similarly, JtProb_2 contains the joint probability of selection for the current unit and unit 2, and so on.

If you specify the JTPROBS option for METHOD=PPS_WR, then the output data set contains the following variables:

  • Unit, which is an identification variable that numbers the selected units sequentially within each stratum

  • JtHits_1, JtHits_2, JtHits_3, ..., where the variable JtHits_1 contains the joint expected number of hits for the current unit and unit 1. Similarly, JtHits_2 contains the joint expected number of hits for the current unit and unit 2, and so on.
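The wide layout of the Unit and JtHits_ variables can be sketched in Python. The sketch assumes that the joint expected number of hits for two distinct units is the multinomial moment n(n-1)p_u p_v, where p_u is the unit's per-draw selection probability and n is the number of draws; it leaves the diagonal entries unset rather than guess their definition. The function name and probabilities are hypothetical.

```python
def jt_hits_rows(p, n):
    """Build Unit / JtHits_1..JtHits_k rows for k selected units.

    p[u] is unit u's per-draw selection probability (size / total size), and n is
    the number of with-replacement draws. Assumes E[hits_u * hits_v] =
    n * (n - 1) * p_u * p_v for u != v; diagonal entries are left as None.
    """
    rows = []
    for u in range(len(p)):
        row = {"Unit": u + 1}  # units numbered sequentially, starting at 1
        for v in range(len(p)):
            row[f"JtHits_{v + 1}"] = None if u == v else n * (n - 1) * p[u] * p[v]
        rows.append(row)
    return rows


p = [0.10, 0.25, 0.05]  # hypothetical per-draw probabilities of three selected units
rows = jt_hits_rows(p, n=3)
for row in rows:
    print(row)
```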

If you specify the OUTSIZE option, the output data set contains the following variables. If you specify a STRATA statement, the output data set includes stratum-level values of these variables. Otherwise, the output data set contains population-level values of these variables.

  • MinimumSize, which is the minimum size measure specified with the MINSIZE= option. This variable is included if you specify the MINSIZE= option.

  • MaximumSize, which is the maximum size measure specified with the MAXSIZE= option. This variable is included if you specify the MAXSIZE= option.

  • CertaintySize, which is the certainty size measure specified with the CERTSIZE= option. This variable is included if you specify the CERTSIZE= option.

  • CertaintyProp, which is the certainty proportion specified with the CERTSIZE=P= option. This variable is included if you specify the CERTSIZE=P= option.

  • Total, which is the total number of sampling units in the stratum. This variable is included if there is no SIZE statement.

  • TotalSize, which is the total of size measures in the stratum. This variable is included if there is a SIZE statement.

  • TotalAdjSize, which is the total of adjusted size measures in the stratum. This variable is included if you specify a SIZE statement and if you request adjusted sizes with the MAXSIZE= or MINSIZE= option.

  • SamplingRate, which is the sampling rate. This variable is included if you specify the SAMPRATE= option.

  • SampleSize, which is the sample size. This variable is included if you specify the SAMPSIZE= option, or if you specify METHOD=PPS_BREWER or METHOD=PPS_MURTHY, which select two units from each stratum.
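The stratum-level totals in this list can be computed with a single pass over the input. The following Python sketch accumulates Total, TotalSize, and TotalAdjSize per stratum, assuming that the adjusted size clamps each size measure into the MINSIZE= and MAXSIZE= bounds. It computes all three totals for convenience, whereas the procedure reports only those that apply to the options you specify. The function name and data are hypothetical.

```python
from collections import defaultdict


def stratum_totals(rows, minsize=None, maxsize=None):
    """Per-stratum Total, TotalSize, and TotalAdjSize from (stratum, size) pairs."""
    out = defaultdict(lambda: {"Total": 0, "TotalSize": 0, "TotalAdjSize": 0})
    for stratum, size in rows:
        adj = size
        if minsize is not None:
            adj = max(adj, minsize)  # raise small sizes to the minimum
        if maxsize is not None:
            adj = min(adj, maxsize)  # lower large sizes to the maximum
        rec = out[stratum]
        rec["Total"] += 1
        rec["TotalSize"] += size
        rec["TotalAdjSize"] += adj
    return dict(out)


data = [("East", 12), ("East", 3), ("West", 50)]
print(stratum_totals(data, minsize=5, maxsize=40))
# {'East': {'Total': 2, 'TotalSize': 15, 'TotalAdjSize': 17}, 'West': {'Total': 1, 'TotalSize': 50, 'TotalAdjSize': 40}}
```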


If you specify the OUTSEED option, the output data set contains the following variable:

  • InitialSeed, which is the initial seed for the stratum.

If you specify the ALLOC= option in the STRATA statement, the output data set contains the following variables:

  • Total, which is the total number of sampling units in the stratum

  • Variance, which is the stratum variance. This variable is included if you specify the VAR, VAR=(values), or VAR=SAS-data-set option for ALLOC=OPTIMAL or ALLOC=NEYMAN.

  • Cost, which is the stratum cost. This variable is included if you specify the COST, COST=(values), or COST=SAS-data-set option for ALLOC=OPTIMAL.

  • AllocProportion, which is the target allocation proportion, or the proportion of the total sample size to allocate to the stratum. PROC SURVEYSELECT computes this proportion by using the specified allocation method.

  • SampleSize, which is the sample size allocated to the stratum

  • ActualProportion, which is the actual proportion allocated to the stratum. The value of ActualProportion equals the allocated stratum sample size divided by the total sample size. This value can differ from the target AllocProportion due to rounding and other restrictions. See the section Sample Size Allocation for details.
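The difference between AllocProportion and ActualProportion can be illustrated with a short Python sketch. It uses the standard target proportions (proportional to N_h, Neyman proportional to N_h S_h, optimal proportional to N_h S_h divided by the square root of the stratum cost) and then rounds with a largest-remainder rule so that the stratum sizes sum to the total. The rounding rule is an assumption chosen for illustration; it ignores the additional restrictions that PROC SURVEYSELECT applies, and the function names and stratum counts are hypothetical.

```python
import math


def alloc_proportions(N, S=None, C=None):
    """Target AllocProportion per stratum.

    Proportional: N_h; Neyman: N_h * S_h; optimal: N_h * S_h / sqrt(C_h); normalized to 1.
    """
    weights = []
    for h in range(len(N)):
        w = N[h]
        if S is not None:
            w *= S[h]
        if C is not None:
            w /= math.sqrt(C[h])
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


def round_allocation(props, n):
    """Integer stratum sample sizes summing to n (largest-remainder rounding)."""
    raw = [p * n for p in props]
    sizes = [math.floor(r) for r in raw]
    short = n - sum(sizes)
    order = sorted(range(len(raw)), key=lambda h: raw[h] - sizes[h], reverse=True)
    for h in order[:short]:
        sizes[h] += 1
    return sizes


N = [500, 300, 200]                   # stratum population counts
props = alloc_proportions(N)          # proportional allocation: 0.5, 0.3, 0.2
sizes = round_allocation(props, n=7)  # [4, 2, 1]
actual = [s / 7 for s in sizes]       # ActualProportion differs from props after rounding
print(props, sizes, actual)
```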
