The SURVEYSELECT Procedure

Overview: SURVEYSELECT Procedure

The SURVEYSELECT procedure provides a variety of methods for selecting probability-based random samples. The procedure can select a simple random sample or can sample according to a complex multistage design that includes stratification, clustering, and unequal probabilities of selection. When you use probability sampling, each unit in the survey population has a known, positive probability of selection. This property of probability sampling avoids selection bias and enables you to use statistical theory to make valid inferences from the sample to the survey population.

To select a sample by using PROC SURVEYSELECT, you provide a SAS data set that contains the sampling frame (the list of units from which the sample is to be selected). The sampling units can be individual observations or groups of observations (clusters). You can also specify the selection method, the sample size or sampling rate, and other selection parameters. PROC SURVEYSELECT selects the sample and produces an output data set that contains the selected units, their selection probabilities, and their sampling weights. To select a sample in multiple stages, you can invoke the procedure separately for each stage of selection by providing the sampling frame and selection parameters for each stage.
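The relationship between a sampling frame, the selected units, and their selection probabilities and sampling weights can be illustrated outside SAS. The following Python sketch is a conceptual illustration only, not the procedure's implementation. The function name `simple_random_sample` and the output field names are hypothetical. It assumes simple random sampling without replacement, where every unit has selection probability n/N and sampling weight N/n.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Select n units without replacement (hypothetical helper).

    Each selected unit is returned with its selection probability (n/N)
    and its sampling weight (the inverse of that probability).
    """
    rng = random.Random(seed)
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the frame size")
    prob = n / N
    return [
        {"unit": unit, "selection_prob": prob, "sampling_weight": 1 / prob}
        for unit in rng.sample(frame, n)
    ]

# Illustrative frame of 200 units; the labels are made up for this sketch.
frame = [f"household_{i}" for i in range(1, 201)]
sample = simple_random_sample(frame, n=20, seed=1)
```

In this sketch the sampling weights of the selected units sum to the frame size, which is why weighted totals from the sample estimate population totals.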

PROC SURVEYSELECT provides methods for both equal probability sampling and probability proportional to size (PPS) sampling. In equal probability sampling, each unit in the sampling frame (or stratum) has the same probability of selection. In PPS sampling, each unit’s selection probability is proportional to its size measure. For information about probability sampling methods, see Lohr (2010), Kish (1965), Kish (1987), Kalton (1983), and Cochran (1977).
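The difference between the two families can be made concrete with a small calculation. The Python sketch below is illustrative only, and the function names are hypothetical. For a sample of n units it computes the per-unit selection probability under equal probability sampling (n/N) and the expected selection probability under PPS sampling (n times the unit's size measure divided by the total size), assuming no unit is large enough to push its probability above 1.

```python
def equal_probabilities(num_units, n):
    """Every unit gets the same selection probability, n / N."""
    return [n / num_units] * num_units

def pps_probabilities(sizes, n):
    """Selection probabilities proportional to size: n * size / total size."""
    total = sum(sizes)
    probs = [n * size / total for size in sizes]
    if max(probs) > 1:
        # A unit this large cannot be handled by the plain formula.
        raise ValueError("a size measure is too large for this sample size")
    return probs

sizes = [10, 20, 30, 40]            # made-up size measures for four units
equal = equal_probabilities(len(sizes), n=2)
pps = pps_probabilities(sizes, n=2)
```

With these made-up sizes, the largest unit is four times as likely to be selected as the smallest under PPS, whereas all four units are equally likely under equal probability sampling.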

PROC SURVEYSELECT provides the following equal probability sampling methods:

  • simple random sampling (without replacement)

  • unrestricted random sampling (with replacement)

  • systematic random sampling

  • sequential random sampling

  • Bernoulli sampling
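Two of the methods in this list differ in whether the sample size is fixed. The Python sketch below is a textbook-style illustration, not the procedure's algorithm, and the function names are hypothetical. Systematic sampling takes every k-th unit after a random start, where k = N/n, so the sample size is fixed. Bernoulli sampling includes each unit independently with the same probability, so the sample size is random.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select n units at a fixed interval N/n after a random start."""
    rng = random.Random(seed)
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the frame size")
    interval = N / n                      # may be fractional
    start = rng.random() * interval       # random start in [0, interval)
    # min() guards against floating-point rounding at the upper end.
    return [frame[min(int(start + i * interval), N - 1)] for i in range(n)]

def bernoulli_sample(frame, rate, seed=None):
    """Include each unit independently with probability `rate`."""
    rng = random.Random(seed)
    return [unit for unit in frame if rng.random() < rate]

frame = list(range(1, 101))               # made-up frame of 100 unit IDs
sys_sample = systematic_sample(frame, n=10, seed=5)
bern_sample = bernoulli_sample(frame, rate=0.1, seed=5)
```

Here `sys_sample` always contains exactly 10 units, while the length of `bern_sample` varies from run to run around an expected value of 10.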

The procedure also provides Poisson sampling and the following probability proportional to size (PPS) sampling methods:

  • PPS sampling without replacement

  • PPS sampling with replacement

  • PPS systematic sampling

  • PPS algorithms for selecting two units per stratum

  • sequential PPS sampling with minimum replacement
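As an illustration of the simplest of these methods, PPS sampling with replacement can be sketched as n independent draws in which each unit's chance of being drawn is proportional to its size measure. The Python sketch below is conceptual only and does not reproduce the procedure's algorithms. The function name `pps_with_replacement` is hypothetical. Because sampling is with replacement, a unit can be selected more than once, so the result records the number of hits per unit.

```python
import random
from collections import Counter

def pps_with_replacement(units, sizes, n, seed=None):
    """Draw n units with replacement, with probability proportional to size.

    Returns a Counter that maps each selected unit to its number of hits.
    """
    rng = random.Random(seed)
    draws = rng.choices(units, weights=sizes, k=n)
    return Counter(draws)

# Made-up units and size measures: unit "C" holds 80% of the total size.
hits = pps_with_replacement(["A", "B", "C"], [1, 1, 8], n=10, seed=3)
```

The expected number of hits for a unit is n times its share of the total size, so in this made-up example unit "C" is expected to be hit 8 times out of 10 draws.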

PROC SURVEYSELECT uses fast, efficient algorithms for sample selection, so it performs well even when the input data set (sampling frame) is large.

PROC SURVEYSELECT can perform stratified sampling by selecting samples independently within strata, which are nonoverlapping subgroups of the survey population. Stratification controls how the sample is distributed among the strata, and it is widely used in practice to meet a variety of survey objectives. For example, you can use stratification to ensure adequate sample sizes for subgroups of interest (including small subgroups) or to improve the precision of overall estimates. When you use a systematic or sequential selection method, PROC SURVEYSELECT can also sort by control variables within strata, which provides the additional control of implicit stratification.
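Independent selection within strata can be sketched as follows. This Python example is a conceptual illustration, not the procedure's implementation. The function name, the field names, and the frame are hypothetical. It assumes simple random sampling without replacement inside each stratum, so a unit in a stratum of N_h units from which n_h are selected has sampling weight N_h/n_h.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, sizes, seed=None):
    """Select sizes[h] units independently within each stratum h."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name, units in strata.items():
        n_h = sizes[name]
        for unit in rng.sample(units, n_h):   # separate draw per stratum
            sample.append({
                "unit": unit,
                "stratum": name,
                "sampling_weight": len(units) / n_h,
            })
    return sample

# Made-up frame: 90 units in "East" and 10 in the small "West" stratum.
frame = [(i, "East" if i < 90 else "West") for i in range(100)]
sample = stratified_sample(frame, lambda unit: unit[1],
                           {"East": 9, "West": 5}, seed=2)
```

In this made-up example the small "West" stratum is deliberately oversampled (5 of 10 units) to guarantee an adequate subgroup sample, and its smaller weights compensate for that when estimates are computed.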

For stratified sampling, PROC SURVEYSELECT provides survey design methods to allocate the total sample size among the strata. Available allocation methods include proportional, Neyman, and optimal allocation. Optimal allocation maximizes the estimation precision within the available resources by taking into account stratum sizes, costs, and variances.
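The three allocation methods can be compared with a short calculation. The Python sketch below uses the standard textbook formulas (see, for example, Cochran 1977) and is not a reproduction of the procedure's computations. In particular, it returns unrounded allocations and ignores any rounding or minimum-size rules. The function name `allocate` and all input values are hypothetical. Each stratum's share is proportional to N_h for proportional allocation, to N_h S_h for Neyman allocation, and to N_h S_h divided by the square root of the per-unit cost for optimal allocation.

```python
from math import sqrt

def allocate(total_n, sizes, std_devs=None, costs=None):
    """Split total_n among strata (unrounded).

    sizes only              -> proportional allocation
    sizes + std_devs        -> Neyman allocation
    sizes + std_devs + costs -> optimal allocation
    """
    shares = {}
    for stratum, N_h in sizes.items():
        s_h = std_devs[stratum] if std_devs else 1.0
        c_h = costs[stratum] if costs else 1.0
        shares[stratum] = N_h * s_h / sqrt(c_h)
    total = sum(shares.values())
    return {h: total_n * share / total for h, share in shares.items()}

# Made-up stratum sizes, standard deviations, and per-unit costs.
sizes = {"A": 600, "B": 400}
std_devs = {"A": 10.0, "B": 30.0}
costs = {"A": 1.0, "B": 4.0}

proportional = allocate(100, sizes)
neyman = allocate(100, sizes, std_devs)
optimal = allocate(100, sizes, std_devs, costs)
```

With these made-up inputs, proportional allocation assigns 60 and 40 units, Neyman allocation shifts the sample toward the more variable stratum B (about 33 and 67), and optimal allocation pulls it back to 50 and 50 because units in stratum B are more expensive to survey.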

PROC SURVEYSELECT provides replicated sampling, where the total sample is composed of a set of replicates, and each replicate is selected in the same way. You can use replicated sampling to study variable nonsampling errors, such as variability in the results obtained by different interviewers. You can also use replicated sampling to estimate standard errors for combined sample estimates and to perform a variety of other resampling and simulation tasks.
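The use of replicates to estimate a standard error can be sketched as follows. This Python example is a conceptual illustration under simplifying assumptions, not the procedure's implementation. It assumes that each replicate is an independent simple random sample without replacement from the same frame, and the function name and data are hypothetical. The combined estimate is the mean of the replicate estimates, and its standard error is estimated from the spread of the replicate estimates.

```python
import random
from statistics import mean, stdev

def replicated_samples(frame, n_per_replicate, replicates, seed=None):
    """Select several replicates, each drawn in the same way."""
    rng = random.Random(seed)
    return [rng.sample(frame, n_per_replicate) for _ in range(replicates)]

# Made-up frame in which each unit is represented by its survey value.
values = list(range(1, 1001))
reps = replicated_samples(values, n_per_replicate=25, replicates=10, seed=7)

replicate_means = [mean(rep) for rep in reps]
combined_estimate = mean(replicate_means)
# Standard error of the combined estimate from between-replicate variation.
std_error = stdev(replicate_means) / len(reps) ** 0.5
```

Because every replicate is selected in the same way, the variation among `replicate_means` reflects sampling variability directly, without requiring a design-specific variance formula.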