The SURVEYSELECT Procedure

Systematic Random Sampling

Systematic random sampling (METHOD=SYS) selects units at a fixed interval throughout the sampling frame (or stratum) after a random start. If you request stratified sampling by specifying a STRATA statement, PROC SURVEYSELECT independently selects systematic samples from the strata. PROC SURVEYSELECT applies systematic selection to sampling units in the order of their appearance in the input data set, or in their sorted order if you specify a CONTROL statement.

This section describes equal-probability systematic sampling, where each sampling unit in the sampling frame (or stratum) has the same probability of selection. For information about PPS systematic sampling, see the section PPS Systematic Sampling.

When you specify the sample size in the SAMPSIZE= option, PROC SURVEYSELECT computes the systematic selection interval as the ratio of the total number of sampling units to the sample size ($N/n$, or $N_h/n_h$ for stratified sampling). The procedure uses a fractional systematic interval to provide the specified sample size exactly. The selection probability for each unit is computed as $n/N$ (or $n_h/N_h$ for stratified sampling).
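The following Python sketch illustrates one common way to use a fractional interval so that exactly n units are selected. It is an illustrative assumption, not the procedure's internal algorithm: the function name systematic_sample_fixed_n and the rule "select unit ceil(r + i*k) for a random start r in (0, k]" are chosen here for illustration only.

```python
import math
import random
from fractions import Fraction


def systematic_sample_fixed_n(N, n, rng=random):
    """Return exactly n unit positions (1-based) from a frame of N units.

    Hypothetical fractional-interval scheme: interval k = N/n, random start
    r uniform on (0, k], and unit ceil(r + i*k) selected for i = 0, ..., n-1.
    Fraction arithmetic keeps the interval exact (no floating-point drift).
    """
    if not 0 < n <= N:
        raise ValueError("sample size must satisfy 0 < n <= N")
    k = Fraction(N, n)                    # fractional selection interval N/n
    r = k * Fraction(1 - rng.random())    # random start, uniform on (0, k]
    return [math.ceil(r + i * k) for i in range(n)]


rng = random.Random(12345)
print(systematic_sample_fixed_n(10, 3, rng))  # three distinct positions in 1..10
print("selection probability:", Fraction(3, 10))
```

Because consecutive selection points are k = N/n ≥ 1 apart and the last point never exceeds N, the n selected positions are always distinct and fall within the frame; averaged over random starts, each unit is selected with probability n/N.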

When you specify the sampling rate in the SAMPRATE= option, PROC SURVEYSELECT computes the systematic selection interval as the inverse of the sampling rate. The selection probability for each unit is the sampling rate.

Instead of specifying the sample size or sampling rate, you can directly specify the systematic interval in the INTERVAL= option. When you specify the interval, PROC SURVEYSELECT computes the selection probability as the inverse of the interval value.
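As a rough illustration (again a hypothetical Python sketch, not SAS code), a sampling rate simply becomes an interval of 1/rate, and selection proceeds from a random start until the frame is exhausted. The function names interval_from_rate and systematic_sample_interval are invented for this example. In this sketch, unlike the fixed-size case, the realized sample size can vary from one random start to another when N is not a multiple of the interval, while each unit's selection probability stays 1/interval.

```python
import math
import random
from fractions import Fraction


def interval_from_rate(rate):
    """A sampling rate f corresponds to a systematic interval of 1/f.

    Pass the rate as a string or Fraction (for example "0.25") to keep it exact.
    """
    return 1 / Fraction(rate)


def systematic_sample_interval(N, interval, rng=random):
    """Select units 1..N at a fixed (possibly fractional) interval.

    Each unit has selection probability 1/interval. When N is not a multiple
    of the interval, the number of selected units depends on the random start.
    """
    k = Fraction(interval)
    if k < 1:
        raise ValueError("interval must be at least 1")
    x = k * Fraction(1 - rng.random())    # random start, uniform on (0, k]
    selected = []
    while math.ceil(x) <= N:              # step through the frame one interval at a time
        selected.append(math.ceil(x))
        x += k
    return selected


k = interval_from_rate("0.25")            # interval 4, selection probability 1/4
rng = random.Random(1)
sizes = {len(systematic_sample_interval(10, k, rng)) for _ in range(100)}
print(k, sorted(sizes))                   # with N=10 and k=4 the size is 2 or 3
```

With N = 10 and an interval of 4, starts in (0, 2] select three units (for example 1, 5, 9) and starts in (2, 4] select two (for example 3, 7), so the expected sample size is N/k = 2.5.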

By default, PROC SURVEYSELECT randomly determines a starting value in the selection interval. Optionally, you can specify the starting value in the START= option. The random selection of a starting value within the systematic interval is the only random component of systematic sampling. Therefore, if you use the START= option to provide a purposely chosen (nonrandom) starting value, the resulting systematic selection does not provide a random, probability-based sample.
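To see why a chosen start removes the randomness, consider the simplest case of an integer interval k in a hypothetical Python sketch (the function name possible_systematic_samples and the integer start s are illustrative assumptions): there are only k possible starts, and each start determines the whole sample.

```python
from collections import Counter


def possible_systematic_samples(N, k):
    """Map each integer start s = 1..k to the sample it produces from units 1..N."""
    return {s: list(range(s, N + 1, k)) for s in range(1, k + 1)}


samples = possible_systematic_samples(N=10, k=4)
for s, units in samples.items():
    print(f"start={s}: {units}")
# start=1: [1, 5, 9]
# start=2: [2, 6, 10]
# start=3: [3, 7]
# start=4: [4, 8]

# Each unit belongs to exactly one possible sample, so a uniformly random
# start gives every unit inclusion probability 1/k. A fixed start always
# returns the same single sample, so nothing about the selection is random.
counts = Counter(u for units in samples.values() for u in units)
assert all(counts[u] == 1 for u in range(1, 11))
```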

Systematic sampling controls the distribution of the sample by spreading the selections throughout the sampling frame (or stratum) at equal intervals and thus provides implicit stratification. You can specify a CONTROL statement to order the input data set by CONTROL variables before sample selection. If you also specify a STRATA statement, PROC SURVEYSELECT sorts by the CONTROL variables within strata. If you do not specify a CONTROL statement, PROC SURVEYSELECT applies systematic selection to the observations in the order in which they appear in the input data set.
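The sketch below (hypothetical Python, with made-up records and the invented function name stratified_systematic) combines the pieces: it groups records by a stratum variable, sorts each stratum by a single control variable, and draws an independent fixed-size systematic sample within each stratum. It uses a plain ascending sort; the procedure's own ordering of CONTROL variables may differ, for example when several CONTROL variables are sorted in serpentine order.

```python
import math
import random
from collections import defaultdict
from fractions import Fraction


def stratified_systematic(records, strata_key, control_key, n_h, rng=random):
    """Independent fixed-size systematic samples within strata.

    Within each stratum the units are first sorted by the control variable
    (a plain ascending sort; Python's sort is stable, so ties keep input
    order), then selected with interval N_h/n_h and a random start in
    (0, N_h/n_h].
    """
    strata = defaultdict(list)
    for rec in records:                      # group records by stratum
        strata[rec[strata_key]].append(rec)
    sample = {}
    for h, units in strata.items():
        units.sort(key=lambda rec: rec[control_key])
        k = Fraction(len(units), n_h[h])     # stratum interval N_h / n_h
        r = k * Fraction(1 - rng.random())   # independent random start per stratum
        sample[h] = [units[math.ceil(r + i * k) - 1] for i in range(n_h[h])]
    return sample


frame = ([{"region": "A", "size": s} for s in (40, 10, 60, 30, 20, 50)]
         + [{"region": "B", "size": s} for s in (7, 3, 9, 1)])
picked = stratified_systematic(frame, "region", "size", {"A": 2, "B": 2},
                               random.Random(99))
for h, units in picked.items():
    print(h, [u["size"] for u in units])
```

With six units in stratum A and a sample size of 2, the interval is 3, so one unit always comes from the three smallest sizes and the other from the three largest. This spreading of the sample across the control variable is the implicit stratification that the CONTROL ordering provides.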