PPS Systematic Sampling

If you specify the METHOD=PPS_SYS option, PROC SURVEYSELECT selects the sample by using systematic random sampling with probability proportional to size. Systematic sampling selects units at a fixed interval throughout the sampling frame (or stratum) after a random start. If you request stratified sampling by specifying a STRATA statement, PROC SURVEYSELECT independently selects systematic samples from the strata. PROC SURVEYSELECT applies systematic selection to sampling units in the order of their appearance in the input data set, or in their sorted order if you specify a CONTROL statement.

When you specify the sample size in the SAMPSIZE= option, PROC SURVEYSELECT computes the systematic selection interval as the ratio of the total size to the sample size ($M/n$, or $M_{h\cdot}/n_h$ for stratified sampling). The procedure uses a fractional systematic interval to provide the specified sample size exactly. Depending on the sample size and the values of the size measures, a sampling unit can be selected more than once. The expected number of hits (selections) for unit $i$ in stratum $h$ is computed as $n_h M_{hi}/M_{h\cdot} = n_h Z_{hi}$, where $Z_{hi} = M_{hi}/M_{h\cdot}$ is the relative size of the unit. For more information, see Cochran (1977, pp. 265–266) and Madow (1949).
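The selection described above can be sketched in Python. This is a minimal illustration of fractional-interval PPS systematic selection, not the implementation used by PROC SURVEYSELECT. The function names, the restriction to integer size measures, and the convention that a selection point falling exactly on a boundary belongs to the earlier unit are assumptions made for the example.

```python
import random
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate


def pps_systematic_hits(sizes, n, start=None, rng=random):
    """Return the number of hits per unit for n PPS systematic selections.

    sizes: integer size measures, in frame order (assumption: integers,
           so that the interval M/n can be held exactly as a Fraction).
    start: optional starting value in (0, M/n]; random if omitted.
    """
    cum = list(accumulate(sizes))          # cumulated size measures
    interval = Fraction(cum[-1], n)        # fractional interval M/n
    if start is None:
        # 1 - random() lies in (0, 1], so the start lies in (0, interval]
        start = interval * Fraction(1 - rng.random())
    hits = [0] * len(sizes)
    for j in range(n):
        point = start + j * interval       # j-th selection point
        # unit i is hit when cum[i-1] < point <= cum[i]
        i = min(bisect_left(cum, point), len(sizes) - 1)
        hits[i] += 1
    return hits


def expected_hits(sizes, n):
    """Expected hits n * M_i / M for each unit."""
    total = sum(sizes)
    return [Fraction(n * m, total) for m in sizes]
```

For example, with size measures 10, 20, and 70 and a sample size of 4, the interval is 25 and the expected hits are 0.4, 0.8, and 2.8. A start of 5 places the selection points at 5, 30, 55, and 80, so the third unit is selected twice.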

Instead of specifying the sample size for systematic sampling, you can directly specify the systematic interval in the INTERVAL= option. When you specify the interval, PROC SURVEYSELECT computes the expected number of hits for a unit as its size measure divided by the interval value. This agrees with the expression $n_h M_{hi}/M_{h\cdot}$ when the interval equals $M_{h\cdot}/n_h$.

By default, PROC SURVEYSELECT randomly determines a starting value in the selection interval. This random start is the only random component of systematic sampling. Optionally, you can specify the starting value in the START= option. If you use the START= option to provide a purposely chosen (nonrandom) starting value, the resulting systematic selection does not provide a random, probability-based sample.
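The role of the random start can be checked numerically. The following Python sketch, which uses hypothetical size measures and is not PROC SURVEYSELECT code, averages the hits over every equally likely starting value and compares the result with $n M_i/M$. Because the size measures here are integers, every start in the range $(s-1, s]$ selects the same units as the integer start $s$, so enumerating integer starts is exact for this example.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate


def hits_for_start(sizes, n, start):
    """Hits per unit for one fixed starting value (illustrative helper)."""
    cum = list(accumulate(sizes))
    interval = Fraction(cum[-1], n)
    hits = [0] * len(sizes)
    for j in range(n):
        # unit i is hit when cum[i-1] < point <= cum[i]
        i = min(bisect_left(cum, start + j * interval), len(sizes) - 1)
        hits[i] += 1
    return hits


sizes, n = [10, 20, 70], 4      # hypothetical size measures; interval = 25
starts = range(1, 26)           # integer starts 1..25, each equally likely

totals = [0] * len(sizes)
for s in starts:
    for i, h in enumerate(hits_for_start(sizes, n, s)):
        totals[i] += h

average_hits = [Fraction(t, len(starts)) for t in totals]
expected = [Fraction(n * m, sum(sizes)) for m in sizes]

# A single purposely chosen start gives one fixed sample instead.
fixed_start_hits = hits_for_start(sizes, n, 25)
```

Averaged over the starts, the hits equal the expected hits (0.4, 0.8, and 2.8). With the fixed start of 25, the first unit receives no hits at all, which illustrates why a nonrandom start does not yield a probability-based sample.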

Systematic sampling controls the distribution of the sample by spreading the selections throughout the sampling frame (or stratum) at equal intervals and thus provides implicit stratification. You can specify a CONTROL statement to order the input data set by the CONTROL variables before sample selection. If you also specify a STRATA statement, PROC SURVEYSELECT sorts by the CONTROL variables within strata. If you do not specify a CONTROL statement, PROC SURVEYSELECT applies systematic selection to the observations in the order in which they appear in the input data set.
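The combination of strata, control sorting, and systematic selection can be sketched as follows. This Python example is an illustration under stated assumptions rather than the procedure's own algorithm: the record keys `id`, `stratum`, `control`, and `size` are hypothetical, size measures are assumed to be integers, and each stratum draws its own random start.

```python
import random
from bisect import bisect_left
from collections import defaultdict
from fractions import Fraction
from itertools import accumulate


def select_within_strata(records, n_per_stratum, rng=random):
    """Return {stratum: [ids of selected units]} (ids repeat on multiple hits)."""
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[rec["stratum"]].append(rec)

    sample = {}
    for stratum, units in by_stratum.items():
        # Order the units by the control variable within the stratum.
        units.sort(key=lambda r: r["control"])
        cum = list(accumulate(u["size"] for u in units))
        n = n_per_stratum[stratum]
        interval = Fraction(cum[-1], n)
        # Independent random start in (0, interval] for each stratum.
        start = interval * Fraction(1 - rng.random())
        picks = []
        for j in range(n):
            i = min(bisect_left(cum, start + j * interval), len(units) - 1)
            picks.append(units[i]["id"])
        sample[stratum] = picks
    return sample
```

Because the selection points are spaced one interval apart along the sorted list, the selected units are spread across the range of the control variable within each stratum, which is the implicit stratification described above.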