PROC SURVEYSELECT: Sample Selection Methods :: SAS/STAT(R) 9.3 User's Guide

Sample Selection Methods

PROC SURVEYSELECT provides a variety of methods for selecting probability-based random samples. With probability sampling, each unit in the survey population has a known, positive probability of selection. This property of probability sampling avoids selection bias and enables you to use statistical theory to make valid inferences from the sample to the survey population. See Lohr (2010), Kish (1965, 1987), Kalton (1983), and Cochran (1977) for more information about probability sampling.

In equal probability sampling, each unit in the sampling frame, or in a stratum, has the same probability of being selected for the sample. PROC SURVEYSELECT provides the following methods that select units with equal probability: simple random sampling, unrestricted random sampling, systematic random sampling, and sequential random sampling. In simple random sampling, units are selected without replacement, which means that a unit cannot be selected more than once. Both systematic and sequential equal probability sampling are also without replacement. In unrestricted random sampling, units are selected with replacement, which means that a unit can be selected more than once. In with-replacement sampling, the number of hits refers to the number of times a unit is selected.
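The difference between without-replacement and with-replacement selection can be sketched in a few lines of Python. This is only an illustration built on Python's standard `random` module, not PROC SURVEYSELECT's selection algorithm; the frame of ten unit IDs, the sample size, and the seed are all made up for the example.

```python
import random
from collections import Counter

frame = list(range(1, 11))   # hypothetical sampling frame: unit IDs 1..10
n = 4                        # hypothetical sample size
rng = random.Random(2024)    # fixed seed so the draws are reproducible

# Simple random sampling: without replacement, so no unit can repeat.
srs = rng.sample(frame, n)

# Unrestricted random sampling: with replacement, so a unit can repeat.
urs = rng.choices(frame, k=n)

# Number of hits = number of times each unit was selected.
hits = Counter(urs)

print("SRS sample:", sorted(srs))
print("URS hits:  ", dict(hits))
```

In the with-replacement case the hit counts always sum to the number of draws, even when fewer than that many distinct units appear in the sample.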

In probability proportional to size (PPS) sampling, a unit’s selection probability is proportional to its size measure. PROC SURVEYSELECT provides the following PPS selection methods: PPS sampling without replacement, PPS sampling with replacement, PPS systematic sampling, PPS sequential sampling, Brewer’s method, Murthy’s method, and Sampford’s method. PPS sampling is often used in cluster sampling, where you select clusters (groups of sampling units) of varying size in the first stage of selection. For example, clusters might be schools, hospitals, or geographical areas, and the final sampling units might be students, patients, or citizens. Cluster sampling can provide efficiencies in frame construction and other survey operations. See Lohr (2010), Kalton (1983), Kish (1965), and the other references cited in the following sections for more information.
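As a rough illustration of selection proportional to size, the following Python sketch draws clusters with replacement, using each cluster's size measure as its weight. The school names, enrollments, number of draws, and seed are hypothetical, and `random.choices` stands in for the procedure's own algorithms, which are not reproduced here.

```python
import random
from collections import Counter

# Hypothetical clusters (schools) with enrollment as the size measure.
sizes = {"A": 500, "B": 300, "C": 150, "D": 50}
n = 3                          # hypothetical number of draws

total = sum(sizes.values())
# Relative size of each cluster: its share of the total size measure.
rel_size = {k: m / total for k, m in sizes.items()}

# With-replacement PPS: each draw picks a cluster with probability
# equal to its relative size.
rng = random.Random(7)
draws = rng.choices(list(sizes), weights=list(sizes.values()), k=n)
hits = Counter(draws)

# Expected number of hits per cluster is n times its relative size.
expected_hits = {k: n * z for k, z in rel_size.items()}

print("relative sizes:", rel_size)
print("hits:", dict(hits))
print("expected hits:", expected_hits)
```

School A holds half of the total enrollment, so each draw selects it with probability 0.5, which is ten times the probability for school D.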

All the probability sampling methods provided by PROC SURVEYSELECT use random numbers in their selection algorithms, as described in the following sections and in the references cited. PROC SURVEYSELECT uses a uniform random number function to generate streams of pseudo-random numbers from an initial starting point, or seed. You can use the SEED= option to specify the initial seed. If you do not specify the SEED= option, PROC SURVEYSELECT uses the time of day from the computer’s clock to obtain the initial seed. PROC SURVEYSELECT generates uniform random numbers according to the method of Fishman and Moore (1982), which uses a prime modulus multiplicative generator with modulus $2^{31} - 1$ and multiplier $397204094$. This is the same uniform random number generator that the RANUNI function uses. For more information about the RANUNI function, see SAS Language Reference: Dictionary.
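A prime modulus multiplicative generator follows the recurrence x ← (a · x) mod m. The Python sketch below assumes modulus 2^31 − 1 and multiplier 397204094, the parameters documented for the RANUNI generator. The function name `lehmer_stream` is hypothetical, and details such as how SAS maps a SEED= value to the initial state are not covered, so the sketch should not be expected to reproduce a PROC SURVEYSELECT stream.

```python
MODULUS = 2**31 - 1        # prime modulus, 2147483647
MULTIPLIER = 397204094     # multiplier assumed from the RANUNI documentation

def lehmer_stream(seed, count):
    """Yield `count` values in (0, 1) from x <- (MULTIPLIER * x) mod MODULUS."""
    if not 0 < seed < MODULUS:
        raise ValueError("seed must satisfy 0 < seed < 2**31 - 1")
    x = seed
    for _ in range(count):
        x = (MULTIPLIER * x) % MODULUS   # state stays in 1..MODULUS-1
        yield x / MODULUS                # scale the state into (0, 1)

# The same seed always yields the same stream.
first = list(lehmer_stream(12345, 5))
second = list(lehmer_stream(12345, 5))
print(first == second)
```

Because the modulus is prime and the seed is nonzero, the state never reaches zero, so every value lies strictly between 0 and 1. Fixing the seed is what makes a selected sample reproducible.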

The following sections give detailed descriptions of the sample selection methods available in PROC SURVEYSELECT. In these sections, $n_h$ denotes the sample size (the number of units in the sample) for stratum $h$, and $N_h$ denotes the population size (the number of units in the population) for stratum $h$, for $h = 1, 2, \ldots, H$. When the sample design is not stratified, $n$ denotes the sample size, and $N$ denotes the population size. For PPS sampling, $M_{hi}$ represents the size measure for unit $i$ in stratum $h$, $M_{h\cdot}$ is the total of all size measures for the population of stratum $h$, and $Z_{hi} = M_{hi} / M_{h\cdot}$ is the relative size of unit $i$ in stratum $h$.
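To make the stratum-level quantities concrete, this short Python sketch computes the population size, the size-measure total, and each unit's relative size within its stratum for a small frame. The two strata, the unit IDs, and the size measures are invented for illustration.

```python
from collections import defaultdict

# Hypothetical frame: (stratum, unit ID, size measure).
frame = [
    ("urban", "u1", 40), ("urban", "u2", 60), ("urban", "u3", 100),
    ("rural", "r1", 10), ("rural", "r2", 30),
]

pop_size = defaultdict(int)     # number of units in each stratum
size_total = defaultdict(int)   # total of the size measures in each stratum
for stratum, _, size in frame:
    pop_size[stratum] += 1
    size_total[stratum] += size

# Relative size of a unit: its size measure divided by its stratum's total.
rel_size = {unit: size / size_total[stratum] for stratum, unit, size in frame}

print(dict(pop_size))
print(dict(size_total))
print(rel_size)
```

Relative sizes are computed separately within each stratum, so they sum to 1 in every stratum rather than across the whole frame.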
