The following PROC SURVEYSELECT statements select a probability sample of customers from the Customers
data set by using simple random sampling:
```sas
title1 'Customer Satisfaction Survey';
title2 'Simple Random Sampling';
proc surveyselect data=Customers method=srs n=100
                  out=SampleSRS;
run;
```
The PROC SURVEYSELECT statement invokes the procedure. The DATA= option names the SAS data set Customers
as the input data set from which to select the sample. The METHOD=SRS option specifies simple random sampling as the sample
selection method. In simple random sampling, each unit has an equal probability of selection, and sampling is without replacement.
Without-replacement sampling means that a unit cannot be selected more than once. The N= option specifies a sample size of
100 customers. The OUT= option stores the sample in the SAS data set named SampleSRS.
Figure 115.2 displays the output from PROC SURVEYSELECT, which summarizes the sample selection. A sample of 100 customers is selected
from the data set Customers by simple random sampling. With simple random sampling and no stratification in the sample design, the selection probability
is the same for all units in the sample. In this sample, the selection probability for each customer is 0.007423, which is
the sample size (100) divided by the population size (13,471). The sampling weight is 134.71 for each customer in the sample,
where the weight is the inverse of the selection probability.
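The arithmetic in the preceding paragraph can be written out as follows, with sample size $n = 100$ and population size $N = 13{,}471$. The symbols $\pi$ (selection probability) and $w$ (sampling weight) are notation introduced here for illustration:

$$
\pi = \frac{n}{N} = \frac{100}{13{,}471} \approx 0.007423,
\qquad
w = \frac{1}{\pi} = \frac{N}{n} = \frac{13{,}471}{100} = 134.71
$$

Because every sampled customer carries the same weight, the weights add up to the population size: $n \cdot w = 100 \times 134.71 = 13{,}471$.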
If you specify the STATS option, PROC SURVEYSELECT includes the selection probabilities and sampling weights in the output
data set. (This information is always included in the output data set for more complex designs.)
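As a sketch, adding the STATS option to the earlier statements should add each customer's selection probability and sampling weight to SampleSRS, in the variables SelectionProb and SamplingWeight. The PROC PRINT step and its choice of five observations are illustrative additions, not part of the original example.

```sas
/* Same SRS design as before, plus STATS so that the output data set */
/* also carries SelectionProb and SamplingWeight for each customer.  */
proc surveyselect data=Customers method=srs n=100
                  stats out=SampleSRS;
run;

/* Display the two added variables for the first few sampled customers */
proc print data=SampleSRS(obs=5);
   var CustomerID SelectionProb SamplingWeight;
run;
```

Because the design is unstratified simple random sampling, every row should show the same values, 0.007423 and 134.71, even though the particular customers selected change from run to run unless SEED= is specified.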
The random number seed is 39647. PROC SURVEYSELECT uses this number as the initial seed for random number generation. Because the SEED= option is not specified in the PROC SURVEYSELECT statement, the seed value is obtained by using the time of day from the computer’s clock. You can specify SEED=39647 to reproduce this sample.
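A minimal sketch of such a rerun follows. It assumes that the Customers data set is unchanged (same observations in the same order), because the selected sample depends on the input data as well as on the seed.

```sas
/* Fix the initial seed so that rerunning the step selects the same   */
/* 100 customers; 39647 is the seed reported for the original run.    */
/* Assumes Customers has not changed since that run.                  */
proc surveyselect data=Customers method=srs n=100
                  seed=39647 out=SampleSRS;
run;
```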
Figure 115.2: Sample Selection Summary
The sample of 100 customers is stored in the SAS data set SampleSRS. PROC SURVEYSELECT does not display this output data set. The following PROC PRINT statements display the first 20 observations
of SampleSRS:
```sas
title1 'Customer Satisfaction Survey';
title2 'Sample of 100 Customers, Selected by SRS';
title3 '(First 20 Observations)';
proc print data=SampleSRS(obs=20);
run;
```
Figure 115.3 displays the first 20 observations of the output data set SampleSRS, which contains the sample of customers. This data set includes all the variables from the DATA= input data set Customers. If you do not want to include all variables, you can use the ID statement to specify which variables to copy from the input
data set to the output (sample) data set.
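For example, the following sketch copies only CustomerID and State to the sample data set. The choice of variables is illustrative, and SEED=39647 is included on the assumption that the input data are unchanged, so that the same 100 customers as in the original run are selected.

```sas
/* Select the same SRS sample, but keep only the listed variables */
/* from Customers in the output data set SampleSRS.               */
proc surveyselect data=Customers method=srs n=100
                  seed=39647 out=SampleSRS;
   id CustomerID State;
run;
```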
Figure 115.3: Customer Sample (First 20 Observations)
Customer Satisfaction Survey
Sample of 100 Customers, Selected by SRS
(First 20 Observations)

Obs | CustomerID | State | Type | Usage |
---|---|---|---|---|
1 | 017-27-4096 | GA | New | 168 |
2 | 026-37-3895 | AL | New | 59 |
3 | 038-54-9276 | GA | New | 785 |
4 | 046-40-3131 | FL | New | 60 |
5 | 070-37-6924 | GA | New | 524 |
6 | 100-58-3342 | FL | New | 302 |
7 | 107-61-9029 | AL | New | 235 |
8 | 110-95-0432 | FL | New | 12 |
9 | 112-81-9251 | SC | New | 347 |
10 | 137-33-0478 | GA | New | 551 |
11 | 143-83-4677 | AL | New | 203 |
12 | 147-19-9164 | GA | New | 172 |
13 | 159-51-0606 | FL | New | 102 |
14 | 164-14-7799 | GA | Old | 388 |
15 | 165-05-7323 | SC | New | 606 |
16 | 174-69-3566 | AL | Old | 111 |
17 | 177-69-6934 | FL | New | 202 |
18 | 181-58-3508 | AL | Old | 261 |
19 | 207-41-8446 | AL | Old | 183 |
20 | 207-64-7308 | GA | New | 193 |