This example describes hospital selection for a survey by using PROC SURVEYSELECT. A state health agency plans to conduct a statewide survey of a variety of different hospital services. The agency plans to select a probability sample of individual discharge records within hospitals by using a two-stage sample design. First-stage units are hospitals, and second-stage units are patient discharges during the study period. Hospitals are stratified first according to geographic region and then by rural/urban type and size of hospital. Two hospitals are selected from each stratum with probability proportional to size.
The data set HospitalFrame contains all hospitals in the first geographical region of the state:
data HospitalFrame;
   input Hospital$ Type$ SizeMeasure @@;
   if (SizeMeasure < 20) then Size='Small ';
      else if (SizeMeasure < 50) then Size='Medium';
      else Size='Large ';
   datalines;
034 Rural  0.870   107 Rural  1.316
079 Rural  2.127   223 Rural  3.960
236 Rural  5.279   165 Rural  5.893
086 Rural  0.501   141 Rural 11.528
042 Urban  3.104   124 Urban  4.033
006 Urban  4.249   261 Urban  4.376
195 Urban  5.024   190 Urban 10.373
038 Urban 17.125   083 Urban 40.382
259 Urban 44.942   129 Urban 46.702
133 Urban 46.992   218 Urban 48.231
026 Urban 61.460   058 Urban 65.931
119 Urban 66.352
;
In the SAS data set HospitalFrame, the variable Hospital identifies the hospital. The variable Type equals 'Urban' if the hospital is located in an urban area, and 'Rural' otherwise. The variable SizeMeasure contains the hospital’s size measure, which is constructed from past data on service utilization for the hospital together with the desired sampling rates for each service. This size measure reflects the amount of relevant survey information expected from the hospital. For information about this type of size measure, see Drummond et al. (1982). The value of the variable Size is 'Small', 'Medium', or 'Large', depending on the value of the hospital’s size measure.
The following PROC PRINT statements display the data set HospitalFrame and produce Output 102.2.1:
title1 'Hospital Utilization Survey';
title2 'Sampling Frame, Region 1';
proc print data=HospitalFrame;
run;
Output 102.2.1: Sampling Frame
Hospital Utilization Survey
Sampling Frame, Region 1
Obs | Hospital | Type | SizeMeasure | Size |
---|---|---|---|---|
1 | 034 | Rural | 0.870 | Small |
2 | 107 | Rural | 1.316 | Small |
3 | 079 | Rural | 2.127 | Small |
4 | 223 | Rural | 3.960 | Small |
5 | 236 | Rural | 5.279 | Small |
6 | 165 | Rural | 5.893 | Small |
7 | 086 | Rural | 0.501 | Small |
8 | 141 | Rural | 11.528 | Small |
9 | 042 | Urban | 3.104 | Small |
10 | 124 | Urban | 4.033 | Small |
11 | 006 | Urban | 4.249 | Small |
12 | 261 | Urban | 4.376 | Small |
13 | 195 | Urban | 5.024 | Small |
14 | 190 | Urban | 10.373 | Small |
15 | 038 | Urban | 17.125 | Small |
16 | 083 | Urban | 40.382 | Medium |
17 | 259 | Urban | 44.942 | Medium |
18 | 129 | Urban | 46.702 | Medium |
19 | 133 | Urban | 46.992 | Medium |
20 | 218 | Urban | 48.231 | Medium |
21 | 026 | Urban | 61.460 | Large |
22 | 058 | Urban | 65.931 | Large |
23 | 119 | Urban | 66.352 | Large |
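Before selecting the sample, it can help to confirm how many hospitals fall in each Type-by-Size stratum and what the total size measure is in each. The following is a minimal sketch that uses PROC MEANS on the frame shown above; it is not part of the original example, and the choice of statistics (N and SUM) is an assumption about what is useful to check.

```sas
/* Count hospitals and total the size measure within each
   Type-by-Size stratum of the sampling frame. */
proc means data=HospitalFrame n sum maxdec=3;
   class Type Size;
   var SizeMeasure;
run;
```

Summing the frame listing by hand gives four strata: Rural/Small (8 hospitals, total 31.474), Urban/Small (7 hospitals, total 48.284), Urban/Medium (5 hospitals, total 227.249), and Urban/Large (3 hospitals, total 193.743). Every stratum has more than two hospitals, so two can be selected from each.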
The following PROC SURVEYSELECT statements select a probability sample of hospitals from the HospitalFrame data set by using a stratified design with PPS selection of two units from each stratum:
title1 'Hospital Utilization Survey';
title2 'Stratified PPS Sampling';
proc surveyselect data=HospitalFrame method=pps_brewer
                  seed=48702 out=SampleHospitals;
   size SizeMeasure;
   strata Type Size notsorted;
run;
The STRATA statement names the stratification variables Type and Size. The NOTSORTED option specifies that observations with the same STRATA variable values are grouped together but are not necessarily sorted in alphabetical or increasing numerical order. In the HospitalFrame data set, Size = 'Small' precedes Size = 'Medium'.
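As an alternative to NOTSORTED, the frame can be sorted by the stratification variables first, and the STRATA statement can then be used without the option. The following is a sketch of that approach, not part of the original example; the data set names HospitalFrameSorted and SampleHospitalsSorted are hypothetical. Because sorting changes the order of the units in the frame, the hospitals selected with the same seed may differ from those in the original run.

```sas
/* Sort the frame by the stratification variables so that the
   STRATA statement can be used without NOTSORTED. */
proc sort data=HospitalFrame out=HospitalFrameSorted;
   by Type Size;
run;

/* Same design as before: Brewer PPS selection, two units per stratum. */
proc surveyselect data=HospitalFrameSorted method=pps_brewer
                  seed=48702 out=SampleHospitalsSorted;
   size SizeMeasure;
   strata Type Size;
run;
```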
In the PROC SURVEYSELECT statement, the METHOD=PPS_BREWER option requests sample selection by Brewer’s method, which selects two units per stratum with probability proportional to size. The SEED= option specifies 48702 as the initial seed for random number generation. The SIZE statement names SizeMeasure as the size measure variable. It is not necessary to specify the sample size in the N= option, because Brewer’s method always selects two units from each stratum.
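With two units selected per stratum with probability proportional to size, each hospital's selection probability works out to twice its relative size within the stratum. The following sketch, which is not part of the original example, computes these values for every hospital in the frame; the column names RelativeSize and ImpliedProb are hypothetical. PROC SQL remerges the stratum totals with the individual rows and writes a note to the log when it does so.

```sas
/* Relative size and implied two-unit PPS selection probability,
   computed within each Type-by-Size stratum of the frame. */
proc sql;
   select Type, Size, Hospital, SizeMeasure,
          SizeMeasure / sum(SizeMeasure) as RelativeSize format=8.5,
          2 * SizeMeasure / sum(SizeMeasure) as ImpliedProb format=8.5
      from HospitalFrame
      group by Type, Size
      order by Type, Size, Hospital;
quit;
```

Every value of RelativeSize should be below 0.5, because otherwise twice the relative size would exceed 1 and could not be a probability. In this frame the largest relative size is that of hospital 141 in the Rural/Small stratum, 11.528 / 31.474, or about 0.366.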
Output 102.2.2 displays the output from PROC SURVEYSELECT. A total of eight hospitals were selected from the four strata. The data set SampleHospitals contains the selected hospitals.
The following PROC PRINT statements display the sample hospitals and produce Output 102.2.3:
title1 'Hospital Utilization Survey';
title2 'Sample Selected by Stratified PPS Design';
proc print data=SampleHospitals;
run;
Output 102.2.3: Sample Hospitals
Hospital Utilization Survey
Sample Selected by Stratified PPS Design
Obs | Type | Size | Hospital | SizeMeasure | SelectionProb | SamplingWeight | JtSelectionProb |
---|---|---|---|---|---|---|---|
1 | Rural | Small | 165 | 5.893 | 0.37447 | 2.67046 | 0.22465 |
2 | Rural | Small | 141 | 11.528 | 0.73254 | 1.36511 | 0.22465 |
3 | Urban | Small | 006 | 4.249 | 0.17600 | 5.68181 | 0.01454 |
4 | Urban | Small | 195 | 5.024 | 0.20810 | 4.80533 | 0.01454 |
5 | Urban | Medium | 129 | 46.702 | 0.41102 | 2.43297 | 0.11211 |
6 | Urban | Medium | 218 | 48.231 | 0.42448 | 2.35584 | 0.11211 |
7 | Urban | Large | 058 | 65.931 | 0.68060 | 1.46929 | 0.36555 |
8 | Urban | Large | 119 | 66.352 | 0.68495 | 1.45996 | 0.36555 |
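The probability and weight columns in the table above can be checked by hand. The following worked arithmetic is a sketch, assuming that the selection probability of hospital i in stratum h is twice its size measure divided by the stratum's total size measure and that the sampling weight is the reciprocal of the selection probability. The symbols s, pi, and w are notation introduced here for illustration, and the stratum totals are sums taken from the frame listing.

```latex
% General form: two units per stratum, probability proportional to size
\pi_i = \frac{2\, s_i}{\sum_{k \in h} s_k}, \qquad w_i = \frac{1}{\pi_i}

% Hospital 165, Rural/Small stratum (stratum total 31.474)
\pi_{165} = \frac{2 \times 5.893}{31.474} \approx 0.37447, \qquad
w_{165} = \frac{31.474}{2 \times 5.893} \approx 2.67046

% Urban/Large stratum has three hospitals (total 193.743), so the pair
% (058, 119) is selected exactly when hospital 026 is not selected
\pi_{058,119} = 1 - \pi_{026} = 1 - \frac{2 \times 61.460}{193.743} \approx 0.36555
```

These values agree with the SelectionProb, SamplingWeight, and JtSelectionProb entries for the corresponding rows.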
The variable SelectionProb contains the selection probability for each hospital in the sample. The variable JtSelectionProb contains the joint probability of selection for the two sample hospitals in the same stratum. The variable SamplingWeight contains the sampling weight component for this first stage of the design. The final-stage weight components, which correspond to patient record selection within hospitals, can be multiplied by the hospital weight components to obtain the overall sampling weights.
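The multiplication of weight components can be sketched in a DATA step once the second-stage sample exists. The following is an illustration only, not part of the original example. It assumes a hypothetical data set SampleDischarges with one row per sampled discharge record, a Hospital variable that matches the one in SampleHospitals, and a hypothetical variable DischargeWeight that holds the within-hospital weight component. The names HospWts, HospitalWeight, DischargeWeights, and OverallWeight are likewise hypothetical.

```sas
/* Keep only the hospital identifier and first-stage weight. */
proc sort data=SampleHospitals
          out=HospWts(keep=Hospital SamplingWeight);
   by Hospital;
run;

/* SampleDischarges is a hypothetical second-stage sample. */
proc sort data=SampleDischarges;
   by Hospital;
run;

/* Attach the hospital weight to each sampled discharge record and
   multiply the two components to get the overall sampling weight. */
data DischargeWeights;
   merge SampleDischarges(in=inDis)
         HospWts(rename=(SamplingWeight=HospitalWeight));
   by Hospital;
   if inDis;
   OverallWeight = HospitalWeight * DischargeWeight;
run;
```

Renaming the first-stage weight to HospitalWeight keeps it distinct from any weight variable in the second-stage data set.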