# The SURVEYSELECT Procedure

### Example 115.2 PPS Selection of Two Units per Stratum

This example describes hospital selection for a survey by using PROC SURVEYSELECT. A state health agency plans to conduct a statewide survey of a variety of hospital services. The agency plans to select a probability sample of individual discharge records within hospitals by using a two-stage sample design. First-stage units are hospitals, and second-stage units are patient discharges during the study period. Hospitals are stratified first according to geographic region and then by rural/urban type and size of hospital. Two hospitals are selected from each stratum with probability proportional to size.

The data set `HospitalFrame` contains all hospitals in the first geographical region of the state:

```sas
data HospitalFrame;
   /* Read the character ID and type, then the numeric size measure; */
   /* @@ holds the line so that two hospitals are read per record    */
   input Hospital $ Type $ SizeMeasure @@;
   /* Classify hospital size. The trailing blanks in 'Small ' and     */
   /* 'Large ' give Size a length of 6, so 'Medium' is not truncated */
   if (SizeMeasure < 20) then Size='Small ';
   else if (SizeMeasure < 50) then Size='Medium';
   else Size='Large ';
   datalines;
034 Rural  0.870   107 Rural  1.316
079 Rural  2.127   223 Rural  3.960
236 Rural  5.279   165 Rural  5.893
086 Rural  0.501   141 Rural 11.528
042 Urban  3.104   124 Urban  4.033
006 Urban  4.249   261 Urban  4.376
195 Urban  5.024   190 Urban 10.373
038 Urban 17.125   083 Urban 40.382
259 Urban 44.942   129 Urban 46.702
133 Urban 46.992   218 Urban 48.231
026 Urban 61.460   058 Urban 65.931
119 Urban 66.352
;
```

In the SAS data set `HospitalFrame`, the variable `Hospital` identifies the hospital. The variable `Type` equals 'Urban' if the hospital is located in an urban area and 'Rural' otherwise. The variable `SizeMeasure` contains the hospital’s size measure, which is constructed from past data on service utilization for the hospital together with the target sampling rates for each service. This size measure reflects the amount of relevant survey information expected from the hospital. For information about this type of size measure, see Drummond et al. (1982). The variable `Size` equals 'Small' if `SizeMeasure` is less than 20, 'Medium' if it is at least 20 but less than 50, and 'Large' otherwise.
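Because the design selects two hospitals per stratum with probability proportional to `SizeMeasure`, each `Type` × `Size` stratum needs at least two hospitals, and no hospital's inclusion probability (twice its share of the stratum's total size measure) can exceed 1. The following is a minimal sketch of that check; the output data set `StratumSummary` and its variables `NHospitals`, `TotalSize`, `MaxSize`, and `MaxSelectionProb` are hypothetical names chosen for illustration.

```sas
/* Sketch: one summary row per Type x Size stratum (NWAY keeps only */
/* the full crossing of the CLASS variables)                        */
proc means data=HospitalFrame noprint nway;
   class Type Size;
   var SizeMeasure;
   output out=StratumSummary n=NHospitals sum=TotalSize max=MaxSize;
run;

data StratumSummary;
   set StratumSummary;
   /* Largest PPS inclusion probability for a sample of 2 in the stratum */
   MaxSelectionProb = 2 * MaxSize / TotalSize;
   format MaxSelectionProb 8.5;
run;

proc print data=StratumSummary noobs;
   var Type Size NHospitals TotalSize MaxSelectionProb;
run;
```

From the data lines, the Rural Small, Urban Small, Urban Medium, and Urban Large strata contain 8, 7, 5, and 3 hospitals, respectively, so every stratum can supply two sample hospitals.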

The following PROC PRINT statements display the data set `HospitalFrame` and produce Output 115.2.1:

```sas
title1 'Hospital Utilization Survey';
title2 'Sampling Frame, Region 1';
proc print data=HospitalFrame;
run;
```

Output 115.2.1: Sampling Frame

Hospital Utilization Survey

Sampling Frame, Region 1

| Obs | Hospital | Type | SizeMeasure | Size |
|---:|---|---|---:|---|
| 1 | 034 | Rural | 0.870 | Small |
| 2 | 107 | Rural | 1.316 | Small |
| 3 | 079 | Rural | 2.127 | Small |
| 4 | 223 | Rural | 3.960 | Small |
| 5 | 236 | Rural | 5.279 | Small |
| 6 | 165 | Rural | 5.893 | Small |
| 7 | 086 | Rural | 0.501 | Small |
| 8 | 141 | Rural | 11.528 | Small |
| 9 | 042 | Urban | 3.104 | Small |
| 10 | 124 | Urban | 4.033 | Small |
| 11 | 006 | Urban | 4.249 | Small |
| 12 | 261 | Urban | 4.376 | Small |
| 13 | 195 | Urban | 5.024 | Small |
| 14 | 190 | Urban | 10.373 | Small |
| 15 | 038 | Urban | 17.125 | Small |
| 16 | 083 | Urban | 40.382 | Medium |
| 17 | 259 | Urban | 44.942 | Medium |
| 18 | 129 | Urban | 46.702 | Medium |
| 19 | 133 | Urban | 46.992 | Medium |
| 20 | 218 | Urban | 48.231 | Medium |
| 21 | 026 | Urban | 61.460 | Large |
| 22 | 058 | Urban | 65.931 | Large |
| 23 | 119 | Urban | 66.352 | Large |

The following PROC SURVEYSELECT statements select a probability sample of hospitals from the `HospitalFrame` data set by using a stratified design with PPS selection of two units from each stratum:

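```sas
title1 'Hospital Utilization Survey';
title2 'Stratified PPS Sampling';
/* Brewer's PPS method selects two hospitals from each stratum */
proc surveyselect data=HospitalFrame method=pps_brewer
                  seed=48702 out=SampleHospitals;
   size SizeMeasure;             /* size measure for PPS selection */
   strata Type Size notsorted;   /* strata grouped but not sorted  */
run;
```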

The STRATA statement names the stratification variables `Type` and `Size`. The NOTSORTED option specifies that observations with the same STRATA variable values are grouped together but are not necessarily sorted in alphabetical or increasing numerical order. In the `HospitalFrame` data set, the urban strata appear in the order `Size` = 'Small', 'Medium', 'Large', which is not alphabetical order, so NOTSORTED is needed for the procedure to process the strata in the order in which they occur.
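Sorting the frame by the STRATA variables is an alternative to NOTSORTED. The following is a sketch; `HospitalFrameSorted` and `SampleHospitalsSorted` are hypothetical data set names. After the sort, the urban strata are processed in the order 'Large', 'Medium', 'Small', so the random numbers generated from SEED=48702 are applied to the strata in a different order, and the selected hospitals may differ from those in Output 115.2.3 even though every stratum still contributes two hospitals.

```sas
/* Sort by the stratification variables so NOTSORTED is not needed */
proc sort data=HospitalFrame out=HospitalFrameSorted;
   by Type Size;
run;

/* Same design; strata are now processed in sorted order */
proc surveyselect data=HospitalFrameSorted method=pps_brewer
                  seed=48702 out=SampleHospitalsSorted;
   size SizeMeasure;
   strata Type Size;
run;
```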

In the PROC SURVEYSELECT statement, the METHOD=PPS_BREWER option requests sample selection by Brewer’s method, which selects two units per stratum with probability proportional to size. The SEED= option specifies 48702 as the initial seed for random number generation. The SIZE statement names `SizeMeasure` as the size measure variable. It is not necessary to specify the sample size in the N= option because Brewer’s method selects two units from each stratum.
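The example does not show how Brewer's method makes its two draws. The following summarizes the standard description of the method, in notation introduced here rather than taken from this example: $M_i$ is the `SizeMeasure` value of hospital $i$, every sum runs over the hospitals in that hospital's stratum, and each relative size $Z_i$ is assumed to be less than 1/2.

$$
Z_i = \frac{M_i}{\sum_k M_k}, \qquad
D = 1 + \sum_k \frac{Z_k}{1 - 2Z_k}
$$

$$
P(\text{first draw} = i) = \frac{2 Z_i (1 - Z_i)}{(1 - 2Z_i)\, D}, \qquad
P(\text{second draw} = j \mid \text{first draw} = i) = \frac{Z_j}{1 - Z_i}
$$

$$
\pi_i = 2 Z_i, \qquad
\pi_{ij} = \frac{2 Z_i Z_j}{D} \left( \frac{1}{1 - 2Z_i} + \frac{1}{1 - 2Z_j} \right), \qquad
w_i = \frac{1}{\pi_i}
$$

In these terms, $\pi_i$ corresponds to `SelectionProb`, $\pi_{ij}$ to `JtSelectionProb`, and $w_i$ to `SamplingWeight` in the output data set. Summing $\pi_{ij}$ over $j \neq i$ gives $2Z_i$, which is why the inclusion probability is exactly proportional to size.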

Output 115.2.2 displays the output from PROC SURVEYSELECT. A total of 8 hospitals are selected from the 4 strata. The data set `SampleHospitals` contains the selected hospitals.

Output 115.2.2: Sample Selection Summary

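Hospital Utilization Survey

Stratified PPS Sampling

The SURVEYSELECT Procedure

| | |
|---|---|
| Selection Method | Brewer's PPS Method |
| Size Measure | SizeMeasure |
| Strata Variables | Type, Size |
| Input Data Set | HOSPITALFRAME |
| Random Number Seed | 48702 |
| Stratum Sample Size | 2 |
| Number of Strata | 4 |
| Total Sample Size | 8 |
| Output Data Set | SAMPLEHOSPITALS |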

The following PROC PRINT statements display the sample hospitals and produce Output 115.2.3:

```sas
title1 'Hospital Utilization Survey';
title2 'Sample Selected by Stratified PPS Design';
proc print data=SampleHospitals;
run;
```

Output 115.2.3: Sample Hospitals

Hospital Utilization Survey

Sample Selected by Stratified PPS Design

| Obs | Type | Size | Hospital | SizeMeasure | SelectionProb | SamplingWeight | JtSelectionProb |
|---:|---|---|---|---:|---:|---:|---:|
| 1 | Rural | Small | 165 | 5.893 | 0.37447 | 2.67046 | 0.22465 |
| 2 | Rural | Small | 141 | 11.528 | 0.73254 | 1.36511 | 0.22465 |
| 3 | Urban | Small | 190 | 10.373 | 0.42967 | 2.32739 | 0.25370 |
| 4 | Urban | Small | 038 | 17.125 | 0.70934 | 1.40975 | 0.25370 |
| 5 | Urban | Medium | 083 | 40.382 | 0.35540 | 2.81374 | 0.08953 |
| 6 | Urban | Medium | 133 | 46.992 | 0.41357 | 2.41795 | 0.08953 |
| 7 | Urban | Large | 026 | 61.460 | 0.63445 | 1.57617 | 0.31940 |
| 8 | Urban | Large | 119 | 66.352 | 0.68495 | 1.45996 | 0.31940 |
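As a hand check on the first stratum in this listing (Rural Small), the stratum total of `SizeMeasure` over its eight hospitals is computed first; each selection probability is then twice the hospital's share of that total, and each weight is its reciprocal. The joint probability uses Brewer's expression $\pi_{ij} = \frac{2Z_iZ_j}{D}\left(\frac{1}{1-2Z_i}+\frac{1}{1-2Z_j}\right)$ with $Z_i = M_i/\sum_k M_k$ and $D = 1 + \sum_k Z_k/(1-2Z_k)$, where $M_i$ is `SizeMeasure` (notation introduced here). Values are rounded to the precision shown in the listing.

$$
\sum_k M_k = 0.870 + 1.316 + 2.127 + 3.960 + 5.279 + 5.893 + 0.501 + 11.528 = 31.474
$$

$$
\pi_{165} = \frac{2(5.893)}{31.474} \approx 0.37447, \qquad
w_{165} = \frac{31.474}{2(5.893)} \approx 2.67046
$$

$$
\pi_{141} = \frac{2(11.528)}{31.474} \approx 0.73254, \qquad
w_{141} = \frac{31.474}{2(11.528)} \approx 1.36511
$$

$$
D \approx 3.2587, \qquad
\pi_{165,141} \approx \frac{2(0.18723)(0.36627)}{3.2587}\,(1.5986 + 3.7389) \approx 0.22465
$$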

The variable `SelectionProb` contains the selection probability for each hospital in the sample. The variable `JtSelectionProb` contains the joint probability of selection for the two sample hospitals in the same stratum. The variable `SamplingWeight`, the reciprocal of `SelectionProb`, contains the sampling weight component for this first stage of the design. The second-stage (final-stage) weight components, which correspond to the selection of patient discharge records within hospitals, can be multiplied by these hospital weight components to obtain the overall sampling weights.
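The first-stage values in `SampleHospitals` can be reproduced directly from the frame. The following sketch assumes the `HospitalFrame` and `SampleHospitals` data sets created earlier in this example; the table names `FrameProbs` and `CompareProbs` and the columns `InclusionProb` and `Weight` are hypothetical. It recomputes twice each hospital's share of its stratum's total size measure and places the result next to the values that PROC SURVEYSELECT stored.

```sas
proc sql;
   /* Recompute 2 * SizeMeasure / (stratum total) for every frame    */
   /* hospital; PROC SQL remerges the stratum sums with detail rows  */
   create table FrameProbs as
   select Type, Size, Hospital, SizeMeasure,
          2 * SizeMeasure / sum(SizeMeasure) as InclusionProb format=8.5,
          1 / calculated InclusionProb as Weight format=8.5
   from HospitalFrame
   group by Type, Size;

   /* Put the recomputed values next to those from SURVEYSELECT */
   create table CompareProbs as
   select s.Type, s.Size, s.Hospital,
          s.SelectionProb, f.InclusionProb,
          s.SamplingWeight, f.Weight
   from SampleHospitals as s
        inner join FrameProbs as f
        on s.Hospital = f.Hospital
   order by s.Type, s.Size, s.Hospital;
quit;

proc print data=CompareProbs noobs;
run;
```

PROC SQL writes a note that the query requires remerging summary statistics; that remerge is what attaches each stratum total to its hospitals.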