Usage Note 23759: Cause of error "For METHOD=PPS, the relative size of each sampling unit must not exceed (1/SAMPSIZE)"
With METHOD=PPS (selection with probability proportional to size, without replacement), it might not be possible to carry out the selection if the sample size is too large. Under the Hanurav-Vijayan algorithm that PROC SURVEYSELECT implements for this method, the selection probability for the ith sampling unit is
- p_i = n * (M_i / M_total)
where
- n = sample size
- M_i = size measure of the ith unit
- M_total = sum of the size measures over all units in the sampling frame
The selection probability for without-replacement methods must be less than or equal to 1 for each unit. If p_i = 1 for a particular unit, that unit is automatically included in the sample, so the rest of this discussion pertains to units with p_i < 1. The discussion also considers a single sample from one overall frame; for a stratified sample, the same computations are carried out within each stratum.
Given p_i < 1, it follows that n * (M_i / M_total) < 1. Rearranging the terms gives (M_i / M_total) < 1/n: the relative size of each sampling unit must not exceed 1 divided by the sample size (equality corresponds to the p_i = 1 case). The error message is a direct consequence of this requirement. Equivalently, n must not exceed 1/Relative Size for any unit, so the unit with the largest relative size determines the maximum sample size. If n is too large for the relative sizes in the frame, the sample cannot be selected.
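As a worked illustration of this bound, the arithmetic can be written out for the size measures used in the example that follows. The totals are computed by hand from that data (sum of size measures 28549.73; largest size measure 2580.35, for unit 27), and the decimal values are rounded:

```latex
\begin{align*}
p_i = n\,\frac{M_i}{M_{\text{total}}} \le 1
  \quad &\Longleftrightarrow \quad
  n \le \frac{M_{\text{total}}}{M_i} \quad \text{for every unit } i \\
\max_i \frac{M_i}{M_{\text{total}}} &= \frac{2580.35}{28549.73} \approx 0.0904 \\
n_{\max} &= \left\lfloor \frac{28549.73}{2580.35} \right\rfloor
          = \lfloor 11.06 \rfloor = 11 \\
n = 12: \quad p_{27} &= 12 \times 0.0904 \approx 1.085 > 1
\end{align*}
```

With n=12, only unit 27 violates the bound; every other unit has a relative size below 1/12.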
This example illustrates the computation of relative sizes and maximum sample size attainable under the Hanurav-Vijayan algorithm:
* Generate some example data. M_i is the size for each unit;
data one;
input Unit_ID M_i @@;
datalines;
1 237.18 2 567.89 3 118.50 4 74.38 5 1287.23 6 258.10
7 325.36 8 218.38 9 1670.80 10 134.71 11 2020.70 12 47.80
13 1183.45 14 330.54 15 780.10 16 895.80 17 620.10 18 420.18
19 979.66 20 810.25 21 670.85 22 314.58 23 87.50 24 1893.40
25 753.30 26 540.65 27 2580.35 28 230.56 29 185.60 30 688.43
31 505.14 32 205.48 33 650.42 34 1348.34 35 30.50 36 2214.80
37 940.35 38 217.85 39 142.90 40 806.90 41 560.72
;
* Compute M_total, the sum of all size measures;
proc means data=one sum;
var M_i;
output out=out sum=M_total;
run;
* Compute M_i/M_total, the relative size for each sampling unit;
data one;
set one;
if _n_=1 then set out;
RelativeSize=M_i/M_total;
MaxSampleSize=int(1/RelativeSize);
run;
proc print data=one;
run;
* Relative Size < 1/n ==> n < 1/Relative Size;
* Because the sample size, n, must not exceed 1/Relative Size for any
unit, this step computes the smallest value of int(1/Relative Size)
over all units, which is the maximum sample size attainable;
proc means data=one min;
var MaxSampleSize;
run;
* MaxSampleSize=11 given the size measures in the data set, so there is no
problem running this;
proc surveyselect data=one method=pps n=11 seed=47279 out=sample;
size M_i;
run;
* But if you specify a sample size > 11, you get
ERROR: For METHOD=PPS, the relative size of each sampling unit must
not exceed (1/SAMPSIZE). ;
proc surveyselect data=one method=pps n=12 seed=47279 out=sample;
size M_i;
run;
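For a stratified design, the same check can be made within each stratum. The following is a minimal sketch, not part of the original example: it assumes a hypothetical frame data set named FRAME that contains a stratum variable named Stratum in addition to the size measure M_i.

```sas
* Hypothetical input: data set FRAME with variables Stratum and M_i;
* Within each stratum the largest relative size is max(M_i)/sum(M_i),
  so the largest allowable n is floor(sum(M_i)/max(M_i));
proc sql;
create table max_n as
select Stratum,
       sum(M_i) as M_total,
       max(M_i) as M_max,
       floor(sum(M_i)/max(M_i)) as MaxSampleSize
from frame
group by Stratum;
quit;
proc print data=max_n;
run;
```

Each stratum's requested sample size can then be compared with its MaxSampleSize value before PROC SURVEYSELECT is run with a STRATA statement.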
If the sample size restriction is a problem, consider selection with minimum replacement (PPS sequential, METHOD=PPS_SEQ). If you need without-replacement sampling, you can adjust the size measures, either manually or by using PROC SURVEYSELECT options such as CERTSIZE= or MAXSIZE=.
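As a hedged sketch of these alternatives with the example data, the following steps request n=12, the sample size that fails under METHOD=PPS. The CERTSIZE= and MAXSIZE= values (2500 and 2200) and the output data set names are illustrative assumptions chosen for this data, not recommendations. Both options change the design, so review the resulting selection probabilities and sampling weights.

```sas
* PPS sequential sampling with minimum replacement: a unit whose
  expected number of hits exceeds 1 can be selected more than once;
proc surveyselect data=one method=pps_seq n=12 seed=47279 out=sample_seq;
size M_i;
run;
* CERTSIZE=2500 (assumed cutoff): units with M_i of at least 2500 are
  selected with certainty, and the rest of the sample is selected by
  METHOD=PPS from the remaining units;
proc surveyselect data=one method=pps n=12 certsize=2500 seed=47279
                  out=sample_cert;
size M_i;
run;
* MAXSIZE=2200 (assumed cap): size measures above 2200 are set to 2200
  before selection, which lowers the largest relative size;
proc surveyselect data=one method=pps n=12 maxsize=2200 seed=47279
                  out=sample_max;
size M_i;
run;
```

In this data only unit 27 (M_i=2580.35) exceeds the assumed certainty cutoff, and units 27 and 36 exceed the assumed cap. In both cases the remaining relative sizes are small enough for the requested sample size.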
Type: Usage Note
Priority: low
Topic: Analytics ==> Survey Sampling and Analysis; SAS Reference ==> Procedures ==> SURVEYSELECT
Date Modified: 2007-11-05 13:44:51
Date Created: 2004-02-18 16:34:11