In order to select a valid probability sample, every unit in the frame or input data set must have a positive selection probability. For PROC SURVEYSELECT, this means zero is not a statistically valid value for the options SAMPSIZE=, N=, SAMPRATE= or RATE=. Also, the _NSIZE_ variable in a SAMPSIZE= or N= data set cannot contain zeros. The same applies to the _RATE_ variable in a SAMPRATE= or RATE= data set. If you include a zero value in any of these, you will receive an error such as:
    ERROR: The value SAMPSIZE = 0 is not a positive integer

or

    ERROR: The value _NSIZE_ = 0 from the SAMPSIZE input data set is not a positive integer.
If you use a SAMPSIZE= or SAMPRATE= value list that contains one or more zero values, PROC SURVEYSELECT stops processing with an error and no sample is selected. However, an input SAMPSIZE= or SAMPRATE= data set is handled differently internally: a zero value in one of these data sets causes an error for the affected stratum only, and processing continues for the remaining strata. As a result, a correct sample is generated along with the corresponding error messages.
A better way to omit strata from a stratified random sample is to subset the data first, either by creating a frame data set that contains only the strata to be sampled or by subsetting the data with a DATA set option in PROC SURVEYSELECT.
Suppose you have the following data set of individuals in Rhode Island, including their county of residence (population values are U.S. Census Bureau 2008 estimates):
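    data RI;
       length County $ 10;
       /* Counts are written without thousands separators: in a DO    */
       /* statement, a comma separates items in a list, so "49,838"   */
       /* would mean "the value 49, then the value 838".              */
       County='Bristol';    do id=1 to 49838;  output; end;
       County='Kent';       do id=1 to 168058; output; end;
       County='Newport';    do id=1 to 80478;  output; end;
       County='Providence'; do id=1 to 626150; output; end;
       County='Washington'; do id=1 to 126264; output; end;
    run;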
Suppose you want to take a stratified random sample of 10% of the individuals from each of the three largest counties: Kent, Providence, and Washington. The values in a SAMPRATE= list are matched to the strata in the order in which they appear in the data (Bristol, Kent, Newport, Providence, Washington). If you try to omit Bristol and Newport by giving them sampling rates of zero, as in the following code, PROC SURVEYSELECT stops with an error and no sample is taken:

    proc surveyselect data=RI out=Sample method=srs samprate=(0,10,0,10,10);
       strata County;
    run;
If you take the same approach with a SAMPRATE= input data set that contains zero rates, you get error messages for the zero-rate strata. These errors are expected and can be ignored, because a valid output sample data set is still created:

    data Rate;
       length County $ 10;
       input County $ _rate_;
       datalines;
    Bristol 0
    Kent 10
    Newport 0
    Providence 10
    Washington 10
    ;
    run;

    proc surveyselect data=RI out=sample method=srs samprate=Rate;
       strata County;
    run;
To avoid the errors altogether, subset the DATA= data set. After the frame data set RI_Frame is created, either of the following PROC SURVEYSELECT steps selects a correct sample without errors:

    data RI_Frame;
       set RI;
       where County ^in ('Bristol', 'Newport');
    run;

    proc surveyselect data=RI_Frame out=sample method=srs samprate=10;
       strata County;
    run;

    proc surveyselect data=RI_Frame out=sample method=srs samprate=Rate;
       strata County;
    run;
Alternatively, if you do not need to keep a separate frame data set, you can subset the data for the current procedure only by using the WHERE= data set option. For example:

    proc surveyselect data=RI(where=(County ^in ('Bristol', 'Newport')))
                      out=sample method=srs samprate=10;
       strata County;
    run;

or

    proc surveyselect data=RI(where=(County ^in ('Bristol', 'Newport')))
                      out=sample method=srs samprate=Rate;
       strata County;
    run;
Note that you only need to subset the DATA= data set. If a SAMPRATE= or SAMPSIZE= data set contains observations for strata that do not exist in the frame, these extra observations are ignored by PROC SURVEYSELECT.
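To confirm that the omitted counties are absent from the output, you can tabulate the sample by stratum. This is a minimal sketch that assumes the output data set is named Sample, as in the steps above; it only counts the selected units in each county and does not check the sample sizes themselves.

```sas
/* Count the selected units in each stratum of the output sample.    */
/* Assumes Sample was created by one of the PROC SURVEYSELECT steps  */
/* above; only Kent, Providence, and Washington should be listed.    */
proc freq data=Sample;
   tables County / nocum nopercent;
run;
```

If Bristol or Newport appears in the table, the frame was not subset as intended.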
| Product Family | Product | Systems |
| --- | --- | --- |
| SAS System | SAS/STAT | z/OS, OpenVMS VAX, Microsoft® Windows® for 64-Bit Itanium-based Systems, Microsoft Windows Server 2003 Datacenter 64-bit Edition, Microsoft Windows Server 2003 Enterprise 64-bit Edition, Microsoft Windows XP 64-bit Edition, Microsoft® Windows® for x64, OS/2, Microsoft Windows 7, Microsoft Windows 95/98, Microsoft Windows 2000 Advanced Server, Microsoft Windows 2000 Datacenter Server, Microsoft Windows 2000 Server, Microsoft Windows 2000 Professional, Microsoft Windows NT Workstation, Microsoft Windows Server 2003 Datacenter Edition, Microsoft Windows Server 2003 Enterprise Edition, Microsoft Windows Server 2003 Standard Edition, Microsoft Windows Server 2008, Microsoft Windows XP Professional, Windows Millennium Edition (Me), Windows Vista, 64-bit Enabled AIX, 64-bit Enabled HP-UX, 64-bit Enabled Solaris, ABI+ for Intel Architecture, AIX, HP-UX, HP-UX IPF, IRIX, Linux, Linux for x64, Linux on Itanium, OpenVMS Alpha, OpenVMS on HP Integrity, Solaris, Solaris for x64, Tru64 UNIX |
Type: | Usage Note |
Topic: | Analytics ==> Survey Sampling and Analysis; SAS Reference ==> Procedures ==> SURVEYSELECT |
Date Modified: | 2009-10-13 14:59:28 |
Date Created: | 2009-10-07 16:35:25 |