
The SURVEYSELECT Procedure

PROC SURVEYSELECT Statement
PROC SURVEYSELECT options ;

The PROC SURVEYSELECT statement invokes the procedure and optionally identifies input and output data sets. If you do not name a DATA= input data set, the procedure selects the sample from the most recently created SAS data set. If you do not name an OUT= output data set to contain the sample of selected units, the procedure still creates an output data set and names it according to the DATAn convention.

The PROC SURVEYSELECT statement also specifies the sample selection method, the sample size, and other sample design parameters. If you do not specify a selection method, PROC SURVEYSELECT uses simple random sampling (METHOD=SRS) if there is no SIZE statement. If you do specify a SIZE statement and do not specify a selection method, PROC SURVEYSELECT uses probability proportional to size selection without replacement (METHOD=PPS). You must specify the sample size or sampling rate unless you request a method that selects two units from each stratum (METHOD=PPS_BREWER or METHOD=PPS_MURTHY).
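As a minimal sketch of the two defaults described above, the following steps draw one sample without a SIZE statement and one with it. The data set Frame, its variables Unit and Revenue, the output data set names, and the seed value are hypothetical and chosen only for illustration.

```sas
/* Hypothetical sampling frame: 60 units, each with a size measure. */
data Frame;
   do Unit = 1 to 60;
      Revenue = 10 + mod(7*Unit, 15);
      output;
   end;
run;

/* No METHOD= option and no SIZE statement: simple random sampling. */
proc surveyselect data=Frame sampsize=10 seed=12345 out=SampleSRS;
run;

/* No METHOD= option, but a SIZE statement: PPS without replacement. */
proc surveyselect data=Frame sampsize=10 seed=12345 out=SamplePPS;
   size Revenue;
run;
```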

You can use the SAMPSIZE=n option to specify the sample size, or you can use the SAMPSIZE=SAS-data-set option to name a secondary input data set that contains stratum sample sizes. You can also specify stratum sampling rates, minimum size measures, maximum size measures, and certainty size measures in the secondary input data set. See the descriptions of the SAMPSIZE=, SAMPRATE=, MINSIZE=, MAXSIZE=, CERTSIZE=, and CERTSIZE=P= options for more information. You can name only one secondary input data set in each invocation of the procedure. See the section Secondary Input Data Set for details.
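The following sketch shows a single secondary input data set that supplies stratum sample sizes together with minimum and maximum size values. The data sets Frame and Design and the variables Region, Unit, and Revenue are hypothetical. The variable name _NSIZE_ for the stratum sample sizes comes from the description of the SAMPSIZE=SAS-data-set option rather than from the text above, so treat it as an assumption here.

```sas
/* Hypothetical frame: 3 regions of 20 units, already in Region order. */
data Frame;
   do Region = 1 to 3;
      do Unit = 1 to 20;
         Revenue = 10 + mod(7*Unit + 3*Region, 15);
         output;
      end;
   end;
run;

/* One row per stratum, in the same order as the strata in Frame.     */
/* _NSIZE_ is assumed to be the variable that SAMPSIZE= reads.        */
data Design;
   input Region _NSIZE_ _MINSIZE_ _MAXSIZE_;
   datalines;
1 3 12 20
2 4 12 20
3 5 12 20
;
run;

/* SAMPSIZE= names the secondary data set. MINSIZE and MAXSIZE are    */
/* given without values, so their stratum values come from Design.    */
proc surveyselect data=Frame method=pps sampsize=Design minsize maxsize
                  seed=12345 out=Sample;
   strata Region;
   size Revenue;
run;
```

Because only one secondary input data set can be named per invocation, all stratum-level values are kept in the same data set.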

Table 87.1 lists the options available with the PROC SURVEYSELECT statement. Descriptions follow in alphabetical order.

Table 87.1 PROC SURVEYSELECT Statement Options

Task                               Options
Specify the input data set         DATA=
Specify output data sets           OUT=
                                   OUTSORT=
Suppress displayed output          NOPRINT
Specify selection method           METHOD=
Specify sample size                SAMPSIZE=
                                   SELECTALL
Specify sampling rate              SAMPRATE=
                                   NMIN=
                                   NMAX=
Specify number of replicates       REPS=
Adjust size measures               MINSIZE=
                                   MAXSIZE=
Specify certainty size measures    CERTSIZE=
                                   CERTSIZE=P=
Specify sorting type               SORT=
Specify random number seed         SEED=
Control OUT= contents              JTPROBS
                                   OUTALL
                                   OUTHITS
                                   OUTSEED
                                   OUTSIZE
                                   STATS

You can specify the following options in the PROC SURVEYSELECT statement.

CERTSIZE

requests certainty selection, where the certainty size values are provided in the secondary input data set. Use the CERTSIZE option when you have already named the secondary data set in another option, such as the SAMPSIZE=SAS-data-set option. See the section Secondary Input Data Set for details.

In certainty selection, PROC SURVEYSELECT automatically selects all sampling units with size measures greater than or equal to the stratum certainty size values. After identifying the certainty units, PROC SURVEYSELECT selects the remainder of the sample according to the method specified in the METHOD= option. The CERTSIZE option is available for METHOD=PPS and METHOD=PPS_SAMPFORD.

You provide the stratum certainty size values in the secondary input data set variable _CERTSIZE_. Each certainty size value must be a positive number. The variable Certain in the OUT= data set identifies the certainty selections, which have selection probabilities equal to 1.

If you want to specify a single certainty size value for all strata, you can use the CERTSIZE=certain option.

CERTSIZE=certain

specifies the certainty size value. PROC SURVEYSELECT automatically selects all sampling units with size measures greater than or equal to the value certain. After identifying the certainty units, PROC SURVEYSELECT selects the remainder of the sample according to the method specified in the METHOD= option. The CERTSIZE= option is available for METHOD=PPS and METHOD=PPS_SAMPFORD.

The value of certain must be a positive number. The variable Certain in the OUT= data set identifies the certainty selections, which have selection probabilities equal to 1.

If you request a stratified sample design with the STRATA statement and specify the CERTSIZE=certain option, PROC SURVEYSELECT uses the value certain for all strata. If you do not want to use the same certainty size for all strata, use the CERTSIZE=SAS-data-set option to specify a certainty size value for each stratum.
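A minimal sketch of CERTSIZE=certain in a stratified design follows. The frame, its variables, the certainty size value of 100, and the seed are hypothetical. Each region is given one very large unit so that certainty selection has something to act on.

```sas
/* Hypothetical frame: Unit 1 in each region is much larger than the rest. */
data Frame;
   do Region = 1 to 3;
      do Unit = 1 to 20;
         if Unit = 1 then Revenue = 500;
         else Revenue = 10 + mod(7*Unit + 3*Region, 15);
         output;
      end;
   end;
run;

/* Units with Revenue >= 100 are taken with certainty in every region. */
/* The rest of each stratum sample of 4 is drawn by METHOD=PPS.        */
proc surveyselect data=Frame method=pps sampsize=4 certsize=100
                  seed=12345 out=Sample;
   strata Region;
   size Revenue;
run;

/* The variable Certain identifies the certainty selections. */
proc print data=Sample;
   var Region Unit Revenue Certain;
run;
```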

CERTSIZE=SAS-data-set

names a SAS data set that contains certainty size values for the strata. PROC SURVEYSELECT automatically selects all sampling units with size measures greater than or equal to the stratum certainty size values. After identifying the certainty units, PROC SURVEYSELECT selects the remainder of the sample according to the method specified in the METHOD= option. The CERTSIZE= option is available for METHOD=PPS and METHOD=PPS_SAMPFORD.

You provide the stratum certainty size values in the CERTSIZE= data set variable _CERTSIZE_. Each certainty size value must be a positive number. The variable Certain in the OUT= data set identifies the certainty selections, which have selection probabilities equal to 1.

The CERTSIZE= input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the CERTSIZE= data set as in the DATA= data set. The CERTSIZE= data set must include a variable named _CERTSIZE_ that contains the certainty size value for each stratum. The CERTSIZE= data set is a secondary input data set. See the section Secondary Input Data Set for details. You can name only one secondary input data set in each invocation of the procedure.

If you want to specify a single certainty size value for all strata, you can use the CERTSIZE=certain option.
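The following sketch supplies a different certainty size value for each stratum through a CERTSIZE= data set. The data set names Frame and CertValues, the variables Region, Unit, and Revenue, and the values used are hypothetical. Only the variable name _CERTSIZE_ is taken from the description above.

```sas
/* Hypothetical frame: Unit 1 in each region is much larger than the rest. */
data Frame;
   do Region = 1 to 3;
      do Unit = 1 to 20;
         if Unit = 1 then Revenue = 500;
         else Revenue = 10 + mod(7*Unit + 3*Region, 15);
         output;
      end;
   end;
run;

/* One certainty size value per stratum, in the same order as in Frame. */
data CertValues;
   input Region _CERTSIZE_;
   datalines;
1 100
2 250
3 400
;
run;

proc surveyselect data=Frame method=pps sampsize=4 certsize=CertValues
                  seed=12345 out=Sample;
   strata Region;
   size Revenue;
run;
```

Region is numeric with the default length in both data sets, which matches the requirement that the STRATA variables have the same type and length.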

CERTSIZE=P

requests certainty proportion selection, where the stratum certainty proportions are provided in the secondary input data set. Use the CERTSIZE=P option when you have already named the secondary data set in another option, such as the SAMPSIZE=SAS-data-set option. See the section Secondary Input Data Set for details.

In certainty proportion selection, PROC SURVEYSELECT automatically selects all sampling units with size measures greater than or equal to the stratum certainty proportion of the total stratum size. The procedure repeats this process with the remaining units until no more certainty units are selected. After identifying the certainty units, PROC SURVEYSELECT selects the remainder of the sample according to the method specified in the METHOD= option. The CERTSIZE=P option is available for METHOD=PPS and METHOD=PPS_SAMPFORD.

You provide the stratum certainty proportions in the secondary input data set variable _CERTP_. Each certainty proportion must be a positive number. You can specify a proportion value as a number between 0 and 1, or in percentage form as a number between 1 and 100, which PROC SURVEYSELECT converts to a proportion. The procedure treats the value 1 as 100%, not as 1%.

The variable Certain in the OUT= data set identifies the certainty selections, which have selection probabilities equal to 1.

If you want to specify a single certainty proportion for all strata, you can use the CERTSIZE=P=p option.

CERTSIZE=P=p

specifies the certainty proportion. PROC SURVEYSELECT automatically selects all sampling units with size measures greater than or equal to the proportion p of the total stratum size. The procedure repeats this process with the remaining units until no more certainty units are selected. After identifying the certainty units, PROC SURVEYSELECT selects the remainder of the sample according to the method specified in the METHOD= option. The CERTSIZE=P= option is available for METHOD=PPS and METHOD=PPS_SAMPFORD.

The value of p must be a positive number. You can specify p as a number between 0 and 1, or in percentage form as a number between 1 and 100, which PROC SURVEYSELECT converts to a proportion. The procedure treats the value 1 as 100%, not as 1%.

The variable Certain in the OUT= data set identifies the certainty selections, which have selection probabilities equal to 1.

If you request a stratified sample design with the STRATA statement and specify the CERTSIZE=P=p option, PROC SURVEYSELECT uses the certainty proportion p for all strata. If you do not want to use the same certainty proportion for all strata, use the CERTSIZE=P=SAS-data-set option to specify a certainty proportion for each stratum.
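The sketch below applies a certainty proportion of 0.2 to a hypothetical frame. All data set names, variable names, and values are illustrative. The arithmetic in the comments follows one reading of the description above, in which each repeat pass compares sizes against the total of the units that remain.

```sas
/* Hypothetical frame: Unit 1 in each region is much larger than the rest. */
data Frame;
   do Region = 1 to 3;
      do Unit = 1 to 20;
         if Unit = 1 then Revenue = 500;
         else Revenue = 10 + mod(7*Unit + 3*Region, 15);
         output;
      end;
   end;
run;

/* Region 1 as a worked case:                                          */
/*   total size = 500 + 315 = 815, so the first cutoff is 0.2*815=163. */
/*   Unit 1 (500) meets the cutoff and becomes a certainty unit.       */
/*   The remaining total is 315, so the next cutoff is 0.2*315 = 63.   */
/*   No remaining unit (largest is 24) meets it, so the passes stop.   */
proc surveyselect data=Frame method=pps sampsize=4 certsize=p=0.2
                  seed=12345 out=Sample;
   strata Region;
   size Revenue;
run;
```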

CERTSIZE=P=SAS-data-set

names a SAS data set that contains the certainty proportions for the strata. PROC SURVEYSELECT automatically selects all sampling units with size measures greater than or equal to the certainty proportion of the total stratum size. The procedure repeats this process with the remaining units until no more certainty units are selected. After identifying the certainty units, PROC SURVEYSELECT selects the remainder of the sample according to the method specified in the METHOD= option. The CERTSIZE=P= option is available for METHOD=PPS and METHOD=PPS_SAMPFORD.

You provide the stratum certainty proportions in the CERTSIZE=P= data set variable _CERTP_. Each certainty proportion must be a positive number. You can specify a proportion value as a number between 0 and 1, or in percentage form as a number between 1 and 100, which PROC SURVEYSELECT converts to a proportion. The procedure treats the value 1 as 100%, not as 1%.

The variable Certain in the OUT= data set identifies the certainty selections, which have selection probabilities equal to 1.

The CERTSIZE=P= input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the CERTSIZE=P= data set as in the DATA= data set. The CERTSIZE=P= data set must include a variable named _CERTP_ that contains the certainty proportion for each stratum. The CERTSIZE=P= data set is a secondary input data set. See the section Secondary Input Data Set for details. You can name only one secondary input data set in each invocation of the procedure.

If you want to specify a single certainty proportion for all strata, you can use the CERTSIZE=P=p option.

DATA=SAS-data-set

names the SAS data set from which PROC SURVEYSELECT selects the sample. If you omit the DATA= option, the procedure uses the most recently created SAS data set. In sampling terminology, the input data set is the sampling frame, or list of units from which the sample is selected.

JTPROBS

includes joint probabilities of selection in the OUT= output data set. This option is available for the following probability proportional to size selection methods: METHOD=PPS, METHOD=PPS_SAMPFORD, and METHOD=PPS_WR. By default, PROC SURVEYSELECT outputs joint selection probabilities for METHOD=PPS_BREWER and METHOD=PPS_MURTHY, which select two units per stratum.

For details about computation of joint selection probabilities for a particular sampling method, see the method description in the section Sample Selection Methods. For more information about the contents of the output data set, see the section Sample Output Data Set.

MAXSIZE

requests adjustment of size measures according to the stratum maximum size values provided in the secondary input data set. Use the MAXSIZE option when you have already named the secondary input data set in another option, such as the SAMPSIZE=SAS-data-set option. See the section Secondary Input Data Set for details.

You provide the stratum maximum size values in the secondary input data set variable _MAXSIZE_. Each maximum size value must be a positive number.

When a size measure exceeds the specified maximum value for its stratum, PROC SURVEYSELECT adjusts the size measure downward to equal the maximum size value. The variable AdjustedSize in the OUT= data set contains the adjusted size measures.

The MAXSIZE option is available when you use a SIZE statement for probability proportional to size selection and a STRATA statement.

If you want to specify a single maximum size value for all strata, you can use the MAXSIZE=max option.

MAXSIZE=max

specifies the maximum size value. The value of max must be a positive number.

When any size measure exceeds the value max, PROC SURVEYSELECT adjusts the size measure downward to equal max. The variable AdjustedSize in the OUT= data set contains the adjusted size measures.

The MAXSIZE=max option is available when you use a SIZE statement for selection with probability proportional to size.

If you request a stratified sample design with the STRATA statement and specify the MAXSIZE=max option, PROC SURVEYSELECT uses the maximum size max for all strata. If you do not want to use the same maximum size for all strata, use the MAXSIZE=SAS-data-set option to specify a maximum size value for each stratum.

MAXSIZE=SAS-data-set

names a SAS data set that contains the maximum size values for the strata. You provide the stratum maximum size values in the MAXSIZE= data set variable _MAXSIZE_. Each maximum size value must be a positive number.

When any size measure exceeds the maximum size value for its stratum, PROC SURVEYSELECT adjusts the size measure downward to equal the maximum size value. The variable AdjustedSize in the OUT= data set contains the adjusted size measures.

The MAXSIZE=SAS-data-set option is available when you use a SIZE statement for probability proportional to size selection and a STRATA statement for stratified selection.

The MAXSIZE= input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the MAXSIZE= data set as in the DATA= data set. The MAXSIZE= data set must include a variable named _MAXSIZE_ that contains the maximum size value for each stratum. The MAXSIZE= data set is a secondary input data set. See the section Secondary Input Data Set for details. You can name only one secondary input data set in each invocation of the procedure.

If you want to specify a single maximum size value for all strata, you can use the MAXSIZE=max option.


METHOD=name
M=name

specifies the method for sample selection. If you do not specify the METHOD= option, by default, PROC SURVEYSELECT uses simple random sampling (METHOD=SRS) if there is no SIZE statement. If you specify a SIZE statement, the default selection method is probability proportional to size without replacement (METHOD=PPS).

Valid values for name are as follows:

PPS

requests selection with probability proportional to size and without replacement. See the section PPS Sampling without Replacement for details. If you specify METHOD=PPS, you must name the size measure variable in the SIZE statement.

PPS_BREWER
BREWER

requests selection according to Brewer’s method. Brewer’s method selects two units from each stratum with probability proportional to size and without replacement. See the section Brewer’s PPS Method for details. If you specify METHOD=PPS_BREWER, you must name the size measure variable in the SIZE statement. You do not need to specify the sample size with the SAMPSIZE= option, because Brewer’s method selects two units from each stratum.

PPS_MURTHY
MURTHY

requests selection according to Murthy’s method. Murthy’s method selects two units from each stratum with probability proportional to size and without replacement. See the section Murthy’s PPS Method for details. If you specify METHOD=PPS_MURTHY, you must name the size measure variable in the SIZE statement. You do not need to specify the sample size with the SAMPSIZE= option, because Murthy’s method selects two units from each stratum.

PPS_SAMPFORD
SAMPFORD

requests selection according to Sampford’s method. Sampford’s method selects units with probability proportional to size and without replacement. See the section Sampford’s PPS Method for details. If you specify METHOD=PPS_SAMPFORD, you must name the size measure variable in the SIZE statement.

PPS_SEQ
CHROMY

requests sequential selection with probability proportional to size and with minimum replacement. This method is also known as Chromy’s method. See the section PPS Sequential Sampling for details. If you specify METHOD=PPS_SEQ, you must name the size measure variable in the SIZE statement.

PPS_SYS

requests systematic selection with probability proportional to size. See the section PPS Systematic Sampling for details. If you specify METHOD=PPS_SYS, you must name the size measure variable in the SIZE statement.

PPS_WR

requests selection with probability proportional to size and with replacement. See the section PPS Sampling with Replacement for details. If you specify METHOD=PPS_WR, you must name the size measure variable in the SIZE statement.

SEQ

requests sequential selection according to Chromy’s method. If you specify METHOD=SEQ and do not specify a size measure variable with the SIZE statement, PROC SURVEYSELECT uses sequential zoned selection with equal probability and without replacement. See the section Sequential Random Sampling for details. If you specify METHOD=SEQ and also name a size measure variable in the SIZE statement, PROC SURVEYSELECT uses METHOD=PPS_SEQ, which is sequential selection with probability proportional to size and with minimum replacement. See the section PPS Sequential Sampling for more information.

SRS

requests simple random sampling, which is selection with equal probability and without replacement. See the section Simple Random Sampling for details. METHOD=SRS is the default if you do not specify the METHOD= option and also do not specify a SIZE statement.

SYS

requests systematic random sampling. If you specify METHOD=SYS and do not specify a size measure variable with the SIZE statement, PROC SURVEYSELECT uses systematic selection with equal probability. See the section Systematic Random Sampling for more information. If you specify METHOD=SYS and also name a size measure variable in the SIZE statement, PROC SURVEYSELECT uses METHOD=PPS_SYS, which is systematic selection with probability proportional to size. See the section PPS Systematic Sampling for details.

URS

requests unrestricted random sampling, which is selection with equal probability and with replacement. See the section Unrestricted Random Sampling for details.
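As an illustration of a method that needs no sample size, the following sketch requests Brewer's method, which selects two units from each stratum. The frame, its variables, and the seed are hypothetical.

```sas
/* Hypothetical frame: 3 regions of 20 units, already in Region order. */
data Frame;
   do Region = 1 to 3;
      do Unit = 1 to 20;
         Revenue = 10 + mod(7*Unit + 3*Region, 15);
         output;
      end;
   end;
run;

/* No SAMPSIZE= option: Brewer's method selects two units per stratum. */
/* The SIZE statement is required for this method.                     */
proc surveyselect data=Frame method=pps_brewer seed=12345 out=SampleBrewer;
   strata Region;
   size Revenue;
run;
```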

MINSIZE

requests adjustment of size measures according to the stratum minimum size values provided in the secondary input data set. Use the MINSIZE option when you have already named the secondary input data set in another option, such as the SAMPSIZE=SAS-data-set option. See the section Secondary Input Data Set for details.

You provide the stratum minimum size values in the secondary input data set variable _MINSIZE_. Each minimum size value must be a positive number.

When a size measure is less than the specified minimum value for its stratum, PROC SURVEYSELECT adjusts the size measure upward to equal the minimum size value. The variable AdjustedSize in the OUT= data set contains the adjusted size measures.

The MINSIZE option is available when you specify a SIZE statement for probability proportional to size selection and a STRATA statement.

If you want to specify a single minimum size value for all strata, you can use the MINSIZE=min option.

MINSIZE=min

specifies the minimum size value. The value of min must be a positive number.

When any size measure is less than the value min, PROC SURVEYSELECT adjusts the size measure upward to equal min. The variable AdjustedSize in the OUT= data set contains the adjusted size measures.

The MINSIZE=min option is available when you specify a SIZE statement for selection with probability proportional to size.

If you request a stratified sample design with the STRATA statement and specify the MINSIZE=min option, PROC SURVEYSELECT uses the minimum size min for all strata. If you do not want to use the same minimum size for all strata, use the MINSIZE=SAS-data-set option to specify a minimum size value for each stratum.
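The following sketch combines MINSIZE=min with MAXSIZE=max so that every size measure is held within a fixed range before selection. The frame, its variables, the bounds 12 and 20, and the seed are hypothetical.

```sas
/* Hypothetical frame: size measures range from 10 to 24. */
data Frame;
   do Region = 1 to 3;
      do Unit = 1 to 20;
         Revenue = 10 + mod(7*Unit + 3*Region, 15);
         output;
      end;
   end;
run;

/* Sizes below 12 are raised to 12 and sizes above 20 are lowered to 20. */
/* The same bounds apply to all strata.                                  */
proc surveyselect data=Frame method=pps sampsize=4 minsize=12 maxsize=20
                  seed=12345 out=Sample;
   strata Region;
   size Revenue;
run;

/* Compare the original and adjusted size measures for selected units. */
proc print data=Sample;
   var Region Unit Revenue AdjustedSize;
run;
```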

MINSIZE=SAS-data-set

names a SAS data set that contains the minimum size values for the strata. You provide the stratum minimum size values in the MINSIZE= data set variable _MINSIZE_. Each minimum size value must be a positive number.

When any size measure is less than the minimum size value for its stratum, PROC SURVEYSELECT adjusts the size measure upward to equal the minimum size value. The variable AdjustedSize in the OUT= data set contains the adjusted size measures.

The MINSIZE=SAS-data-set option is available when you specify a SIZE statement for probability proportional to size selection and a STRATA statement.

The MINSIZE= input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the MINSIZE= data set as in the DATA= data set. The MINSIZE= data set must include a variable named _MINSIZE_ that contains the minimum size value for each stratum. The MINSIZE= data set is a secondary input data set. See the section Secondary Input Data Set for details. You can name only one secondary input data set in each invocation of the procedure.

If you want to specify a single minimum size value for all strata, you can use the MINSIZE=min option.


NMAX=n

specifies the maximum stratum sample size n for the SAMPRATE= option. When you specify the SAMPRATE= option, PROC SURVEYSELECT calculates the stratum sample size by multiplying the total number of units in the stratum by the specified sampling rate. If this sample size is greater than the value NMAX=n, then PROC SURVEYSELECT selects only n units.

The maximum sample size n must be a positive integer. The NMAX= option is available only with the SAMPRATE= option, which can be used with equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ).

NMIN=n

specifies the minimum stratum sample size n for the SAMPRATE= option. When you specify the SAMPRATE= option, PROC SURVEYSELECT calculates the stratum sample size by multiplying the total number of units in the stratum by the specified sampling rate. If this sample size is less than the value NMIN=n, then PROC SURVEYSELECT selects n units.

The minimum sample size n must be a positive integer. The NMIN= option is available only with the SAMPRATE= option, which can be used with equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ).
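The sketch below shows how NMIN= and NMAX= bound the stratum sample sizes implied by a sampling rate. The frame, with strata of 10, 100, and 1,000 units, and the option values are hypothetical. The sample sizes in the comments are what the rule described above implies, not output copied from a run.

```sas
/* Hypothetical frame: regions 1, 2, 3 contain 10, 100, and 1000 units. */
data Frame;
   do Region = 1 to 3;
      do Unit = 1 to 10**Region;
         output;
      end;
   end;
run;

/* With a 10% rate, the implied stratum sample sizes are 1, 10, 100.   */
/*   Region 1:   1 is below NMIN=5, so 5 units would be selected.      */
/*   Region 2:  10 is within the bounds, so it is unchanged.           */
/*   Region 3: 100 is above NMAX=50, so only 50 would be selected.     */
proc surveyselect data=Frame method=srs samprate=0.1 nmin=5 nmax=50
                  seed=12345 out=Sample;
   strata Region;
run;

/* Count the selected units in each stratum. */
proc freq data=Sample;
   tables Region;
run;
```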

NOPRINT

suppresses the display of all output. You can use the NOPRINT option when you want only to create an output data set. Note that this option temporarily disables the Output Delivery System (ODS). For more information, see Chapter 20, Using the Output Delivery System.

OUT=SAS-data-set

names the output data set that contains the sample. If you omit the OUT= option, the data set is named DATAn, where n is the smallest integer that makes the name unique.

The output data set contains the units selected for the sample, as well as design information and selection statistics, depending on the selection method and output options you specify. See descriptions of the options JTPROBS, OUTHITS, OUTSEED, OUTSIZE, and STATS, which specify information to include in the output data set. See the section Sample Output Data Set for details about the contents of the output data set.

By default, the output data set contains only those units selected for the sample. To include all observations from the input data set in the output data set, use the OUTALL option.

By default, the output data set includes one observation for each unit selected. When the unit is selected multiple times, which can occur when you use with-replacement or with-minimum-replacement selection methods, the OUT= data set variable NumberHits contains the number of hits or selections for each unit. To produce a separate observation for each hit or selection, specify the OUTHITS option.

If you specify the NOSAMPLE option in the STRATA statement, PROC SURVEYSELECT allocates the total sample size among the strata but does not select the sample. In this case, the OUT= data set contains the allocated sample sizes. See the section Allocation Output Data Set for details.


OUTALL

includes all observations from the input data set in the output data set. By default, the output data set includes only those observations selected for the sample. When you specify the OUTALL option, the output data set includes all observations from DATA= and also contains a variable to indicate each observation’s selection status. The variable Selected equals 1 for an observation selected for the sample, and equals 0 for an observation not selected. For information about the contents of the output data set, see the section Sample Output Data Set.

The OUTALL option is available only for equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ).
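A minimal sketch of the OUTALL option follows. The frame and the output data set name Flagged are hypothetical.

```sas
/* Hypothetical frame of 60 units. */
data Frame;
   do Unit = 1 to 60;
      output;
   end;
run;

/* OUTALL keeps all 60 observations and adds the variable Selected. */
proc surveyselect data=Frame method=srs sampsize=10 seed=12345
                  outall out=Flagged;
run;

/* Selected = 1 for sampled observations and 0 for the others. */
proc freq data=Flagged;
   tables Selected;
run;
```

This form is convenient when the sample indicator must be merged back onto the full frame, for example to split the data into two groups.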

OUTHITS

includes a separate observation in the output data set for each selection when the same unit is selected more than once. A unit can be selected more than once only by methods that select with replacement or with minimum replacement, which include METHOD=URS, METHOD=PPS_WR, METHOD=PPS_SYS, and METHOD=PPS_SEQ.

By default, the output data set contains one observation for each selected unit, even if it is selected more than once, and the variable NumberHits contains the number of hits or selections for that unit. See the section Sample Output Data Set for details about the contents of the output data set.

The OUTHITS option is available for selection methods that select with replacement or with minimum replacement (METHOD=URS, METHOD=PPS_WR, METHOD=PPS_SYS, and METHOD=PPS_SEQ).
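The following sketch contrasts the default output with the OUTHITS output for unrestricted random sampling. The frame, the output data set names, and the seed are hypothetical. Drawing 15 units from a frame of 10 guarantees that some unit is selected more than once.

```sas
/* Hypothetical frame of 10 units. */
data Frame;
   do Unit = 1 to 10;
      output;
   end;
run;

/* Default: one observation per distinct selected unit, with the */
/* number of selections recorded in NumberHits.                  */
proc surveyselect data=Frame method=urs sampsize=15 seed=12345
                  out=HitCounts;
run;

/* OUTHITS: one observation per selection, so 15 observations. */
proc surveyselect data=Frame method=urs sampsize=15 seed=12345
                  outhits out=HitRows;
run;
```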

OUTSEED

includes the initial seed for each stratum in the output data set. The variable InitialSeed contains the stratum initial seeds. See the section Sample Output Data Set for details about the contents of the output data set.

To reproduce the same sample for any stratum in a subsequent execution of PROC SURVEYSELECT, you can specify the same stratum initial seed with the SEED=SAS-data-set option, along with the same sample selection parameters. See the section Sample Selection Methods for information about initial seeds and random number generation in PROC SURVEYSELECT.

The "Sample Selection Summary" table displays the initial random number seed for the entire sample selection, which is the same as the initial seed for the first stratum when the design is stratified. To reproduce the entire sample, you can specify this same seed value in the SEED= option, along with the same sample selection parameters.
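As a sketch of reproducing an entire sample, the steps below run the same selection twice with the same SEED= value and compare the results. The frame, the output data set names, and the seed value are hypothetical. The OUTSEED option is included in both runs so that the stratum initial seeds are stored in the variable InitialSeed.

```sas
/* Hypothetical frame: 3 regions of 20 units, already in Region order. */
data Frame;
   do Region = 1 to 3;
      do Unit = 1 to 20;
         output;
      end;
   end;
run;

/* First run: keep the stratum initial seeds in the output data set. */
proc surveyselect data=Frame method=srs sampsize=5 seed=12345
                  outseed out=Sample1;
   strata Region;
run;

/* Second run: same seed and same selection parameters. */
proc surveyselect data=Frame method=srs sampsize=5 seed=12345
                  outseed out=Sample2;
   strata Region;
run;

/* The two output data sets are expected to match. */
proc compare base=Sample1 compare=Sample2;
run;
```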

OUTSIZE

includes additional design and sampling frame parameters in the output data set. If you specify the OUTSIZE option, PROC SURVEYSELECT includes the sample size or sampling rate in the output data set. When you specify the OUTSIZE option and also specify the SIZE statement, the procedure outputs the size measure total for the sampling frame. If you do not specify the SIZE statement, the procedure outputs the total number of sampling units in the frame. Also, PROC SURVEYSELECT includes the minimum size measure if you specify the MINSIZE= option, the maximum size measure if you specify the MAXSIZE= option, and the certainty size measure if you specify the CERTSIZE= option.

If you have a stratified design, the output data set includes the stratum-level values of these parameters. Otherwise, the output data set includes the overall population-level values.

For information about the contents of the output data set, see the section Sample Output Data Set.

OUTSORT=SAS-data-set

names an output data set that contains the sorted input data set. This option is available when you specify a CONTROL statement for systematic or sequential selection methods (METHOD=SYS, METHOD=PPS_SYS, METHOD=SEQ, and METHOD=PPS_SEQ). PROC SURVEYSELECT sorts the input data set by the CONTROL variables within strata before selecting the sample.

If you specify CONTROL variables but do not name an output data set with the OUTSORT= option, then the sorted data set replaces the input data set.
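The sketch below uses the OUTSORT= option so that the sorted frame is written to a separate data set instead of replacing the input data set. The frame, the control variable Score, and the data set names are hypothetical.

```sas
/* Hypothetical frame: Score is deliberately not in sorted order. */
data Frame;
   do Region = 1 to 3;
      do Unit = 1 to 20;
         Score = mod(7*Unit + Region, 20);
         output;
      end;
   end;
run;

/* Systematic selection after sorting by Score within each Region.  */
/* FrameSorted receives the sorted frame, and Frame is not replaced. */
proc surveyselect data=Frame method=sys samprate=0.25 seed=12345
                  out=Sample outsort=FrameSorted;
   strata Region;
   control Score;
run;
```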

REPS=nreps

specifies the number of sample replicates. The value of nreps must be a positive integer.

When you specify the REPS= option, PROC SURVEYSELECT selects nreps independent samples, each with the same specified sample size or sampling rate and the same sample design. The variable Replicate in the OUT= data set contains the sample replicate number.

You can use replicated sampling to provide a simple method of variance estimation for any form of statistic, as well as to evaluate variable nonsampling errors such as interviewer differences. See Lohr (1999), Wolter (1985), Kish (1965, 1987), and Kalton (1983) for information about replicated sampling.
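A minimal sketch of replicated sampling follows. The frame, the output data set name Replicates, and the seed are hypothetical.

```sas
/* Hypothetical frame of 60 units. */
data Frame;
   do Unit = 1 to 60;
      output;
   end;
run;

/* Three independent simple random samples of 5 units each. */
proc surveyselect data=Frame method=srs sampsize=5 reps=3 seed=12345
                  out=Replicates;
run;

/* The variable Replicate holds the replicate number for each row. */
proc freq data=Replicates;
   tables Replicate;
run;
```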

SAMPRATE=r
RATE=r

specifies the sampling rate, which is the proportion of units to select for the sample. The sampling rate r must be a positive number. You can specify r as a number between 0 and 1, or in percentage form as a number between 1 and 100, which PROC SURVEYSELECT converts to a proportion. The procedure treats the value 1 as 100%, not as 1%.

The SAMPRATE= option is available only for equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ). For systematic random sampling (METHOD=SYS), PROC SURVEYSELECT uses the inverse of the sampling rate r as the interval. See the section Systematic Random Sampling for details. For other selection methods, PROC SURVEYSELECT converts the sampling rate r to the sample size before selection by multiplying the total number of units in the stratum or frame by the sampling rate and rounding up to the nearest integer.

If you request a stratified sample design with the STRATA statement and specify the SAMPRATE=r option, PROC SURVEYSELECT uses the sampling rate r for each stratum. If you do not want to use the same sampling rate for each stratum, use the SAMPRATE=(values) option or the SAMPRATE=SAS-data-set option to specify a sampling rate for each stratum.

SAMPRATE=(values)
RATE=(values)

specifies stratum sampling rates. You can separate values with blanks or commas. The number of SAMPRATE= values must equal the number of strata in the input data set.

List the stratum sampling rate values in the order in which the strata appear in the input data set. When you use the SAMPRATE=(values) option, the input data set must be sorted by the STRATA variables in ascending order. You cannot use the DESCENDING or NOTSORTED option in the STRATA statement.

Each stratum sampling rate value must be a positive number. You can specify a rate value as a number between 0 and 1, or in percentage form as a number between 1 and 100, which PROC SURVEYSELECT converts to a proportion. The procedure treats the value 1 as 100%, not as 1%.

The SAMPRATE= option is available only for equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ). For systematic random sampling (METHOD=SYS), PROC SURVEYSELECT uses the inverse of the stratum sampling rate as the interval for the stratum. See the section Systematic Random Sampling for details about systematic sampling. For other selection methods, PROC SURVEYSELECT converts the stratum sampling rate to a stratum sample size before selection by multiplying the total number of units in the stratum by the sampling rate and rounding up to the nearest integer.
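The following sketch lists one sampling rate per stratum in the SAMPRATE=(values) form. The frame, with strata of 10, 100, and 1,000 units, and the rates are hypothetical. The sample sizes in the comment are what the conversion rule described above implies.

```sas
/* Hypothetical frame, sorted by Region in ascending order: */
/* regions 1, 2, 3 contain 10, 100, and 1000 units.         */
data Frame;
   do Region = 1 to 3;
      do Unit = 1 to 10**Region;
         output;
      end;
   end;
run;

/* Rates are listed in stratum order: Region 1, 2, then 3.        */
/* Implied sample sizes: 0.5*10 = 5, 0.2*100 = 20, 0.1*1000 = 100. */
proc surveyselect data=Frame method=srs samprate=(0.5 0.2 0.1)
                  seed=12345 out=Sample;
   strata Region;
run;
```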

SAMPRATE=SAS-data-set
RATE=SAS-data-set

names a SAS data set that contains stratum sampling rates. The SAMPRATE= data set should have a variable _RATE_ that contains the sampling rate for each stratum.

Each sampling rate value must be a positive number. You can specify each value as a proportion between 0 and 1, or in percentage form as a number between 1 and 100, which PROC SURVEYSELECT converts to a proportion. The procedure treats the value 1 as 100%, not as 1%; to request a 1% rate, specify 0.01.

The SAMPRATE= input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the SAMPRATE= data set as in the DATA= data set.

The SAMPRATE= option is available only for equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ). For systematic random sampling (METHOD=SYS), PROC SURVEYSELECT uses the inverse of the stratum sampling rate as the interval for the stratum. See the section Systematic Random Sampling for details. For other selection methods, PROC SURVEYSELECT converts the stratum sampling rate to the stratum sample size before selection by multiplying the total number of units in the stratum by the sampling rate and rounding up to the nearest integer.
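A secondary input data set of rates might be built as in the sketch below. The data set work.rates is hypothetical, and the example assumes SASHELP.CLASS with Sex values 'F' and 'M'. Deriving the stratum rows from the sorted input is one way to keep the type, length, and order of the STRATA variable consistent with the DATA= data set.

```sas
proc sort data=sashelp.class out=work.class_sorted;
   by Sex;
run;

/* One row per stratum, in the same order as the DATA= data set */
data work.rates;
   set work.class_sorted(keep=Sex);
   by Sex;
   if first.Sex;                      /* keep the first row of each stratum */
   _RATE_ = ifn(Sex='F', 0.5, 0.2);   /* illustrative stratum sampling rates */
run;

proc surveyselect data=work.class_sorted
                  out=work.rate_from_ds
                  method=srs
                  samprate=work.rates
                  seed=12345;
   strata Sex;
run;
```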


SAMPSIZE=n
N=n

specifies the sample size, which is the number of units to select for the sample. The sample size n must be a positive integer. For selection methods that select without replacement, the sample size n must not exceed the number of units in the input data set.

If you specify the ALLOC= option in the STRATA statement, PROC SURVEYSELECT allocates the total sample size among the strata according to the allocation method you request in the ALLOC= option. In this case, SAMPSIZE=n specifies the total sample size to be allocated among the strata.

Otherwise, if you specify the SAMPSIZE=n option and request a stratified sample design with the STRATA statement, PROC SURVEYSELECT selects n units from each stratum. For methods that select without replacement, the sample size n must not exceed the number of units in any stratum. If you do not want to select the same number of units from each stratum, use the SAMPSIZE=(values) option or the SAMPSIZE=SAS-data-set option to specify a sample size for each stratum.

For without-replacement selection methods, by default, PROC SURVEYSELECT does not allow you to specify a stratum sample size that is greater than the total number of units available in the stratum. If you specify the SELECTALL option, PROC SURVEYSELECT selects all stratum units when the stratum sample size exceeds the number of units in the stratum.
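The two uses of SAMPSIZE=n can be sketched as follows, assuming SASHELP.CLASS is available; the output data set names and the seed are illustrative. The first step selects 5 units from the whole data set. The second uses ALLOC=PROP in the STRATA statement, so the value 6 is a total to be allocated among the strata; without ALLOC=, the same value would request 6 units from each stratum.

```sas
/* Unstratified design: select 5 units in total */
proc surveyselect data=sashelp.class
                  out=work.srs_sample
                  method=srs
                  sampsize=5
                  seed=12345;
run;

proc sort data=sashelp.class out=work.class_sorted;
   by Sex;
run;

/* Stratified design with proportional allocation: 6 is the total sample size */
proc surveyselect data=work.class_sorted
                  out=work.alloc_sample
                  method=srs
                  sampsize=6
                  seed=12345;
   strata Sex / alloc=prop;
run;
```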

SAMPSIZE=(values)
N=(values)

specifies sample sizes for the strata. You can separate values with blanks or commas. The number of SAMPSIZE= values must equal the number of strata in the input data set.

List the stratum sample size values in the order in which the strata appear in the input data set. When you use the SAMPSIZE=(values) option, the input data set must be sorted by the STRATA variables in ascending order. You cannot use the DESCENDING or NOTSORTED option in the STRATA statement.

Each stratum sample size value must be a positive integer. For without-replacement selection methods, by default, PROC SURVEYSELECT does not allow you to specify a stratum sample size that is greater than the total number of units available in the stratum. If you specify the SELECTALL option, PROC SURVEYSELECT selects all stratum units when the stratum sample size exceeds the number of units in the stratum.
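As a brief sketch, per-stratum sample sizes can be listed directly in the option. The example assumes SASHELP.CLASS with Sex values 'F' and 'M' and at least 4 units in each stratum; the data set names and the seed are illustrative.

```sas
/* SAMPSIZE=(values) requires input sorted by the STRATA variables, ascending */
proc sort data=sashelp.class out=work.class_sorted;
   by Sex;
run;

/* Select 3 units from the first stratum (Sex='F') and 4 from the second (Sex='M') */
proc surveyselect data=work.class_sorted
                  out=work.size_by_stratum
                  method=srs
                  sampsize=(3 4)
                  seed=12345;
   strata Sex;
run;
```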

SAMPSIZE=SAS-data-set
N=SAS-data-set

names a SAS data set that contains the sample sizes for the strata.

You provide the stratum sample sizes in the SAMPSIZE= input data set variable named _NSIZE_ or SampleSize. Each stratum sample size value must be a positive integer.

The SAMPSIZE= input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the SAMPSIZE= data set as in the DATA= data set. The SAMPSIZE= data set is a secondary input data set. See the section Secondary Input Data Set for details. You can name only one secondary input data set in each invocation of the procedure.

For without-replacement selection methods, by default, PROC SURVEYSELECT does not allow you to specify a stratum sample size that is greater than the total number of units available in the stratum. If you specify the SELECTALL option, PROC SURVEYSELECT selects all stratum units when the stratum sample size exceeds the number of units in the stratum.
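The sketch below supplies stratum sample sizes through a secondary input data set. The data set work.sizes is hypothetical, and the example assumes SASHELP.CLASS with Sex values 'F' and 'M'. Because only one secondary input data set is allowed per invocation, the seed is given here as a number rather than as a data set.

```sas
proc sort data=sashelp.class out=work.class_sorted;
   by Sex;
run;

/* One row per stratum, with the sample size in the variable _NSIZE_ */
data work.sizes;
   set work.class_sorted(keep=Sex);
   by Sex;
   if first.Sex;                   /* keep the first row of each stratum */
   _NSIZE_ = ifn(Sex='F', 3, 4);   /* illustrative stratum sample sizes */
run;

proc surveyselect data=work.class_sorted
                  out=work.size_from_ds
                  method=srs
                  sampsize=work.sizes
                  seed=12345;
   strata Sex;
run;
```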

SEED=number

specifies the initial seed for random number generation. The SEED= value must be a positive integer. If you do not specify the SEED= option, or if the SEED= value is negative or zero, PROC SURVEYSELECT uses the time of day from the computer’s clock to obtain the initial seed. See the section Sample Selection Methods for more information.

Whether or not you specify the SEED= option, PROC SURVEYSELECT displays the value of the initial seed in the "Sample Selection Summary" table. If you need to reproduce the same sample in a subsequent execution of PROC SURVEYSELECT, you can specify this same seed value in the SEED= option, along with the same sample selection parameters, and PROC SURVEYSELECT will reproduce the sample.

If you request a stratified sample design with the STRATA statement, you can use the SEED=SAS-data-set option to specify an initial seed for each stratum. Otherwise, PROC SURVEYSELECT generates random numbers continuously across strata from the random number stream initialized by the SEED= value, as described in the section Sample Selection Methods.

You can use the OUTSEED option to include the stratum initial seeds in the output data set.
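One way to check reproducibility is to run the same selection twice with the same seed and compare the results. In this sketch the seed value and the data set names work.run1 and work.run2 are illustrative, and SASHELP.CLASS is assumed to be available.

```sas
/* First run with a fixed seed */
proc surveyselect data=sashelp.class
                  out=work.run1
                  method=srs
                  sampsize=5
                  seed=20240101;
run;

/* Second run with the same seed and the same selection parameters */
proc surveyselect data=sashelp.class
                  out=work.run2
                  method=srs
                  sampsize=5
                  seed=20240101;
run;

/* The two samples should contain the same units, so no differences are expected */
proc compare base=work.run1 compare=work.run2;
run;
```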

SEED=SAS-data-set

names a SAS data set that contains initial seeds for the strata. You provide the stratum seeds in the SEED= input data set variable _SEED_ or InitialSeed.

The initial seed values must be positive integers. If the initial seed value for the first stratum is not a positive integer, PROC SURVEYSELECT uses the time of day from the computer’s clock to obtain the initial seed. If the initial seed value for a subsequent stratum is not a positive integer, PROC SURVEYSELECT continues to use the random number stream already initialized by the seed for the previous stratum. See the section Sample Selection Methods for more information.

The SEED= input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the SEED= data set as in the DATA= data set. The SEED= data set is a secondary input data set. See the section Secondary Input Data Set for details. You can name only one secondary input data set in each invocation of the procedure.

You can use the OUTSEED option to include the stratum initial seeds in the output data set.

Whether or not you specify the SEED= option, PROC SURVEYSELECT displays the value of the initial seed in the "Sample Selection Summary" table. If you need to reproduce the same sample in a subsequent execution of PROC SURVEYSELECT, you can specify this same seed value in the SEED= option, along with the same sample selection parameters, and PROC SURVEYSELECT will reproduce the sample.

If you specify initial seeds by strata with the SEED=SAS-data-set option, you can reproduce the same sample in a subsequent execution of PROC SURVEYSELECT by specifying these same stratum initial seeds, along with the same sample selection parameters. If you need to reproduce the same sample for only a subset of the strata, you can use the same initial seeds for those strata in the subset.
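Stratum-level seeds might be supplied as in the following sketch. The data set work.seeds and its seed values are hypothetical, and the example assumes SASHELP.CLASS with Sex values 'F' and 'M'. Since the seeds occupy the one allowed secondary input data set, the sample size is given as a number.

```sas
proc sort data=sashelp.class out=work.class_sorted;
   by Sex;
run;

/* One row per stratum, with a positive integer seed in the variable _SEED_ */
data work.seeds;
   set work.class_sorted(keep=Sex);
   by Sex;
   if first.Sex;                      /* keep the first row of each stratum */
   _SEED_ = ifn(Sex='F', 1001, 2002); /* illustrative stratum seeds */
run;

/* OUTSEED adds the stratum initial seeds to the output data set */
proc surveyselect data=work.class_sorted
                  out=work.seeded_sample
                  method=srs
                  sampsize=3
                  seed=work.seeds
                  outseed;
   strata Sex;
run;
```

Rerunning the step with the same work.seeds data set and the same selection parameters should reproduce the sample.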

SELECTALL

requests that PROC SURVEYSELECT select all stratum units when the stratum sample size exceeds the total number of units in the stratum. By default, PROC SURVEYSELECT does not allow you to specify a stratum sample size that is greater than the total number of units in the stratum, unless you are using a with-replacement selection method.

The SELECTALL option is available for the following without-replacement selection methods: METHOD=SRS, METHOD=SYS, METHOD=SEQ, METHOD=PPS, and METHOD=PPS_SAMPFORD.

The SELECTALL option is not available for with-replacement selection methods, with-minimum-replacement methods, or those PPS methods that select two units per stratum.
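The effect of SELECTALL can be sketched with a requested stratum size that is deliberately too large. The example assumes SASHELP.CLASS, in which the Sex='F' stratum is assumed to hold fewer than 20 units; the data set names and the seed are illustrative.

```sas
proc sort data=sashelp.class out=work.class_sorted;
   by Sex;
run;

/* Without SELECTALL, a request for 20 units from a smaller stratum is not allowed. */
/* With SELECTALL, every unit in that stratum is selected instead.                  */
proc surveyselect data=work.class_sorted
                  out=work.selectall_sample
                  method=srs
                  sampsize=(20 4)
                  selectall
                  seed=12345;
   strata Sex;
run;
```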

SORT=NEST | SERP

specifies the type of sorting by CONTROL variables. The option SORT=NEST requests nested sorting, and SORT=SERP requests hierarchic serpentine sorting. The default is SORT=SERP. See the section Sorting by CONTROL Variables for descriptions of serpentine and nested sorting. When there is only one CONTROL variable, the two types of sorting are equivalent.

This option is available when you specify a CONTROL statement for systematic or sequential selection methods (METHOD=SYS, METHOD=PPS_SYS, METHOD=SEQ, and METHOD=PPS_SEQ). When you specify a CONTROL statement, PROC SURVEYSELECT sorts the input data set by the CONTROL variables within strata before selecting the sample.

When you specify a CONTROL statement, you can also use the OUTSORT= option to name an output data set that contains the sorted input data set. If you do not specify the OUTSORT= option, the sorted data set replaces the input data set.
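A systematic sample with nested sorting on two CONTROL variables might look like the sketch below. It assumes SASHELP.CLASS is available; the output data set names and the seed are illustrative. OUTSORT= is used so that the sorted copy goes to a WORK data set rather than replacing the input.

```sas
/* Sort by Sex and Age (nested) before systematic selection */
proc surveyselect data=sashelp.class
                  out=work.sys_sample
                  outsort=work.class_by_control
                  method=sys
                  sampsize=5
                  sort=nest
                  seed=12345;
   control Sex Age;
run;
```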

STATS

includes selection probabilities and sampling weights in the OUT= output data set for equal probability selection methods when you do not specify a STRATA statement. This option is available for the following equal probability selection methods: METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ. For PPS selection methods and stratified designs, the output data set contains selection probabilities and sampling weights by default. For more information about the contents of the output data set, see the section Sample Output Data Set.
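As a short sketch, the STATS option can be added to an unstratified simple random sample and the result printed to inspect the added columns. SASHELP.CLASS is assumed to be available, and the output data set name and seed are illustrative.

```sas
/* Unstratified SRS: STATS adds selection probabilities and sampling weights to OUT= */
proc surveyselect data=sashelp.class
                  out=work.stats_sample
                  method=srs
                  sampsize=5
                  stats
                  seed=12345;
run;

/* Inspect the sample, including the added probability and weight variables */
proc print data=work.stats_sample;
run;
```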
