The HPSAMPLE Procedure

PROC HPSAMPLE statement

PROC HPSAMPLE <options> ;

The PROC HPSAMPLE statement invokes the procedure.

You can specify the following options:

DATA=<libref.>table

names the table (SAS data set or database table) that you want to sample from. The default is the most recently opened or created data set. If the data are already distributed, the procedure reads the data alongside the distributed database. For the execution modes, see the section "Single-machine Mode and Distributed Mode" on page 10; for the alongside-the-database model, see the section "Alongside-the-Database Execution" on page 15.

OUT=<libref.>SAS-data-set

names the SAS data set to which the sample is written. If you run alongside the database, you must specify a data set that uses the same database libref as the input data, and that data set must not already exist in the database. This option is required.

SAMPPCT=sample-percentage

specifies the sample percentage to be used by PROC HPSAMPLE. The value of sample-percentage should be a positive number less than 100. For example, SAMPPCT=50.5 specifies that you want to sample 50.5 percent of the data.

SAMPOBS=number

specifies the minimum number of observations that you want to sample from the input data. The value of number must be a positive integer. If it exceeds the total number of observations in the input data, the output sample has the same number of observations as the input data set.

Note: You must specify either the SAMPPCT or the SAMPOBS option. If both are specified, only the SAMPPCT option is honored.
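The precedence rule in the note can be illustrated outside SAS. The following Python sketch is not part of PROC HPSAMPLE; the function name resolve_sample_request and its return format are hypothetical. It encodes only what the option descriptions state: one of the two options is required, SAMPPCT wins when both are given, and a SAMPOBS value larger than the input is capped at the input size.

```python
def resolve_sample_request(n_input, samppct=None, sampobs=None):
    """Return ("percent", pct) or ("count", n) for a sampling request.

    Hypothetical helper that mirrors the documented option rules;
    it does not draw a sample and is not SAS code.
    """
    if samppct is None and sampobs is None:
        raise ValueError("specify either SAMPPCT or SAMPOBS")
    if samppct is not None:
        # SAMPPCT takes precedence: SAMPOBS is ignored when both are given.
        if not 0 < samppct < 100:
            raise ValueError("SAMPPCT should be a positive number less than 100")
        return ("percent", samppct)
    if not isinstance(sampobs, int) or sampobs <= 0:
        raise ValueError("SAMPOBS must be a positive integer")
    # A request larger than the input yields the whole input data set.
    return ("count", min(sampobs, n_input))


both = resolve_sample_request(1000, samppct=50.5, sampobs=20)
capped = resolve_sample_request(100, sampobs=500)
```

Here both resolves to the percentage request because SAMPPCT is honored over SAMPOBS, and capped resolves to a count equal to the input size.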

SEED=random-seed

specifies the seed for the random number generator. If random-seed is not specified or is specified as a negative number, the seed is set to the default value, 12345. The SEED option enables you to reproduce the same sample output.
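The default-seed rule and the reproducibility it provides can be sketched in Python. This is an illustration only: resolve_seed and draw are hypothetical names, Python's random module stands in for the SAS random number generator (which it does not replicate), and passing a seed of 0 through unchanged is an assumption because the text covers only the unspecified and negative cases.

```python
import random

DEFAULT_SEED = 12345  # default stated in the SEED= description


def resolve_seed(seed=None):
    """Fall back to the default when the seed is missing or negative."""
    if seed is None or seed < 0:
        return DEFAULT_SEED
    return seed  # assumption: 0 and positive values are used as given


def draw(n_rows, n_sample, seed=None):
    """Draw row indices with a generator seeded by resolve_seed (stand-in RNG)."""
    rng = random.Random(resolve_seed(seed))
    return sorted(rng.sample(range(n_rows), n_sample))


first = draw(1000, 10, seed=2024)
second = draw(1000, 10, seed=2024)
fallback = draw(1000, 10, seed=-1)
default = draw(1000, 10)
```

With the same seed, first and second contain the same row indices. A negative seed behaves like an unspecified one, so fallback and default also match.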

PARTITION

produces an output data set that has the same number of rows as the input data set but with an additional partition indicator (_PARTIND_), which indicates whether an observation is selected for the sample (1) or not (0).
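The shape of the PARTITION output can be pictured with a small Python sketch. It is not the procedure's implementation: partition_indicator is a hypothetical function, and Python's random module is used in place of the SAS generator. It shows only that every input row is kept and each row receives a 1 or a 0.

```python
import random


def partition_indicator(n_rows, n_sample, seed=12345):
    """Return one _PARTIND_-style flag per input row (1 = sampled, 0 = not)."""
    rng = random.Random(seed)  # stand-in RNG, not the SAS generator
    selected = set(rng.sample(range(n_rows), n_sample))
    return [1 if i in selected else 0 for i in range(n_rows)]


flags = partition_indicator(n_rows=20, n_sample=5)
```

The list flags has one entry per input row, and the entries that equal 1 mark the rows that a non-partitioned run would have written to the sample.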

NONORM

distinguishes target values that share the same normalized value when you do stratified sampling. For example, if a target has three distinct values, A, B, and b, and you want to treat B and b as different levels, you need to use NONORM. By default, B and b are treated as the same level. PROC HPSAMPLE normalizes a value as follows:

  1. Leading blanks are removed.

  2. The value is truncated to 32 characters.

  3. Letters are changed from lowercase to uppercase.
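The three normalization steps can be written as a short Python function to show why B and b fall into the same level by default. The function name normalize_level is hypothetical, and the sketch assumes that the steps are applied in the order listed.

```python
def normalize_level(value):
    """Apply the three documented normalization steps in order."""
    value = value.lstrip(" ")  # 1. remove leading blanks
    value = value[:32]         # 2. truncate to 32 characters
    return value.upper()       # 3. change lowercase letters to uppercase


raw_values = ["A", "B", "b", "  b"]
default_levels = {normalize_level(v) for v in raw_values}
raw_levels = set(raw_values)
```

After normalization, the four raw values collapse into the two levels A and B, whereas the raw values themselves are all distinct. Specifying NONORM corresponds to keeping values such as B and b apart.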