PROC HPSAMPLE <options> ;
The PROC HPSAMPLE statement invokes the procedure.
You can specify the following options:
DATA=<libref.>table
names the table (SAS data set or database table) that you want to sample from. The default is the most recently opened or created data set. If the data are already distributed, the procedure reads the data alongside the distributed database. See the section “Single-machine Mode and Distributed Mode” on page 10 for the various execution modes and the section “Alongside-the-Database Execution” on page 15 for the alongside-the-database model.
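The default for DATA= can be sketched as follows. This is a minimal example, not taken from the original text: it assumes that the VAR and CLASS statements (documented elsewhere for this procedure) select the numeric and classification variables written to the output, and the WORK table names are hypothetical. Because the DATA step creates WORK.CLASS_COPY last, PROC HPSAMPLE samples from it even though DATA= is omitted.

```sas
/* Copy a SASHELP table; it becomes the most recently created data set */
data work.class_copy;
   set sashelp.class;
run;

/* DATA= is omitted, so the procedure reads WORK.CLASS_COPY */
proc hpsample out=work.sample_default samppct=30 seed=12345;
   var age height weight;   /* numeric variables to keep (assumed VAR semantics)  */
   class sex;               /* character variable to keep (assumed CLASS semantics) */
run;
```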
OUT=<libref.>SAS-data-set
names the SAS data set to which the sample is written. If you run alongside the database, you must specify a data set that has the same database libref as the input data and that does not already exist in the database. This option is required.
SAMPPCT=sample-percentage
specifies the sample percentage to be used by PROC HPSAMPLE. The value of sample-percentage must be a positive number less than 100. For example, SAMPPCT=50.5 specifies that you want to sample 50.5 percent of the data.
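A minimal sketch of a percentage-based sample that uses the SAMPPCT=50.5 value from the text. The output name WORK.HALF is hypothetical, and the VAR and CLASS statements are assumed to select the output variables. The exact number of rows returned is not stated here because it depends on the procedure's sampling.

```sas
/* Sample about 50.5 percent of the rows of SASHELP.CLASS */
proc hpsample data=sashelp.class out=work.half samppct=50.5 seed=12345;
   var age height weight;
   class sex;
run;
```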
SAMPOBS=number
specifies the minimum number of observations that you want to sample from the input data. The value of number must be a positive integer. If it exceeds the total number of observations in the input data, the output sample has the same number of observations as the input data set.
Note: You must specify either the SAMPPCT or the SAMPOBS option. If both are specified, only the SAMPPCT option is honored.
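The following sketch contrasts an ordinary SAMPOBS= request with one that exceeds the size of the input. SASHELP.CLASS contains 19 observations, so SAMPOBS=100 falls under the rule that the sample has as many observations as the input. The output names are hypothetical, and the VAR and CLASS statements are assumed to select the output variables.

```sas
/* Request at least 5 observations */
proc hpsample data=sashelp.class out=work.five sampobs=5 seed=12345;
   var age height weight;
   class sex;
run;

/* SAMPOBS=100 exceeds the 19 rows in SASHELP.CLASS, so the */
/* sample contains as many observations as the input table  */
proc hpsample data=sashelp.class out=work.all sampobs=100 seed=12345;
   var age height weight;
   class sex;
run;
```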
SEED=random-seed
specifies the seed for the random number generator. If you do not specify random-seed, or if you specify a negative number, the default seed, 12345, is used. By specifying the same seed, you can reproduce the same sample output.
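Reproducibility with SEED= can be checked as sketched below, assuming that both runs use the same input data and execution environment. PROC COMPARE (Base SAS) reports any differences between the two output tables. The table names and the seed value 2024 are hypothetical, and the VAR and CLASS statements are assumed to select the output variables.

```sas
/* Draw the same sample twice with an explicit seed */
proc hpsample data=sashelp.class out=work.run1 samppct=40 seed=2024;
   var age height weight;
   class sex;
run;

proc hpsample data=sashelp.class out=work.run2 samppct=40 seed=2024;
   var age height weight;
   class sex;
run;

/* Report any differences between the two samples */
proc compare base=work.run1 compare=work.run2;
run;
```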
PARTITION
produces an output data set that has the same number of rows as the input data set but contains an additional partition indicator variable (_PARTIND_), which indicates whether an observation is selected for the sample (1) or not (0).
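A common use of PARTITION is a training/validation split. The sketch below flags about 70 percent of the rows and then splits them on _PARTIND_ in a DATA step. The 70 percent value and the WORK table names are hypothetical, and the VAR and CLASS statements are assumed to select the output variables.

```sas
/* Keep every row; _PARTIND_=1 marks rows selected for the sample */
proc hpsample data=sashelp.class out=work.parts samppct=70 seed=12345 partition;
   var age height weight;
   class sex;
run;

/* Route selected rows to TRAIN and the rest to VALID */
data work.train work.valid;
   set work.parts;
   if _PARTIND_ = 1 then output work.train;
   else output work.valid;
run;
```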
NONORM
distinguishes target values that share the same normalized value when you perform stratified sampling. For example, if a target variable has three distinct values, “A”, “B”, and “b”, and you want to treat “B” and “b” as different levels, specify NONORM. By default, “B” and “b” are treated as the same level. PROC HPSAMPLE normalizes a value as follows:
- Leading blanks are removed.
- The value is truncated to 32 characters.
- Letters are changed from lowercase to uppercase.
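The normalization rules above can be approximated in a DATA step, as sketched below. This approximation is an illustration only and is not the procedure's own code: LEFT moves leading blanks to the end, assigning the result to a $32 variable truncates it, and UPCASE changes letters to uppercase, so “B”, “b”, and “  b” all share the normalized value “B”. The second part is a stratified sample with NONORM; the table WORK.DEMO and its variable GRADE are hypothetical, and the TARGET statement is assumed to name the stratification variable, which is also listed in the CLASS statement.

```sas
/* Approximate the documented normalization:                 */
/* LEFT removes leading blanks, assignment to a $32 variable */
/* truncates to 32 characters, and UPCASE uppercases letters */
data work.levels;
   length value $40 norm $32;
   input value $char40.;
   norm = upcase(left(value));
   datalines;
A
B
b
  b
;

proc print data=work.levels;
run;

/* Hypothetical data whose target GRADE has levels A, B, and b */
data work.demo;
   input grade $ x;
   datalines;
A 1
B 2
b 3
B 4
b 5
A 6
;

/* NONORM keeps B and b as separate strata */
proc hpsample data=work.demo out=work.strat samppct=50 seed=12345 nonorm;
   var x;
   class grade;
   target grade;   /* assumed: TARGET names the stratification variable */
run;
```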
Copyright © SAS Institute Inc. All Rights Reserved.