_NEW_ Operator, Hash and Hash Iterator Objects

Creates an instance of a hash or hash iterator object.

Applies to: Hash object, Hash iterator object

Syntax

object-reference = _NEW_ object(<argument_tag-1: value-1 <, ...argument_tag-n: value-n>>);

Arguments

object-reference

specifies the object reference name for the hash or hash iterator object.

object

specifies the component object. It can be one of the following:

hash

indicates a hash object. The hash object provides a mechanism for quick data storage and retrieval. The hash object stores and retrieves data based on lookup keys.

hiter

indicates a hash iterator object. The hash iterator object enables you to retrieve the hash object's data in forward or reverse key order.

See Using DATA Step Component Objects in SAS Language Reference: Concepts and Using the Hash Iterator Object in SAS Language Reference: Concepts

argument_tag:value

specifies the information that is used to create an instance of the hash object.

Valid hash object argument tags and values are

dataset: 'dataset_name <(datasetoption)>'

names a SAS data set to load into the hash object.

The name of the SAS data set can be a literal or character variable. The data set name must be enclosed in single or double quotation marks. Macro variables must be enclosed in double quotation marks.
You can use SAS data set options when declaring a hash object in the DATASET argument tag. Data set options specify actions that apply only to the SAS data set with which they appear. They enable you to perform the following operations:
  • renaming variables
  • selecting a subset of observations based on observation number for processing
  • selecting observations using the WHERE option
  • dropping or keeping variables from a data set loaded into a hash object, or for an output data set specified in an OUTPUT method call
  • specifying a password for a data set.
The following syntax is used:
dcl hash h;
h = _new_ hash (dataset: 'x (where = (i > 10))');
For a list of SAS data set options, see the SAS Data Set Options: Reference.
Note: If the data set contains duplicate keys, the default is to keep the first instance in the hash object; subsequent instances are ignored. To store the last instance in the hash object or to write an error message in the SAS log if there is a duplicate key, use the DUPLICATE argument tag.
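The quoting rule for macro variables described above can be sketched as follows. The macro variable CUTOFF, the data set WORK.X, and its variable I are hypothetical names chosen for illustration. Because the DATASET value is in double quotation marks, &cutoff is expected to resolve before the data set is loaded.

```sas
/* Hypothetical macro variable and sample data, for illustration only */
%let cutoff = 10;

data work.x;
   do i = 1 to 20;
      output;
   end;
run;

data _null_;
   length i 8;
   dcl hash h;
   /* Double quotation marks allow &cutoff to resolve in the WHERE= option */
   h = _new_ hash(dataset: "work.x (where=(i > &cutoff))");
   h.defineKey('i');
   h.defineDone();
   /* avoid uninitialized variable notes */
   call missing(i);
   /* NUM_ITEMS reports how many items passed the WHERE= filter */
   n = h.num_items;
   put 'Items loaded: ' n;
run;
```

With single quotation marks around the DATASET value, the macro reference would not resolve.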

duplicate: 'option'

determines whether to ignore duplicate keys when loading a data set into the hash object. The default is to store the first key and ignore all subsequent duplicates. Option can be one of the following values:

'replace' | 'r'

stores the last duplicate key record.

'error' | 'e'

reports an error to the log if a duplicate key is found.

The following example uses the REPLACE option to store brown for the key 620 and blue for the key 531. With the default behavior, green would be stored for 620 and yellow would be stored for 531.
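data table;
   input key data $;
   datalines;
531 yellow
620 green
531 blue
908 orange
620 brown
143 purple
;
run;

data _null_;
   length key 8 data $ 8;
   if (_n_ = 1) then do;
      declare hash myhash;
      /* duplicate: "r" keeps the last record for each duplicate key */
      myhash = _new_ hash(dataset: "table", duplicate: "r");
      rc = myhash.definekey('key');
      rc = myhash.definedata('data');
      rc = myhash.definedone();
      /* avoid uninitialized variable notes */
      call missing(key, data);
   end;
   /* Write the contents of the hash object to the data set OTABLE */
   rc = myhash.output(dataset: "otable");
run;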

hashexp: n

is the hash object's internal table size, where the size of the hash table is 2^n (2 raised to the power n).

The value of HASHEXP is used as a power-of-two exponent to create the hash table size. For example, a value of 4 for HASHEXP equates to a hash table size of 2^4, or 16. The maximum value for HASHEXP is 20.
The hash table size is not equal to the number of items that can be stored. Imagine the hash table as an array of 'buckets.' A hash table size of 16 would have 16 'buckets.' Each bucket can hold any number of items, subject to available memory. The efficiency of the hash table lies in the ability of the hashing function to map items to and retrieve items from the buckets.
You should set the hash table size relative to the amount of data in the hash object in order to maximize the efficiency of the hash object lookup routines. Try different HASHEXP values until you get the best result. For example, if the hash object contains one million items, a hash table size of 16 (HASHEXP = 4) would work, but not very efficiently. A hash table size of 512 or 1024 (HASHEXP = 9 or 10) would result in the best performance.
Default: 8, which equates to a hash table size of 2^8, or 256
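As a rough sketch of the sizing arithmetic described above, the following DATA step prints the table size for a few HASHEXP values and then passes one of them to the _NEW_ operator. The data set WORK.BIG and its variables ID and AMOUNT are hypothetical and exist only to make the example self-contained.

```sas
/* Hypothetical sample data: 1000 rows keyed by ID */
data work.big;
   do id = 1 to 1000;
      amount = id * 2;
      output;
   end;
run;

data _null_;
   /* The table size is 2 raised to the HASHEXP value */
   do n = 4, 8, 10, 20;
      size = 2**n;
      put 'HASHEXP=' n 'gives a table size of ' size;
   end;

   length id amount 8;
   declare hash h;
   /* Request 2**10 = 1024 buckets instead of the default 256 */
   h = _new_ hash(dataset: 'work.big', hashexp: 10);
   h.defineKey('id');
   h.defineData('amount');
   h.defineDone();
   /* avoid uninitialized variable notes */
   call missing(id, amount);

   /* The number of stored items is independent of the table size */
   items = h.num_items;
   put 'Items loaded: ' items;
run;
```

Changing the HASHEXP value should affect only lookup efficiency, not the number of items that the hash object can hold.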

keysum: 'variable-name'

specifies the name of a variable that tracks the key summary for all keys. A key summary is a count of how many times a key has been referenced on a FIND method call.

Note: The key summary is in the output data set.

ordered: 'option'

specifies whether or how the data is returned in key-value order if you use the hash object with a hash iterator object or if you use the hash object OUTPUT method.

The argument value can also be enclosed in double quotation marks.
option can be one of the following values:

'ascending' | 'a'

Data is returned in ascending key-value order. Specifying 'ascending' is the same as specifying 'yes'.

'descending' | 'd'

Data is returned in descending key-value order.

'YES' | 'Y'

Data is returned in ascending key-value order. Specifying 'yes' is the same as specifying 'ascending'.

'NO' | 'N'

Data is returned in an undefined order.

Default: NO

multidata: 'option'

specifies whether multiple data items are allowed for each key.

The argument value can also be enclosed in double quotation marks.
option can be one of the following values:

'YES' | 'Y'

Multiple data items are allowed for each key.

'NO' | 'N'

Only one data item is allowed for each key.

Default: NO
See Non-Unique Key and Data Pairs in SAS Language Reference: Concepts
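A minimal sketch of the MULTIDATA argument tag follows. The data set WORK.VISITS and its variables ID and VISIT are hypothetical. The example assumes the FIND and FIND_NEXT methods are used to step through the data items that share a key.

```sas
/* Hypothetical sample data with a repeated key */
data work.visits;
   input id visit $;
   datalines;
1 first
2 first
1 second
1 third
;
run;

data _null_;
   length id 8 visit $ 8;
   declare hash h;
   /* multidata: 'y' keeps every data item for a key, not only the first */
   h = _new_ hash(dataset: 'work.visits', multidata: 'y');
   h.defineKey('id');
   h.defineData('visit');
   h.defineDone();
   /* avoid uninitialized variable notes */
   call missing(id, visit);

   /* Retrieve every data item stored under key 1 */
   id = 1;
   rc = h.find();
   do while (rc = 0);
      put id= visit=;
      rc = h.find_next();
   end;
run;
```

With the default of 'NO', only one data item would be stored for key 1, and the loop would write a single line.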

suminc: 'variable-name'

maintains a summary count of hash object keys. The SUMINC argument tag is given a DATA step variable, which holds the sum increment. The sum increment is how much to add to the key summary for each reference to the key. For example, with the following declaration, each reference to a key adds the current value of the DATA step variable COUNT to that key's summary.

dcl hash myhash(suminc: 'count');
For more information, see Maintaining Key Summaries in SAS Language Reference: Concepts.
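The following sketch shows one way a key summary might be maintained when the hash object is created with the _NEW_ operator. The data set WORK.KEYS and the output data set WORK.COUNTS are hypothetical. The example assumes the REF method is used to reference each key and the SUM method is used to retrieve the key summary.

```sas
/* Hypothetical sample data: a list of keys with repeats */
data work.keys;
   input key;
   datalines;
1
2
1
3
2
1
;
run;

data work.counts;
   length key count 8;
   keep key count;
   declare hash myhash;
   /* COUNT holds the sum increment for each reference to a key */
   myhash = _new_ hash(suminc: 'count', ordered: 'y');
   declare hiter iter;
   iter = _new_ hiter('myhash');
   myhash.defineKey('key');
   myhash.defineDone();
   /* Add 1 to the key summary for each reference */
   count = 1;

   /* REF adds the key if it is new and updates its key summary */
   do while (not done);
      set work.keys end=done;
      rc = myhash.ref();
   end;

   /* SUM retrieves the key summary for the current key into COUNT */
   rc = iter.first();
   do while (rc = 0);
      rc = myhash.sum(sum: count);
      output;
      rc = iter.next();
   end;
   stop;
run;
```

Under these assumptions, WORK.COUNTS should contain one row per distinct key, with COUNT holding the number of times that key was referenced.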
See Initializing Hash Object Data Using a Constructor in SAS Language Reference: Concepts and Declaring and Instantiating a Hash Object in SAS Language Reference: Concepts

Details

To use a DATA step component object in your SAS program, you must declare and create (instantiate) the object. The DATA step component interface provides a mechanism for accessing the predefined component objects from within the DATA step.
If you use the _NEW_ operator to instantiate the component object, you must first use the DECLARE statement to declare the component object. For example, in the following lines of code, the DECLARE statement tells SAS that the object reference H is a hash object. The _NEW_ operator creates the hash object and assigns it to the object reference H.
declare hash h;
h = _new_ hash();
Note: You can use the DECLARE statement to declare and instantiate a hash or hash iterator object in one step.
A constructor is a method that is used to instantiate a component object and to initialize the component object data. For example, in the following lines of code, the _NEW_ operator instantiates a hash object and assigns it to the object reference H. In addition, the data set WORK.KENNEL is loaded into the hash object.
declare hash h;
h = _new_ hash(dataset: "work.kennel");
For more information about the predefined DATA step component objects and constructors, see Using DATA Step Component Objects in SAS Language Reference: Concepts.

Comparisons

You can use the DECLARE statement together with the _NEW_ operator, or the DECLARE statement alone, to declare and instantiate a hash or hash iterator object.
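The two forms can be sketched side by side. The object references H1 and H2 and the variables K and V are hypothetical names used only for illustration.

```sas
data _null_;
   length k 8 v $ 8;

   /* Form 1: declare the object reference, then instantiate with _NEW_ */
   declare hash h1;
   h1 = _new_ hash(ordered: 'a');
   h1.defineKey('k');
   h1.defineData('v');
   h1.defineDone();

   /* Form 2: declare and instantiate in a single DECLARE statement */
   declare hash h2(ordered: 'a');
   h2.defineKey('k');
   h2.defineData('v');
   h2.defineDone();

   /* Both object references can now be used in the same way */
   k = 1;
   v = 'one';
   rc1 = h1.add();
   rc2 = h2.add();
   put rc1= rc2=;
run;
```

The two-step form is useful when the object reference must be declared before the constructor arguments are known.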

Example: Using the _NEW_ Operator to Instantiate and Initialize Hash Object Data

This example uses the _NEW_ operator to instantiate and initialize data for a hash object and instantiate a hash iterator object.
The hash object is filled with data, and the iterator is used to retrieve the data in key order.
data kennel;
   input name $1-10 kenno $14-15;
   datalines;
Charlie      15
Tanner       07
Jake         04
Murphy       01
Pepe         09
Jacques      11
Princess Z   12
;
run;
data _null_;
   if _N_ = 1 then do;
      length kenno $2;
      length name $10;
      /* Declare the hash object */
      declare hash h;
      /* Instantiate and initialize the hash object */
      h = _new_ hash(dataset:"work.kennel", ordered: 'yes');
      /* Declare the hash iterator object */
      declare hiter iter;
      /* Instantiate the hash iterator object */
      iter = _new_ hiter('h');
      /* Define key and data variables */
      h.defineKey('kenno');
      h.defineData('name', 'kenno');
      h.defineDone();
      /* avoid uninitialized variable notes */
      call missing(kenno, name);
   end;
   /* Find the first key in the ordered hash object and output to the log */
   rc = iter.first();
   do while (rc = 0);
      put kenno '   ' name;
      rc = iter.next();
   end;
run;
The following lines are written to the SAS log:
NOTE: There were 7 observations read from the data set WORK.KENNEL.
01    Murphy
04    Jake
07    Tanner
09    Pepe
11    Jacques
12    Princess Z
15    Charlie

See Also

Using DATA Step Component Objects in SAS Language Reference: Concepts