DECLARE Statement, Hash and Hash Iterator Objects

Declares a hash or hash iterator object; creates an instance of and initializes data for a hash or hash iterator object.

Valid in: DATA step
Category: Action
Type: Executable
Alias: DCL
Applies to: Hash object, Hash iterator object

Syntax

Form 1:

DECLARE object object-reference;

Form 2:

DECLARE object object-reference <(argument_tag-1: value-1, ...argument_tag-n: value-n)>;

Arguments

object

specifies the component object. It can be one of the following values:

hash

specifies a hash object. The hash object provides a mechanism for quick data storage and retrieval. The hash object stores and retrieves data based on lookup keys.

See Using the Hash Object in SAS Language Reference: Concepts

hiter

specifies a hash iterator object. The hash iterator object enables you to retrieve the hash object's data in forward or reverse key order.

See Using the Hash Object in SAS Language Reference: Concepts
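As a minimal sketch of forward and reverse traversal, the following DATA step declares a hash iterator over a small ordered hash object. The names h and it and the sample values are illustrative assumptions. The key variable is also listed in DEFINEDATA so that the iterator returns it.

```sas
data _null_;
   length k $15 d $15;
   /* h and it are hypothetical names chosen for this sketch */
   declare hash h(ordered: 'a');
   rc = h.defineKey('k');
   rc = h.defineData('k', 'd');   /* include the key so the iterator returns it */
   rc = h.defineDone();

   k = 'Labrador'; d = 'Retriever'; rc = h.add();
   k = 'Airedale'; d = 'Terrier';   rc = h.add();
   k = 'Standard'; d = 'Poodle';    rc = h.add();

   /* The iterator is tied to the hash object by its quoted name */
   declare hiter it('h');

   /* Forward key order: FIRST positions on the first item, NEXT advances */
   rc = it.first();
   do while (rc = 0);
      put k= d=;
      rc = it.next();
   end;

   /* Reverse key order: LAST and PREV */
   rc = it.last();
   do while (rc = 0);
      put k= d=;
      rc = it.prev();
   end;
run;
```

Each loop ends when the method returns a nonzero code, which signals that there are no more items in that direction.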

object-reference

specifies the object reference name for the hash or hash iterator object.

argument_tag:value

specifies the information that is used to create an instance of the hash object.

There are seven valid hash object argument and value tags:

dataset: 'dataset_name <(datasetoption)>'

Specifies the name of a SAS data set to load into the hash object.

The name of the SAS data set can be a literal or character variable. The data set name must be enclosed in single or double quotation marks. Macro variables must be enclosed in double quotation marks.
You can use SAS data set options when declaring a hash object in the DATASET argument tag. Data set options specify actions that apply only to the SAS data set with which they appear. They enable you to perform the following operations:
  • renaming variables
  • selecting a subset of observations based on observation number for processing
  • selecting observations using the WHERE option
  • dropping or keeping variables from a data set loaded into a hash object, or for an output data set that is specified in an OUTPUT method call
  • specifying a password for a data set.
The following syntax is used:
dcl hash h (dataset: 'x (where = (i > 10))');
For a list of SAS data set options, see the SAS Data Set Options: Reference
Note If the data set contains duplicate keys, the default is to keep the first instance in the hash object; subsequent instances are ignored. To store the last instance instead, or to have an error message written to the SAS log when a duplicate key is found, use the DUPLICATE argument tag.
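The quoting rule for macro variables can be sketched as follows. The example assumes that the SASHELP.CLASS sample data set is available and uses a hypothetical macro variable named ds. Because the macro variable must resolve inside the string, the DATASET value is enclosed in double quotation marks.

```sas
%let ds = sashelp.class;   /* hypothetical macro variable; assumes SASHELP.CLASS exists */

data _null_;
   /* Never executed at run time; defines NAME and AGE as DATA step variables */
   if 0 then set &ds(keep=name age);

   /* Double quotation marks allow &ds to resolve; KEEP= limits what is loaded */
   declare hash h(dataset: "&ds (keep=name age)");
   rc = h.defineKey('name');
   rc = h.defineData('age');
   rc = h.defineDone();

   /* NUM_ITEMS reports how many items were loaded */
   n = h.num_items;
   put n=;
   stop;
run;
```

With single quotation marks, the text &ds would be passed through unresolved and the load would presumably fail.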

duplicate: 'option'

determines whether to ignore duplicate keys when loading a data set into the hash object. The default is to store the first key and ignore all subsequent duplicates. Option can be one of the following values:

'replace' | 'r'

stores the last duplicate key record.

'error' | 'e'

reports an error to the log if a duplicate key is found.

The following example uses the REPLACE option, so brown is stored for the key 620 and blue for the key 531. With the default, green would be stored for 620 and yellow would be stored for 531.
data table;
   input key data $;
   datalines;
531 yellow
620 green
531 blue
908 orange
620 brown
143 purple
;
run;

data _null_;
   length key 8 data $ 8;
   if (_n_ = 1) then do;
      /* Keep the last record for each duplicate key */
      declare hash myhash(dataset: "table", duplicate: "r");
      rc = myhash.definekey('key');
      rc = myhash.definedata('data');
      rc = myhash.definedone();
   end;
   /* Write the contents of the hash object to OTABLE */
   rc = myhash.output(dataset: "otable");
run;

hashexp: n

specifies the hash object's internal table size, where the size of the hash table is 2^n.

The value of HASHEXP is used as a power-of-two exponent to create the hash table size. For example, a value of 4 for HASHEXP equates to a hash table size of 2^4, or 16. The maximum value for HASHEXP is 20.
The hash table size is not equal to the number of items that can be stored. Imagine the hash table as an array of 'buckets.' A hash table size of 16 would have 16 'buckets.' Each bucket can hold any number of items. The efficiency of the hash table lies in the ability of the hashing function to map items to and retrieve items from the buckets.
You should specify the hash table size relative to the amount of data in the hash object in order to maximize the efficiency of the hash object lookup routines. Try different HASHEXP values until you get the best result. For example, if the hash object contains one million items, a hash table size of 16 (HASHEXP = 4) would work, but not very efficiently. A hash table size of 512 or 1024 (HASHEXP = 9 or 10) would result in the best performance.
Default 8, which equates to a hash table size of 2^8, or 256
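The relationship between HASHEXP and the table size can be worked through with a short DATA step. This is only an arithmetic sketch. The one-million-item count comes from the example above, and the average bucket load assumes that the hashing function spreads items evenly, which is an idealization.

```sas
data _null_;
   items = 1e6;                      /* item count taken from the example above */
   do hashexp = 4, 8, 9, 10, 20;
      buckets = 2**hashexp;          /* table size is 2 raised to HASHEXP */
      avg_per_bucket = items / buckets;   /* assumes an even spread across buckets */
      put hashexp= buckets= avg_per_bucket= comma12.1;
   end;
run;
```

The fewer items that share a bucket, the less searching a lookup has to do within that bucket. That is the reason to size the table relative to the data volume.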

keysum: 'variable-name'

specifies the name of a variable that tracks the key summary for all keys. A key summary is a count of how many times a key has been referenced on a FIND method call.

Note The key summary is in the output data set.
See Example 5: Adding the Key Summary to the Output Data Set

ordered: 'option'

Specifies whether or how the data is returned in key-value order if you use the hash object with a hash iterator object or if you use the hash object OUTPUT method.

option can be one of the following values:

'ascending' | 'a'

Data is returned in ascending key-value order. Specifying 'ascending' is the same as specifying 'yes'.

'descending' | 'd'

Data is returned in descending key-value order.

'YES' | 'Y'

Data is returned in ascending key-value order. Specifying 'yes' is the same as specifying 'ascending'.

'NO' | 'N'

Data is returned in some undefined order.

Default NO
Tip The argument value can also be enclosed in double quotation marks.
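As a brief sketch of the ORDERED argument tag, the following step loads a few items in arbitrary order and writes them out in descending key order. The data set name work.by_key_desc and the sample values are assumptions made for illustration.

```sas
data _null_;
   length key 8 data $8;
   declare hash h(ordered: 'd');
   rc = h.defineKey('key');
   rc = h.defineData('key', 'data');   /* include KEY so it appears in the output */
   rc = h.defineDone();

   key = 531; data = 'yellow'; rc = h.add();
   key = 908; data = 'orange'; rc = h.add();
   key = 143; data = 'purple'; rc = h.add();

   /* OUTPUT writes the items in the order requested by ORDERED */
   rc = h.output(dataset: 'work.by_key_desc');
run;

proc print data=work.by_key_desc;
run;
```

If ORDERED is omitted or set to 'no', the same OUTPUT call returns the rows in an undefined order.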

multidata: 'option'

specifies whether multiple data items are allowed for each key.

option can be one of the following values:

'YES' | 'Y'

Multiple data items are allowed for each key.

'NO' | 'N'

Only one data item is allowed for each key.

Default NO
Tip The argument value can also be enclosed in double quotation marks.
See Non-Unique Key and Data Pairs in SAS Language Reference: Concepts
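A minimal sketch of MULTIDATA follows. It stores two data items under one key and retrieves both with the FIND and FIND_NEXT methods. The variable names and values are illustrative.

```sas
data _null_;
   length key 8 data $8;
   declare hash h(multidata: 'y');
   rc = h.defineKey('key');
   rc = h.defineData('data');
   rc = h.defineDone();

   /* Key 531 is added twice; both data items are kept */
   key = 531; data = 'yellow'; rc = h.add();
   key = 620; data = 'green';  rc = h.add();
   key = 531; data = 'blue';   rc = h.add();

   /* FIND retrieves the first item for the key; FIND_NEXT walks the rest */
   rc = h.find(key: 531);
   do while (rc = 0);
      put key= data=;
      rc = h.find_next();
   end;
run;
```

With the default of 'no', the second ADD call for key 531 would return a nonzero code instead of storing the item.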

suminc: 'variable-name'

maintains a summary count of hash object keys. The SUMINC argument tag is given a DATA step variable, which holds the sum increment. The sum increment is how much to add to the key summary for each reference to the key.

See Maintaining Key Summaries in SAS Language Reference: Concepts
Example A key summary changes using the current value of the DATA step variable.
dcl hash myhash(suminc: 'count');
See Initializing Hash Object Data Using a Constructor in SAS Language Reference: Concepts and Declaring and Instantiating a Hash Iterator Object in SAS Language Reference: Concepts
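The following sketch shows how a sum increment accumulates into a key summary. It assumes that the summary for a new key starts at the value of the SUMINC variable when the key is added, and it reads the summary back with the SUM method. The names h, k, count, and total are illustrative.

```sas
data _null_;
   length k 8 count 8 total 8;
   declare hash h(suminc: 'count');
   rc = h.defineKey('k');
   rc = h.defineDone();

   k = 99;
   count = 0;          /* value of COUNT at the time the key is added */
   rc = h.add();

   count = 1;          /* each FIND now adds 1 to the key summary */
   rc = h.find();
   rc = h.find();

   count = 5;          /* a different increment for the next reference */
   rc = h.find();

   /* SUM copies the current key summary for K into TOTAL */
   rc = h.sum(sum: total);
   put total=;
run;
```

Under these assumptions the summary accumulates 0 + 1 + 1 + 5, so TOTAL should be 7. The increment is read from COUNT at the time of each reference, not fixed when the hash object is declared.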

Details

The Basics

To use a DATA step component object in your SAS program, you must declare and create (instantiate) the object. The DATA step component interface provides a mechanism for accessing predefined component objects from within the DATA step.
For more information about the predefined DATA step component objects, see Using DATA Step Component Objects in SAS Language Reference: Concepts.

Declaring a Hash or Hash Iterator Object (Form 1)

You use the DECLARE statement to declare a hash or hash iterator object.
declare hash h;
The DECLARE statement tells SAS that the object reference H is a hash object.
After you declare the new hash or hash iterator object, use the _NEW_ operator to instantiate the object. For example, in the following line of code, the _NEW_ operator creates the hash object and assigns it to the object reference H:
h = _new_ hash( );

Using the DECLARE Statement to Instantiate a Hash or Hash Iterator Object (Form 2)

As an alternative to the two-step process of using the DECLARE statement and the _NEW_ operator to declare and instantiate a hash or hash iterator object, you can use the DECLARE statement to declare and instantiate the hash or hash iterator object in one step. For example, in the following line of code, the DECLARE statement declares and instantiates a hash object and assigns it to the object reference H:
declare hash h( );
The previous line of code is equivalent to using the following code:
declare hash h;
h = _new_ hash( );
A constructor is a method that you can use to instantiate a hash object and initialize the hash object data. For example, in the following line of code, the DECLARE statement declares and instantiates a hash object and assigns it to the object reference H. In addition, the hash table size is initialized to a value of 16 (2^4) using the HASHEXP argument tag.
declare hash h(hashexp: 4);

Using SAS Data Set Options When Loading a Hash Object

SAS data set options can be used when declaring a hash object that uses the DATASET argument tag. Data set options specify actions that apply only to the SAS data set with which they appear. They enable you to perform the following operations:
  • renaming variables
  • selecting a subset of observations based on observation number for processing
  • selecting observations using the WHERE option
  • dropping or keeping variables from a data set loaded into a hash object, or for an output data set that is specified in an OUTPUT method call
  • specifying a password for a data set.
The following syntax is used:
dcl hash h(dataset: 'x (where = (i > 10))');
For more examples of using data set options, see Example 4: Using SAS Data Set Options When Loading a Hash Object. For a list of data set options, see SAS Data Set Options: Reference.

Comparisons

You can use the DECLARE statement and the _NEW_ operator, or the DECLARE statement alone to declare and instantiate an instance of a hash or hash iterator object.

Examples

Example 1: Declaring and Instantiating a Hash Object By Using the DECLARE Statement and _NEW_ Operator

This example uses the DECLARE statement to declare a hash object. The _NEW_ operator is used to instantiate the hash object.
data _null_;
   length k $15;
   length d $15;
   if _N_ = 1 then do;
      /* Declare and instantiate hash object "myhash" */ 
      declare hash myhash;
      myhash = _new_ hash( );
      /* Define key and data variables */
      rc = myhash.defineKey('k');
      rc = myhash.defineData('d');
      rc = myhash.defineDone( );
      /* avoid uninitialized variable notes */
      call missing(k, d);
   end;
   /* Create constant key and data values */
   rc = myhash.add(key: 'Labrador', data: 'Retriever');
   rc = myhash.add(key: 'Airedale', data: 'Terrier');
   rc = myhash.add(key: 'Standard', data: 'Poodle');
   /* Find data associated with key and write data to log */
   rc = myhash.find(key: 'Airedale');
   if (rc = 0) then
      put d=;
   else
      put 'Key Airedale not found';
run;

Example 2: Declaring and Instantiating a Hash Object By Using the DECLARE Statement

This example uses the DECLARE statement to declare and instantiate a hash object in one step.
data _null_;
   length k $15;
   length d $15;
   if _N_ = 1 then do;
      /* Declare and instantiate hash object "myhash" */ 
      declare hash myhash( );
      rc = myhash.defineKey('k');
      rc = myhash.defineData('d');
      rc = myhash.defineDone( );
      /* avoid uninitialized variable notes */
      call missing(k, d);
   end;
   /* Create constant key and data values */
   rc = myhash.add(key: 'Labrador', data: 'Retriever');
   rc = myhash.add(key: 'Airedale', data: 'Terrier');
   rc = myhash.add(key: 'Standard', data: 'Poodle');
   /* Find data associated with key and write data to log*/
   rc = myhash.find(key: 'Airedale');
   if (rc = 0) then
      put d=;
   else
      put 'Key Airedale not found';
run;

Example 3: Instantiating and Sizing a Hash Object

This example uses the DECLARE statement to declare and instantiate a hash object. The hash table size is set to 16 (2^4).
data _null_;
   length k $15;
   length d $15;
   if _N_ = 1 then do;
      /* Declare and instantiate hash object "myhash". */
      /* Set hash table size to 16. */ 
      declare hash myhash(hashexp: 4);
      rc = myhash.defineKey('k');
      rc = myhash.defineData('d');
      rc = myhash.defineDone( );
      /* avoid uninitialized variable notes */
      call missing(k, d);
   end;
   /* Create constant key and data values */
   rc = myhash.add(key: 'Labrador', data: 'Retriever');
   rc = myhash.add(key: 'Airedale', data: 'Terrier');
   rc = myhash.add(key: 'Standard', data: 'Poodle');
   /* Find data associated with key and write data to log */
   rc = myhash.find(key: 'Airedale');
   if (rc = 0) then
      put d=;
   else
      put 'Key Airedale not found';
run;

Example 4: Using SAS Data Set Options When Loading a Hash Object

The following examples use various SAS data set options when declaring a hash object:
data x;
   retain j 999;
   do i = 1 to 20;
      output;
   end;
run;
/* Using the WHERE option. */
data _null_;
  length i 8;
  dcl hash h(dataset: 'x (where =(i > 10))', ordered: 'a');
  h.definekey('i');
  h.definedone();
  h.output(dataset: 'out');
  run;
/* Using the DROP option. */
data _null_;
  length i 8;
  dcl hash h(dataset: 'x (drop = j)', ordered: 'a');
  h.definekey(all: 'y');
  h.definedone();
  h.output(dataset: 'out (where =( i < 8))');
  run;
/* Using the FIRSTOBS option. */
data _null_;
  length i j 8;
  dcl hash h(dataset: 'x (firstobs=5)', ordered: 'a');
  h.definekey(all: 'y');
  h.definedone();
  h.output(dataset: 'out');
  run;
/* Using the OBS option. */
data _null_;
  length i j 8;
  dcl hash h(dataset: 'x (obs=5)', ordered: 'd');
  h.definekey(all: 'y');
  h.definedone();
  h.output(dataset: 'out (rename =(j=k))');
  run;
For a list of SAS data set options, see SAS Data Set Options: Reference.

Example 5: Adding the Key Summary to the Output Data Set

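The following example declares the variable ks to hold the key summary and adds the variable to the output data set. With a sum increment of 1, the FIND method is called twice for keys 1 through 3 and once for keys 4 and 5, so ks should be 2 for keys 1 through 3 and 1 for keys 4 and 5.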
data key;
   length key data 8;
   input key data;
   datalines;
1 10
2 11
3 20
5 5
4 6
;
run;

data _null_;
   length key data r i sum 8;
   length ks 8;
   i = 0;
   dcl hash h(dataset:'key', suminc: 'i', keysum: 'ks');
   h.definekey('key');
   h.definedata('key', 'data');
   h.definedone();

   i = 1;
   do key = 1 to 5;
      rc = h.find();
   end;

   do key = 1 to 3;
      rc = h.find();
   end;

   rc = h.output(dataset:'out');
 run;

proc print data=out;
run;
Output of Key Summary Data

See Also

Using DATA Step Component Objects in SAS Language Reference: Concepts