Hash and Hash Iterator Object Language Elements
Applies to: Hash object, Hash iterator object
Syntax
object-reference = _NEW_ object(<argument_tag-1: value-1<, ...argument_tag-n: value-n>>); |
Arguments
object-reference
specifies the object reference name for the hash or hash iterator object.
object
specifies the component object. It can be one of the following:
hash
indicates a hash object. The hash object provides a mechanism for quick data storage and retrieval. The hash object stores and retrieves data based on lookup keys. For more information about the hash object, see Using the Hash Object in SAS Language Reference: Concepts.
hiter
indicates a hash iterator object. The hash iterator object enables you to retrieve the hash object's data in forward or reverse key order. For more information about the hash iterator object, see Using the Hash Iterator Object in SAS Language Reference: Concepts.
argument_tag: value
specifies the information that is used to create an instance of the hash object.
Valid hash object argument tags are the following:
dataset: 'dataset_name'
names a SAS data set to load into the hash object.
The name of the SAS data set can be a literal or a character variable. The data set name must be enclosed in single or double quotation marks. Macro variables must be enclosed in double quotation marks.
You can use SAS data set options when declaring a hash object in the DATASET argument tag. Data set options specify actions that apply only to the SAS data set with which they appear. They enable you to perform the following operations:
renaming variables
selecting a subset of observations based on observation number for processing
selecting observations using the WHERE option
dropping or keeping variables from a data set loaded into a hash object, or for an output data set specified in an OUTPUT method call
specifying a password for a data set.
dcl hash h;
h = _new_ hash (dataset: 'x (where = (i > 10))');
For a list of SAS data set options, see Data Set Options by Category.
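As a rough illustration of combining several data set options in the DATASET argument tag, the sketch below loads only part of a data set and renames one variable on the way in. The data set WORK.X, its variables I and AMOUNT, the new name TOTAL, and the output data set WORK.X_SUBSET are all hypothetical names made up for this sketch.

```sas
/* Hypothetical input data set, created only for this sketch */
data work.x;
   do i = 1 to 20;
      amount = i * 100;
      output;
   end;
run;

data _null_;
   /* Host variables for the key and data items must exist in the DATA step */
   length i 8 total 8;

   declare hash h;
   /* WHERE= subsets the observations; RENAME= renames AMOUNT to TOTAL */
   h = _new_ hash(dataset: 'work.x (where=(i > 10) rename=(amount=total))');
   h.defineKey('i');
   h.defineData('i', 'total');
   h.defineDone();
   call missing(i, total);

   /* Write the loaded items to a data set so that they can be inspected */
   rc = h.output(dataset: 'work.x_subset');
run;
```

The options string sits inside the same quotation marks as the data set name, so the whole DATASET value is a single character literal.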
Note: If the data set contains duplicate keys, the default is to keep the first instance in the hash object; subsequent instances will be ignored. To store the last instance in the hash object or have an error message written in the SAS log if there is a duplicate key, use the DUPLICATE argument tag.
duplicate: 'option'
determines whether to ignore duplicate keys when loading a data set into the hash object. The default is to store the first key and ignore all subsequent duplicates. option can be one of the following values:
'replace' | 'r'
stores the last duplicate key record.
'error' | 'e'
reports an error to the log if a duplicate key is found.
The following example, which uses the REPLACE option, stores brown for the key 620 and blue for the key 531. If you use the default, green is stored for 620 and yellow is stored for 531.
data table;
   input key data $;
   datalines;
531 yellow
620 green
531 blue
908 orange
620 brown
143 purple
;
run;

data _null_;
   length key 8 data $ 8;
   if (_n_ = 1) then do;
      declare hash myhash;
      /* duplicate: "r" keeps the last record for each duplicate key */
      myhash = _new_ hash (dataset: "table", duplicate: "r");
      rc = myhash.definekey('key');
      rc = myhash.definedata('data');
      myhash.definedone();
   end;
   rc = myhash.output(dataset:"otable");
run;
hashexp: n
specifies the hash object's internal table size, where the size of the hash table is 2^n.
The value of HASHEXP is used as a power-of-two exponent to create the hash table size. For example, a value of 4 for HASHEXP equates to a hash table size of 2^4, or 16. The maximum value for HASHEXP is 20.
The hash table size is not equal to the number of items that can be stored. Imagine the hash table as an array of 'buckets.' A hash table size of 16 would have 16 'buckets.' Each bucket can hold any number of items, limited only by available memory. The efficiency of the hash table lies in the ability of the hashing function to map items to and retrieve items from the buckets.
You should set the hash table size relative to the amount of data in the hash object in order to maximize the efficiency of the hash object lookup routines. Try different HASHEXP values until you get the best result. For example, if the hash object contains one million items, a hash table size of 16 (HASHEXP = 4) would work, but not very efficiently. A hash table size of 512 or 1024 (HASHEXP = 9 or 10) would result in the best performance.
Default: 8, which equates to a hash table size of 2^8, or 256
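The relationship between HASHEXP and the table size can be sketched as follows. This is only an illustration: the variable names KEY, DATA, EXPONENT, and BUCKETS are chosen for the sketch, and the loop simply repeats the power-of-two arithmetic described above.

```sas
data _null_;
   length key 8 data $ 8;

   declare hash h;
   /* hashexp: 10 requests a table of 2**10 = 1024 buckets */
   h = _new_ hash(hashexp: 10);
   h.defineKey('key');
   h.defineData('data');
   h.defineDone();
   call missing(key, data);

   /* Arithmetic only: table size for a few HASHEXP values */
   do exponent = 4, 8, 10;
      buckets = 2 ** exponent;
      put exponent= buckets=;
   end;
run;
```

Because the best value depends on the amount of data, a reasonable approach is to time the same lookup workload with a few different exponents and compare.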
ordered: 'option'
specifies whether or how the data is returned in key-value order if you use the hash object with a hash iterator object or if you use the hash object OUTPUT method.
option can be one of the following values:
'ascending' | 'a'
Data is returned in ascending key-value order. Specifying 'ascending' is the same as specifying 'yes'.
'descending' | 'd'
Data is returned in descending key-value order.
'YES' | 'Y'
Data is returned in ascending key-value order. Specifying 'yes' is the same as specifying 'ascending'.
'NO' | 'N'
Data is returned in some undefined order.
Default: NO
The argument value can also be enclosed in double quotation marks.
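A minimal sketch of the ORDERED argument tag, assuming a small hypothetical data set WORK.COLORS: the hash object is created with descending order, and the OUTPUT method then writes the items to the hypothetical data set WORK.COLORS_DESC in that order.

```sas
/* Hypothetical input data, created only for this sketch */
data work.colors;
   input key data $;
   datalines;
531 yellow
620 green
908 orange
143 purple
;
run;

data _null_;
   length key 8 data $ 8;

   declare hash h;
   /* 'd' requests descending key-value order */
   h = _new_ hash(dataset: 'work.colors', ordered: 'd');
   h.defineKey('key');
   h.defineData('key', 'data');
   h.defineDone();
   call missing(key, data);

   /* The output data set receives the items in descending key order */
   rc = h.output(dataset: 'work.colors_desc');
run;
```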
multidata: 'option'
specifies whether multiple data items are allowed for each key.
option can be one of the following values:
'YES' | 'Y'
Multiple data items are allowed for each key.
'NO' | 'N'
Only one data item is allowed for each key.
Default: NO
See Also: Non-Unique Key and Data Pairs in SAS Language Reference: Concepts
The argument value can also be enclosed in double quotation marks.
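The following sketch shows one way that multiple data items per key might be stored and then read back. It assumes a SAS release in which the hash object supports the MULTIDATA argument tag and the FIND_NEXT method; the data set WORK.DUPS and its contents are hypothetical.

```sas
/* Hypothetical input data with repeated keys */
data work.dups;
   input key data $;
   datalines;
531 yellow
620 green
531 blue
620 brown
;
run;

data _null_;
   length key 8 data $ 8;

   declare hash h;
   /* multidata: 'y' keeps every data item for a repeated key */
   h = _new_ hash(dataset: 'work.dups', multidata: 'y');
   h.defineKey('key');
   h.defineData('data');
   h.defineDone();
   call missing(key, data);

   /* Walk through all data items stored for key 531 */
   key = 531;
   rc = h.find();
   do while (rc = 0);
      put key= data=;
      rc = h.find_next();
   end;
run;
```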
suminc: 'variable-name'
maintains a summary count of hash object keys. The SUMINC argument tag is given a DATA step variable, which holds the sum increment, that is, how much to add to the key summary for each reference to the key. The SUMINC value treats a missing value as zero, like the SUM function. For example, a key summary changes using the current value of the DATA step variable.
dcl hash myhash(suminc: 'count');
For more information, see Maintaining Key Summaries in SAS Language Reference: Concepts.
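A short sketch of how a key summary might be maintained and read back with the _NEW_ operator. The variable names K, COUNT, and TOTAL are chosen for illustration, and the sketch assumes that the ADD, FIND, and SUM methods behave as described in Maintaining Key Summaries.

```sas
data _null_;
   length k 8 count 8 total 8;

   declare hash myhash;
   /* COUNT holds the increment that is applied to the key summary */
   myhash = _new_ hash(suminc: 'count');
   myhash.defineKey('k');
   myhash.defineDone();

   /* Add one key; the current value of COUNT seeds its key summary */
   k = 99;
   count = 1;
   rc = myhash.add();

   /* Each FIND references the key and adds the current COUNT value */
   count = 2.5;
   rc = myhash.find();
   count = 3;
   rc = myhash.find();

   /* Retrieve the key summary for the current key into TOTAL */
   rc = myhash.sum(sum: total);
   put total=;
run;
```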
See Also: Initializing Hash Object Data Using a Constructor and Declaring and Instantiating a Hash Iterator Object in SAS Language Reference: Concepts.
Details
To use a DATA step component object in your SAS program, you must declare and create (instantiate) the object. The DATA step component interface provides a mechanism for accessing the predefined component objects from within the DATA step.
If you use the _NEW_ operator to instantiate the component object, you must first use the DECLARE statement to declare the component object. For example, in the following lines of code, the DECLARE statement tells SAS that the object reference H is a hash object. The _NEW_ operator creates the hash object and assigns it to the object reference H.
declare hash h;
h = _new_ hash();
Note: You can use the DECLARE statement to declare and instantiate a hash or hash iterator object in one step.
A constructor is a method that is used to instantiate a component object and to initialize the component object data. For example, in the following lines of code, the _NEW_ operator instantiates a hash object and assigns it to the object reference H. In addition, the data set WORK.KENNEL is loaded into the hash object.
declare hash h;
h = _new_ hash(dataset: "work.kennel");
For more information about the predefined DATA step component objects and constructors, see Using DATA Step Component Objects in SAS Language Reference: Concepts.
Comparisons
You can use either the DECLARE statement together with the _NEW_ operator, or the DECLARE statement alone, to declare and instantiate a hash or hash iterator object.
Examples
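The two forms can be put side by side in a short sketch. The object references H1, H2, IT1, and IT2 and the host variables KENNO and NAME are names chosen for illustration; both hash objects here are created empty.

```sas
data _null_;
   length kenno $2 name $10;

   /* Form 1: DECLARE the reference, then instantiate with _NEW_ */
   declare hash h1;
   h1 = _new_ hash(ordered: 'yes');
   h1.defineKey('kenno');
   h1.defineData('name', 'kenno');
   h1.defineDone();

   declare hiter it1;
   it1 = _new_ hiter('h1');

   /* Form 2: DECLARE alone declares and instantiates in one step */
   declare hash h2(ordered: 'yes');
   h2.defineKey('kenno');
   h2.defineData('name', 'kenno');
   h2.defineDone();

   declare hiter it2('h2');

   /* Avoid uninitialized variable notes */
   call missing(kenno, name);
run;
```

The two-step form can be convenient when the declaration and the instantiation need to happen at different points in the DATA step. The one-step form is shorter when they do not.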
This example uses the _NEW_ operator to instantiate and initialize data for a hash object and instantiate a hash iterator object.
The hash object is filled with data, and the iterator is used to retrieve the data in key order.
data kennel;
   input name $1-10 kenno $14-15;
   datalines;
Charlie      15
Tanner       07
Jake         04
Murphy       01
Pepe         09
Jacques      11
Princess Z   12
;
run;

data _null_;
   if _N_ = 1 then do;
      length kenno $2;
      length name $10;
      /* Declare the hash object */
      declare hash h;
      /* Instantiate and initialize the hash object */
      h = _new_ hash(dataset:"work.kennel", ordered: 'yes');
      /* Declare the hash iterator object */
      declare hiter iter;
      /* Instantiate the hash iterator object */
      iter = _new_ hiter('h');
      /* Define key and data variables */
      h.defineKey('kenno');
      h.defineData('name', 'kenno');
      h.defineDone();
      /* Avoid uninitialized variable notes */
      call missing(kenno, name);
   end;
   /* Find the first key in the ordered hash object and output to the log */
   rc = iter.first();
   do while (rc = 0);
      put kenno ' ' name;
      rc = iter.next();
   end;
run;
The following lines are written to the SAS log:
Output of Data Written in Key Order
NOTE: There were 7 observations read from the data set WORK.KENNEL.
01 Murphy
04 Jake
07 Tanner
09 Pepe
11 Jacques
12 Princess Z
15 Charlie
See Also
Using DATA Step Component Objects in SAS Language Reference: Concepts |
Copyright © 2011 by SAS Institute Inc., Cary, NC, USA. All rights reserved.