Working with SAS Data Sets


Index a SAS Data Set

Searching through a large data set for observations that satisfy some complex criteria can take a long time. You can reduce this search time by indexing the data set. The INDEX statement builds a special companion file that contains the values and record numbers of the indexed variables. After the index is built, queries that use a WHERE clauses might use the index to make the processing more efficient. Any number of variables can be indexed, but only one index is in use at a given time.

If you sort a data set in place or use the PURGE statement to delete observations, indices for the data set are deleted.

After you index a data set, the SAS/IML language has the option to use the index when a search is conducted with respect to the indexed variables. The indexes are updated automatically whenever you change values in indexed variables. When an index is in use, observations cannot be randomly accessed by their physical location numbers. In other words, you cannot use the POINT clause when an index is in effect.

To see how the INDEX statement works, make a copy of the Sashelp.Class data set, as follows:

data Class;
set Sashelp.Class;
run;

If you want a list of all female students in the Class data set, you can first index the data by the Sex variable:

proc iml;
use Class;
index Sex;

If you subsequently submit a WHERE clause that uses the Sex variable, the index is used. Of course, the Class data set is small, so you will not notice any performance improvement for these data. However, for large data sets indexing can improve performance.

Now list all students by using the following statement:

list all;

Figure 7.23: Indexed Observations

   OBS Name     Sex       Age    Height    Weight                               
------ -------- --- --------- --------- ---------                               
     2 Alice    F     13.0000   56.5000   84.0000                               
     3 Barbara  F     13.0000   65.3000   98.0000                               
     4 Carol    F     14.0000   62.8000  102.5000                               
     7 Jane     F     12.0000   59.8000   84.5000                               
     8 Janet    F     15.0000   62.5000  112.5000                               
    11 Joyce    F     11.0000   51.3000   50.5000                               
    12 Judy     F     14.0000   64.3000   90.0000                               
    13 Louise   F     12.0000   56.3000   77.0000                               
    14 Mary     F     15.0000   66.5000  112.0000                               
     1 Alfred   M     14.0000   69.0000  112.5000                               
     5 Henry    M     14.0000   63.5000  102.5000                               
     6 James    M     12.0000   57.3000   83.0000                               
     9 Jeffrey  M     13.0000   62.5000   84.0000                               
    10 John     M     12.0000   59.0000   99.5000                               
    15 Philip   M     16.0000   72.0000  150.0000                               
    16 Robert   M     12.0000   64.8000  128.0000                               
    17 Ronald   M     15.0000   67.0000  133.0000                               
    18 Thomas   M     11.0000   57.5000   85.0000                               
    19 William  M     15.0000   66.5000  112.0000                               
                                                                                



Notice that the indexed observations are sorted by Sex rather than by the OBS number. Retrievals that use the Sex variable are quicker than retrievals of data that are not indexed.