Working with SAS Data Sets

Indexing a SAS Data Set

Searching through a large data set for information about one or more specific observations can take a long time because the procedure must read each record. You can reduce this search time by first indexing the data set by a variable. The INDEX statement builds a special companion file containing the values and record numbers of the indexed variables. Once the index is built, IML can use the index for queries with WHERE clauses if it decides that indexed retrieval is more efficient. Any number of variables can be indexed, but only one index is in use at a given time. Note that purging a data set with the PURGE statement results in the loss of all associated indexes.

Once you have indexed a data set, IML can use this index whenever a search is conducted with respect to the indexed variables. The indexes are updated automatically whenever you change values in indexed variables. When an index is in use, observations cannot be randomly accessed by their physical location numbers. This means that the POINT range cannot be used when an index is in effect. However, if you purge the observations marked for deletion, or sort the data set in place, the indexes become invalid and IML automatically deletes them.

For example, if you want a list of all female students in the CLASS data set, you can first index CLASS by the variable SEX. Then use the LIST statement with a WHERE clause. Of course, the CLASS data set is small, and indexing does little if anything to speed queries with the WHERE clause. If the data set had thousands of students, though, indexing could save search time.

To index the data set by the variable SEX, submit the following statement:

  
    > index sex; 
  
      NOTE: Variable SEX indexed. 
      NOTE: Retrieval by SEX.
 
Now list all students by using the following statement. Notice the ordering of the special file built by indexing by the variable SEX. Retrievals by SEX will be quick.

  
    > list all; 
  
           OBS NAME     SEX            AGE    HEIGHT    WEIGHT 
        ------ -------- -------- --------- --------- --------- 
             1 JOYCE    F          11.0000   51.3000   50.5000 
             4 JANE     F          12.0000   59.8000   84.5000 
             6 LOUISE   F          12.0000   56.3000   77.0000 
             8 ALICE    F          13.0000   56.5000   84.0000 
             9 BARBARA  F          13.0000   65.3000   98.0000 
            11 CAROL    F          14.0000   62.8000  102.5000 
            14 JUDY     F          14.0000   64.3000   90.0000 
            15 JANET    F          15.0000   62.5000  112.5000 
            16 MARY     F          15.0000   66.5000  112.0000 
             2 THOMAS   M          11.0000   57.5000   85.0000 
             3 JAMES    M          12.0000   57.3000   83.0000 
             7 ROBERT   M          12.0000   64.8000  128.0000 
            10 JEFFREY  M          13.0000   62.5000   84.0000 
            12 HENRY    M          15.0000   63.5000  102.5000 
            13 ALFRED   M          14.0000   69.0000  112.5000 
            17 RONALD   M          15.0000   67.0000  133.0000 
            18 WILLIAM  M          15.0000   66.5000  112.0000 
            19 PHILIP   M          16.0000   72.0000  150.0000
 

Previous Page | Next Page | Top of Page