Working with SAS Data Sets |
Searching through a large data set for information about one or more specific observations can take a long time because the procedure must read each record. You can reduce this search time by first indexing the data set by a variable. The INDEX statement builds a special companion file that contains the values and record numbers of the indexed variables. Once the index is built, IML can use the index for queries with WHERE clauses if it decides that indexed retrieval is more efficient. Any number of variables can be indexed, but only one index is in use at a given time. Note that purging a data set with the PURGE statement results in the loss of all associated indexes.
Once you have indexed a data set, IML can use this index whenever a search is conducted with respect to the indexed variables. The indexes are updated automatically whenever you change values in indexed variables. When an index is in use, observations cannot be randomly accessed by their physical location numbers. This means that the POINT range cannot be used when an index is in effect. However, if you purge the observations marked for deletion, or sort the data set in place, the indexes become invalid and IML automatically deletes them.
For example, if you want a list of all female students in the CLASS data set, you can first index CLASS by the variable SEX. Then use the LIST statement with a WHERE clause. Of course, the CLASS data set is small, and indexing does little if anything to speed queries with the WHERE clause. If the data set had thousands of students, though, indexing could save search time.
To index the data set by the variable SEX, submit the following statement:
> index sex; NOTE: Variable SEX indexed. NOTE: Retrieval by SEX.
Now list all students by using the following statement. Notice the ordering of the special file built by indexing by the variable SEX. Retrievals by SEX will be quick.
> list all; OBS NAME SEX AGE HEIGHT WEIGHT ------ -------- -------- --------- --------- --------- 1 JOYCE F 11.0000 51.3000 50.5000 4 JANE F 12.0000 59.8000 84.5000 6 LOUISE F 12.0000 56.3000 77.0000 8 ALICE F 13.0000 56.5000 84.0000 9 BARBARA F 13.0000 65.3000 98.0000 11 CAROL F 14.0000 62.8000 102.5000 14 JUDY F 14.0000 64.3000 90.0000 15 JANET F 15.0000 62.5000 112.5000 16 MARY F 15.0000 66.5000 112.0000 2 THOMAS M 11.0000 57.5000 85.0000 3 JAMES M 12.0000 57.3000 83.0000 7 ROBERT M 12.0000 64.8000 128.0000 10 JEFFREY M 13.0000 62.5000 84.0000 12 HENRY M 15.0000 63.5000 102.5000 13 ALFRED M 14.0000 69.0000 112.5000 17 RONALD M 15.0000 67.0000 133.0000 18 WILLIAM M 15.0000 66.5000 112.0000 19 PHILIP M 16.0000 72.0000 150.0000
Copyright © SAS Institute, Inc. All Rights Reserved.