Optimizing Data Storage |
An index is an optional file that you can create to provide direct access to specific rows. The index stores values in ascending value order for a specific column or columns and includes information about the location of those values within rows in the table. In other words, an index enables you to locate a row by value. For example, if you use SAS to find a specific Social Security number (123-45-6789), SAS performs the search differently depending on whether there is an index on the row that contains the Social Security numbers:
Without an index, SAS accesses rows sequentially in the order in which they are stored in the table. SAS reads each row, looking for SSN=123-45-6789 until the value is found or all observations are read.
With an index on column SSN, SAS accesses the row directly. SAS satisfies the condition by using the index and going straight to the row that contains the value. SAS does not have to read each row.
When you create an index, you designate which columns to index. You can create two types of indexes:
a simple index, which consists of the values of one column
a composite index, which consists of the values of more than one column, with the values concatenated to form a single value
For each indexed column, you can also perform these tasks:
declare unique values. A unique index guarantees that values for one column or the combination of a composite group of columns remain unique for every row in the table. If an update tries to add a duplicate value to that column, then the update is rejected.
keep missing values from using space in the index by specifying that missing values are not maintained by the index.
In addition to writing SAS code to create indexes, you can create indexes on target tables by using SAS Data Integration Studio. In SAS Data Integration Studio, you use the properties window for the table to index individual columns. When you create the index, you can also specify Unique values and No missing values. Note that any indexes registered in metadata for a target table are physically created when the job is run. Simply editing the properties for an existing table and adding indexes does not update the physical table. The following figure shows the SAS Data Integration Studio properties dialog box for a table:
The Indexes Tab in the Properties Dialog Box for a Table Named STORE_ID
In general, SAS can use an index to improve performance in these situations:
For cube loading, a composite index on the columns that make up the cube's hierarchies might provide best results.
For WHERE processing, an index can provide faster and more efficient access to a subset of data. Note that to process a WHERE expression, SAS decides whether to use an index or to read the table sequentially.
Note: For WHERE processing, the Base SAS engine uses a maximum of one index. The SPD Engine can use multiple indexes.
Even though an index can reduce the time that is required to locate a set of rows, especially for a large table, there are costs that are associated with creating, storing, and maintaining the index. When deciding whether to create an index, you must consider increased resource usage, along with the performance improvement.
Once an index exists, SAS treats it as part of the table. That is, if you add or delete columns or modify values, the index is automatically updated.
For more information about creating indexes, see SAS Language Reference: Concepts.
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.