Understanding the Observation Count in a SAS Data File
Extending the Observation Count in a SAS Data File
A SAS data set stores its observation count as an integer of the C language signed long data type. The maximum number of observations that a SAS data set can count is therefore limited by the size of a long integer in the operating environment.
In some 64-bit environments, the long integer type keeps its 32-bit size (to maintain compatibility with 32-bit applications). In these environments, a SAS data set can reach the 32-bit maximum of approximately two billion (2,147,483,647) observations. OpenVMS for Integrity Servers, 64-bit Windows Itanium, and 64-bit Windows x64 editions use the 32-bit long data model.
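The "approximately two billion" figure follows from the width of a signed integer: a signed integer of n bits can hold at most 2^(n-1) - 1. The short Python sketch below computes that limit for a 32-bit and a 64-bit long, and uses the standard `ctypes` module to report how wide C `long` is for the interpreter it runs on. The helper name `max_signed` is hypothetical and used only for illustration.

```python
import ctypes

def max_signed(bits: int) -> int:
    """Largest value a two's-complement signed integer of `bits` width can hold."""
    return 2 ** (bits - 1) - 1

# Limits under the 32-bit long model and the 64-bit long model.
LIMIT_32 = max_signed(32)
LIMIT_64 = max_signed(64)
print(f"32-bit long: {LIMIT_32:,} observations")
print(f"64-bit long: {LIMIT_64:,} observations")

# ctypes reports the size of C `long` for the compiler that built this
# Python interpreter: typically 4 bytes on 64-bit Windows, 8 on 64-bit Linux.
long_bits = ctypes.sizeof(ctypes.c_long) * 8
print(f"This platform's long is {long_bits} bits (max {max_signed(long_bits):,})")
```

With a 32-bit long, the first line reports 2,147,483,647 observations; with a 64-bit long, the limit is 9,223,372,036,854,775,807.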
For SAS 8 and later, when a SAS data set exceeds the maximum number of observations for the operating environment, continued SAS processing depends on whether the SAS data set has an index or an integrity constraint that uses an index.
For SAS 9, a SAS data set is never damaged when an operation attempts to exceed the maximum number of observations. However, you must take explicit action to continue processing the SAS data set.
Observation Count
When the observation count is no longer maintained, the count is represented by a missing value (a period). For example, SAS writes a note like this one to the log:
NOTE: The data set MYFILES.BIGFILE has . observations and 56 variables.
The following functionality is affected:
File Compression
If you request compression for a SAS data set whose observation count is no longer maintained, the compression percentage cannot be calculated. SAS creates the compressed data set but does not write the note that reports the percentage of reduction to the SAS log.
Indexes and Integrity Constraints
If the SAS data set exceeds the maximum number of observations, you cannot create an index or an integrity constraint, not even an integrity constraint that does not require an index.
WHERE processing continues to execute. However, SAS cannot use an index to optimize the WHERE expression.
CEDA Processing
For CEDA processing between an operating environment with a 32-bit long integer and one with a 64-bit long integer, the maximum number of observations is that of the environment with the 32-bit long integer. The following situations produce an error for CEDA processing: if the maximum is exceeded during processing, CEDA processing stops, and if CEDA tries to open a file that already exceeds the maximum, the open fails.
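The rule that the narrower long integer sets the CEDA limit can be restated as a small Python model. `ceda_limit` and `ceda_can_open` are hypothetical helpers, not part of SAS; they only encode the arithmetic of the rule described above, with each environment identified by the bit width of its long integer.

```python
def max_signed(bits: int) -> int:
    """Largest value a two's-complement signed integer of `bits` width can hold."""
    return 2 ** (bits - 1) - 1

def ceda_limit(source_long_bits: int, target_long_bits: int) -> int:
    """Effective observation limit for CEDA processing between two environments:
    the limit of whichever environment has the narrower long integer."""
    return max_signed(min(source_long_bits, target_long_bits))

def ceda_can_open(obs_count: int, source_long_bits: int, target_long_bits: int) -> bool:
    """False when the file already exceeds the effective limit (the open fails)."""
    return obs_count <= ceda_limit(source_long_bits, target_long_bits)

# A 3-billion-observation file written where long is 64 bits,
# opened through CEDA where long is 32 bits.
print(ceda_limit(64, 32))                     # 2147483647
print(ceda_can_open(3_000_000_000, 64, 32))   # False
```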
APPEND Procedure
For SAS 8, the behavior of the APPEND procedure depends on the append method. If the index is updated as each observation is added, PROC APPEND stops adding observations and issues an error when the maximum is exceeded. With the fast append method, which updates the index only after all observations are added, exceeding the maximum cancels the observations that were already appended, and the SAS data set is marked as damaged. In interactive mode, an automatic recovery removes all observations that were appended before the maximum was reached, restores the index and integrity constraints, and resets the damaged flag. In batch mode, no automatic recovery occurs, and the file remains marked as damaged.
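The difference between the two SAS 8 append methods can be sketched as a toy Python model. This is a simplification under stated assumptions: `Table`, `append_per_observation`, and `append_fast` are hypothetical names, a table is just a list of rows plus a damaged flag, indexes and integrity constraints are not modeled, and in batch mode the sketch only sets the damaged flag because the text above does not describe what the damaged file contains.

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    rows: list = field(default_factory=list)
    damaged: bool = False

def append_per_observation(table: Table, new_rows: list, limit: int) -> str:
    """Index updated row by row: stop at the limit and report an error.
    Rows added before the limit was reached stay in the table."""
    for row in new_rows:
        if len(table.rows) >= limit:
            return "ERROR: maximum number of observations reached"
        table.rows.append(row)
    return "OK"

def append_fast(table: Table, new_rows: list, limit: int, interactive: bool) -> str:
    """Index updated after all rows are added: exceeding the limit marks the
    table damaged; interactive recovery removes this request's rows."""
    start = len(table.rows)
    table.rows.extend(new_rows)
    if len(table.rows) <= limit:
        return "OK"
    table.damaged = True
    if interactive:
        del table.rows[start:]   # automatic recovery: drop the appended rows
        table.damaged = False    # and reset the damaged flag
    return "ERROR: append canceled"

t = Table(rows=[1, 2, 3])
print(append_per_observation(t, [4, 5, 6], limit=5), t.rows)
# ERROR: maximum number of observations reached [1, 2, 3, 4, 5]

u = Table(rows=[1, 2, 3])
print(append_fast(u, [4, 5, 6], limit=5, interactive=True), u.rows, u.damaged)
# ERROR: append canceled [1, 2, 3] False
```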
SORT Procedure
When a SAS data set exceeds the maximum number of observations, the SORT procedure still completes the sort, and the observations are correctly ordered by the specified BY variables. However, the relative order of observations that have identical BY variable values can be affected. That order is controlled by the EQUALS or NOEQUALS option in the PROC SORT statement. EQUALS, the default, maintains the relative order of such observations from the input data set to the output data set. NOEQUALS does not necessarily preserve this order.
The relative output order depends on a sequence number, which represents the observation number. The sequence number is stored as a signed long integer at the end of each key, so it cannot represent observation numbers beyond the maximum. For that reason, depending on the SAS version, the EQUALS option can be disabled when a SAS data set exceeds the maximum number of observations.
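The sequence-number technique can be illustrated with a short Python sketch: appending each observation's position to its sort key makes ties resolve in input order, which is what EQUALS guarantees. (Python's own sort is already stable, so the extra key component is redundant there; it is included to mirror the mechanism described above.) The second function shows why a fixed-width counter breaks down, under the assumption that an overflowing 32-bit value wraps in two's complement; in C, signed overflow is actually undefined behavior, so wrapping is only one possible outcome. `equals_sort` and `wrap_signed32` are hypothetical names.

```python
def equals_sort(rows, by):
    """Order rows by the BY variable; ties keep input order because each sort
    key ends with the observation's sequence number (1, 2, 3, ...)."""
    keyed = [((row[by], seq), row) for seq, row in enumerate(rows, start=1)]
    keyed.sort(key=lambda pair: pair[0])
    return [row for _, row in keyed]

def wrap_signed32(n: int) -> int:
    """Value of n after truncation to a 32-bit two's-complement integer."""
    return (n + 2**31) % 2**32 - 2**31

rows = [{"id": "a", "k": 2}, {"id": "b", "k": 1}, {"id": "c", "k": 2}]
print([r["id"] for r in equals_sort(rows, "k")])   # ['b', 'a', 'c']

# Observation 2,147,483,648 no longer fits: under the wrapping assumption it
# becomes negative and would sort ahead of observation 1 among equal BY values.
print(wrap_signed32(2_147_483_648))   # -2147483648
```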
SAS Views
If the SAS data file from which a PROC SQL view is derived has an index or an integrity constraint, the PROC SQL view cannot exceed the maximum number of observations that is allowed for the operating environment.
Delete the Index or Integrity Constraint
If a SAS data set exceeds the maximum number of observations and has an index or an integrity constraint that uses an index, you can delete the index or the integrity constraint and continue processing. However, the SAS data set remains subject to the limited functionality that occurs when it exceeds the maximum number of observations.
You can use the DATASETS procedure or the SQL procedure to delete indexes and integrity constraints.
Recreate the SAS Data Set
If you want to retain your index or integrity constraint, you must re-create the SAS data set by dividing it into two or more SAS data sets, each containing no more than the maximum number of observations for the operating environment.
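The number of data sets needed is a ceiling division of the total observation count by the maximum. The Python sketch below computes that count and the 1-based, inclusive observation ranges each piece would cover. `pieces_needed` and `split_ranges` are hypothetical helpers, and the total of 5,000,000,000 observations is an arbitrary illustrative value; how the pieces are actually written out in SAS is not shown here.

```python
def pieces_needed(total_obs: int, limit: int) -> int:
    """Smallest number of data sets that keeps each one at or below `limit`."""
    return -(-total_obs // limit)  # ceiling division on integers

def split_ranges(total_obs: int, limit: int):
    """Yield (first, last) observation numbers, 1-based and inclusive,
    for consecutive pieces of at most `limit` observations."""
    for first in range(1, total_obs + 1, limit):
        yield first, min(first + limit - 1, total_obs)

LIMIT_32 = 2**31 - 1  # maximum of a signed 32-bit long
print(pieces_needed(5_000_000_000, LIMIT_32))   # 3
for first, last in split_ranges(5_000_000_000, LIMIT_32):
    print(first, last)
```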
Consider the SAS® Scalable Performance Data Engine
The SAS Scalable Performance Data (SPD) Engine, which is designed for high-performance data delivery, is an alternative for processing very large data sets. The SPD Engine reads and writes data sets that contain billions of observations. The SPD Engine can deliver data to applications rapidly because it organizes the data into a streamlined file format. For example, the engine reads partitioned data sets, which enables it to use multiple CPUs to perform parallel I/O functions. See SAS Scalable Performance Data Engine: Reference.
Migrate to a 64-Bit Operating Environment
If a 32-bit operating environment reaches the maximum, consider migrating to a 64-bit operating environment whose long integer type uses the 64-bit data model. For SAS 9.2, these environments include HP-UX on the Itanium 64-bit platform, HP-UX on the 64-bit platform, Linux on the x64 64-bit platform, AIX UNIX on 64-bit RS/6000, Solaris on the SPARC 64-bit platform, Solaris on the x64 64-bit platform, and OpenVMS for HP Integrity servers 64-bit platform.
Product Family | Product | System | SAS Release Reported | SAS Release Fixed*
SAS System | Base SAS | Solaris for x64 | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | OpenVMS on HP Integrity | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | Linux for x64 | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | Linux | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | HP-UX IPF | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | 64-bit Enabled Solaris | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | 64-bit Enabled HP-UX | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | 64-bit Enabled AIX | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | Windows Vista for x64 | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | Windows Vista | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | Microsoft Windows XP Professional | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | Microsoft Windows Server 2008 for x64 | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | Microsoft Windows Server 2003 for x64 | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | Microsoft Windows Server 2003 Standard Edition | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | Microsoft Windows Server 2003 Enterprise Edition | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | Microsoft Windows Server 2003 Datacenter Edition | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | Microsoft® Windows® for x64 | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | Microsoft Windows XP 64-bit Edition | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | Microsoft Windows Server 2003 Enterprise 64-bit Edition | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | Microsoft Windows Server 2003 Datacenter 64-bit Edition | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | Microsoft® Windows® for 64-Bit Itanium-based Systems | 9.2 TS2M0 | 9.3 TS1M0
SAS System | Base SAS | z/OS | 9.2 TS2M0 | 9.3 TS1M0
Type: Usage Note
Topic: SAS Reference ==> DATA Step Data Management ==> Access ==> SAS I/O Common Programming Tasks ==> Improving Performance
Date Modified: 2012-04-02 13:14:13
Date Created: 2009-06-01 15:48:08