Estimating the Amount of Disk Space for a Data Set
To obtain an estimate of how much space you need for a disk-format SAS data set that was created by the V9 engine, follow these steps:
Note: This procedure is valid only for uncompressed native SAS data files that were created with the V9 engine.
1. Use the CONTENTS procedure to determine the size of each observation. (See Determining Observation Length with PROC CONTENTS.)
2. Multiply the size of each observation by the number of observations.
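The two steps can also be carried out programmatically. The following is a minimal sketch, not a documented SAS procedure: it assumes the one-observation WORK.ORANGES data set used in this section (re-created here so the sketch runs on its own) and reads the observation length, observation count, and compression setting from the DICTIONARY.TABLES view in PROC SQL instead of from the PROC CONTENTS listing. The output table name WORK.SIZE_ESTIMATE is hypothetical.

```sas
/* Re-create the one-observation data set used in this section. */
data oranges;
  input variety $ flavor texture looks;
  total = flavor + texture + looks;
  datalines;
navel 9 8 6
;
run;

/* Step 1: observation length; step 2: multiply by observation count. */
/* WORK.SIZE_ESTIMATE is a hypothetical output table name.            */
proc sql;
  create table work.size_estimate as
    select memname,
           compress,                    /* NO for an uncompressed data file */
           obslen,                      /* observation length, in bytes     */
           nobs,                        /* number of physical observations  */
           obslen * nobs as est_bytes   /* estimated size of the data       */
      from dictionary.tables
      where libname = 'WORK' and memname = 'ORANGES';
quit;

proc print data=work.size_estimate noobs;
run;
```

Check that COMPRESS is NO before relying on EST_BYTES; the estimate applies only to uncompressed data files.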
To determine the length of each observation in a SAS data set, you can create a SAS data set that contains one observation. Then, run the CONTENTS procedure to determine the observation length. The following program produces a SAS data set and PROC CONTENTS output:
```sas
data oranges;
  input variety $ flavor texture looks;
  total=flavor+texture+looks;
  datalines;
navel 9 8 6
;

proc contents data=oranges;
run;
```
The following is the output:
```
                           The CONTENTS Procedure

Data Set Name        WORK.ORANGES                   Observations          1
Member Type          DATA                           Variables             5
Engine               V9                             Indexes               0
Created              Monday, May 12, 2008 01:46:21  Observation Length    40
Last Modified        Monday, May 12, 2008 01:46:21  Deleted Observations  0
Protection                                          Compressed            NO
Data Set Type                                       Sorted                NO
Label
Data Representation  VMS_IA64, ALPHA_VMS_64
Encoding             latin1  Western (ISO)

                     Engine/Host Dependent Information

Data Set Page Size          8192
Number of Data Set Pages    1
First Data Page             1
Max Obs per Page            203
Obs in First Data Page      1
Number of Data Set Repairs  0
Filename                    SASDISK:[SASDEMO.SAS$WORK2040F93A]ORANGES.SAS7BDAT
Release Created             9.0201B0
Host Created                OpenVMS
File Size (blocks)          17

                Alphabetic List of Variables and Attributes

#    Variable    Type    Len
2    flavor      Num       8
4    looks       Num       8
3    texture     Num       8
5    total       Num       8
1    variety     Char      8
```
To determine observation length, the only values that you need to pay attention to are the following:
Observation Length
is the record size in bytes.
Compressed
has the value NO if records are not compressed, and either CHAR or BINARY if records are compressed. If the records are compressed, do not use the procedure given in Estimating the Size of a SAS Data Set under OpenVMS.
For an explanation of the CHAR and BINARY values, see "COMPRESS System Option" in SAS Language Reference: Dictionary. For more information about compressing data files, see SAS Language Reference: Concepts.
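As a worked check of the estimation steps against the output above: the Alphabetic List of Variables and Attributes shows four numeric variables and one character variable, each 8 bytes long, and for this data set their total equals the reported Observation Length of 40. The observation count N = 10,000 is a hypothetical value chosen for illustration.

```latex
\begin{aligned}
\text{Observation Length} &= \underbrace{4 \times 8}_{\texttt{flavor, looks, texture, total}} + \underbrace{1 \times 8}_{\texttt{variety}} = 40 \text{ bytes} \\
\text{Estimated size} &= 40 \times N \text{ bytes}, \qquad N = 10{,}000 \;\Rightarrow\; 400{,}000 \text{ bytes}
\end{aligned}
```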
Optimizing Page Size
The procedure output shown in CONTENTS Procedure Output provides values for the physical characteristics of Work.Oranges that are useful when selecting an optimal page size. Some values, such as the page size and the number of observations per page for uncompressed SAS data sets, are listed under Engine/Host Dependent Information. To determine the optimal page size for a data set, the following values are important:
Observations
is the number of observations in the data set that have not been deleted or flagged for deletion.
Compressed
has the value NO if records are not compressed, and either CHAR or BINARY if records are compressed.
First Data Page
is the page number of the page containing the first observation for noncompressed files. Descriptor information is stored before the observations in the file.
Max Obs per Page
is the maximum number of observations a page can hold for noncompressed files.
Obs in First Data Page
is the number of observations in the first page for noncompressed files.
Note: First Data Page, Max Obs per Page, and Obs in First Data Page are provided by the CONTENTS procedure only for a noncompressed data set. These values have little meaning for a compressed data set because each observation could be a different size.
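The page-related values can be combined into a rough page count. This approximation is an assumption for illustration, not a formula from SAS: it supposes that every data page holds Max Obs per Page observations (M = 203 for Work.Oranges) and ignores the descriptor information stored before the first observation, so the actual Number of Data Set Pages can be somewhat higher. N = 1,000 is a hypothetical observation count.

```latex
\begin{aligned}
P &\approx \left\lceil \frac{N}{M} \right\rceil = \left\lceil \frac{1000}{203} \right\rceil = 5 \text{ pages} \\
\text{data page space} &\approx P \times \text{Data Set Page Size} = 5 \times 8192 = 40{,}960 \text{ bytes}
\end{aligned}
```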
The following values change based on the number of observations in the data set:
Obs in First Data Page. This value changes until there are enough observations to fill the first data page.
Using the CONTENTS procedure information, you can experiment with the default page size and various BUFSIZE= values to select an optimal page size, that is, one that makes the best use of your most valuable resource. For example, if you want to minimize the number of I/Os performed on a data set, increase the BUFSIZE= value. This increases the Max Obs per Page value reported by the CONTENTS procedure, so each page that is read or written holds more observations. However, a larger page size does not make the most efficient use of disk space and is probably useful only for large data sets, where performance is an important issue.
For smaller data sets, you might want to optimize your use of disk space rather than the number of I/Os performed. For example, if you run the DATA step used earlier with a BUFSIZE= value of 512, the data set takes up only 6 disk blocks instead of 18. For a small data set, this is more efficient because the number of I/Os is not a significant factor.
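To reproduce this comparison, the DATA step from this section can be rerun with the BUFSIZE= data set option on the DATA statement. This is a minimal sketch; the block counts you see depend on your host and SAS release, so compare Data Set Page Size, Max Obs per Page, and File Size (blocks) in the new listing with the default-page-size output shown earlier.

```sas
/* Re-create WORK.ORANGES with 512-byte pages instead of the default. */
/* BUFSIZE= is a data set option, so it goes on the DATA statement.   */
data oranges(bufsize=512);
  input variety $ flavor texture looks;
  total = flavor + texture + looks;
  datalines;
navel 9 8 6
;
run;

/* Compare page size, Max Obs per Page, and File Size (blocks) */
/* with the default-page-size output.                          */
proc contents data=oranges;
run;
```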
When varying the BUFSIZE= value, decide which computer resources you want to optimize first, then experiment until you get the result you want.
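One way to run that experiment is to copy the same data set with several page sizes and compare the results side by side. The macro below is a hypothetical sketch: the macro name %TRY_BUFSIZES, its SIZES= parameter, and the output table WORK.BUFSIZE_TRIALS are illustrative names, and it assumes that your SAS release provides the FILESIZE column (file size in bytes) in DICTIONARY.TABLES alongside BUFSIZE and NPAGE.

```sas
/* Source data: the one-observation data set from this section. */
data oranges;
  input variety $ flavor texture looks;
  total = flavor + texture + looks;
  datalines;
navel 9 8 6
;
run;

/* Hypothetical macro: copy WORK.ORANGES once per page size in SIZES= */
/* and collect the page size, page count, and file size of each copy. */
%macro try_bufsizes(sizes=512 8192 16384);
  %local i size;
  %do i = 1 %to %sysfunc(countw(&sizes));
    %let size = %scan(&sizes, &i);
    data work.oranges_&size(bufsize=&size);
      set work.oranges;
    run;
  %end;

  /* FILESIZE is assumed to exist in DICTIONARY.TABLES in your release. */
  proc sql;
    create table work.bufsize_trials as
      select memname, bufsize, npage, filesize
        from dictionary.tables
        where libname = 'WORK' and index(memname, 'ORANGES_') = 1
        order by bufsize;
  quit;
%mend try_bufsizes;

/* Usage: try three page sizes, then compare the rows. */
%try_bufsizes(sizes=512 8192 16384)

proc print data=work.bufsize_trials noobs;
run;
```

Decide first whether disk space or I/O matters more for the data set, then pick the row whose page count and file size best fit that goal.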
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.