
Using SAS Files

Estimating the Size of a SAS Data Set under OpenVMS


Estimating the Amount of Disk Space for a Data Set

To obtain an estimate of how much space you need for a disk-format SAS data set that was created by the V9 engine, follow these steps:

Note:   This procedure is valid only for uncompressed native SAS data files that were created with the V9 engine.

  1. Use the CONTENTS procedure to determine the size of each observation. (See Determining Observation Length with PROC CONTENTS.)

  2. Multiply the size of each observation by the number of observations.

  3. Add 10 percent for overhead.
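
The arithmetic in these steps can be sketched in a short DATA step. This is only an illustration: the observation length and observation count below are hypothetical placeholder values, and the conversion to blocks assumes the 512-byte OpenVMS disk block.

```sas
data _null_;
   obs_length = 40;        /* hypothetical: Observation Length from PROC CONTENTS, in bytes */
   n_obs      = 100000;    /* hypothetical: expected number of observations */

   raw_bytes  = obs_length * n_obs;         /* step 2: length times count */
   est_bytes  = raw_bytes * 110 / 100;      /* step 3: add 10 percent for overhead */
   est_blocks = ceil(est_bytes / 512);      /* assumes 512-byte disk blocks */

   put raw_bytes= est_bytes= est_blocks=;   /* write the estimate to the SAS log */
run;
```

Substitute the observation length that PROC CONTENTS reports for your own data set and the number of observations that you expect to store.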


Determining Observation Length with PROC CONTENTS

To determine the length of each observation in a SAS data set, you can create a SAS data set that contains one observation. Then, run the CONTENTS procedure to determine the observation length. The following program produces a SAS data set and PROC CONTENTS output:

data oranges;
   input variety $ flavor texture looks;
   total=flavor+texture+looks;
   datalines;
navel 9 8 6
;
proc contents data=oranges;
run;

The following is the output:

CONTENTS Procedure Output

                          The CONTENTS Procedure

Data Set Name        WORK.ORANGES                       Observations         1
Member Type          DATA                               Variables            5
Engine               V9                                 Indexes              0
Created              Monday, May 12, 2008 01:46:21      Observation Length   40
Last Modified        Monday, May 12, 2008 01:46:21      Deleted Observations 0
Protection                                              Compressed           NO
Data Set Type                                           Sorted               NO
Label
Data Representation VMS_IA64, ALPHA_VMS_64
Encoding            latin1  Western (ISO)

                   Engine/Host Dependent Information

  Data Set Page Size          8192
  Number of Data Set Pages    1
  First Data Page             1
  Max Obs per Page            203
  Obs in First Data Page      1
  Number of Data Set Repairs  0
  Filename                    SASDISK:[SASDEMO.SAS$WORK2040F93A]ORANGES.SAS7BDAT
  Release Created             9.0201B0
  Host Created                OpenVMS
  File Size (blocks)          17


               Alphabetic List of Variables and Attributes

                    #    Variable    Type    Len
                    
                    2    flavor      Num       8
                    4    looks       Num       8
                    3    texture     Num       8
                    5    total       Num       8
                    1    variety     Char      8

To determine observation length, the only values that you need to pay attention to are the following:

Observation Length

is the record size in bytes.

Compressed

has the value NO if records are not compressed, and either CHAR or BINARY if records are compressed. If the records are compressed, do not use the procedure given in Estimating the Size of a SAS Data Set under OpenVMS.

For an explanation of the CHAR and BINARY values, see "COMPRESS System Option" in SAS Language Reference: Dictionary. For more information about compressing data files, see SAS Language Reference: Concepts.
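
If you prefer to retrieve only these values rather than read them from the full listing, one possible approach is to query the DICTIONARY.TABLES view with PROC SQL. The following is a minimal sketch that assumes the Work.Oranges data set from the earlier program still exists. The column names (NOBS, OBSLEN, COMPRESS) are the ones that this view is understood to provide, so verify them against your SAS release.

```sas
proc sql;
   /* Library and member names are stored in uppercase in this view. */
   select memname,
          nobs,       /* number of observations */
          obslen,     /* observation length, in bytes */
          compress    /* NO, CHAR, or BINARY */
      from dictionary.tables
      where libname = 'WORK' and memname = 'ORANGES';
quit;
```

If the COMPRESS column shows a value other than NO, the estimating procedure does not apply to that data set.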


Optimizing Page Size

The procedure output shown in CONTENTS Procedure Output provides values for the physical characteristics of Work.Oranges that are useful when selecting an optimal page size. Some values, such as the page size and the number of observations per page for uncompressed SAS data sets, are listed under Engine/Host Dependent Information. To determine the optimal page size for a data set, the following values are important:

Observations

is the number of observations in the data set that have not been deleted or flagged for deletion.

Observation Length

is the record size in bytes.

Compressed

has the value NO if records are not compressed, and either CHAR or BINARY if records are compressed.

For an explanation of the CHAR and BINARY values, see "COMPRESS System Option" in SAS Language Reference: Dictionary. For more information about compressing data files, see SAS Language Reference: Concepts.

Data Set Page Size

is the page size, expressed in bytes.

Number of Data Set Pages

is the number of pages for the data set.

First Data Page

is the page number of the page containing the first observation for noncompressed files. Descriptor information is stored before the observations in the file.

Max Obs per Page

is the maximum number of observations a page can hold for noncompressed files.

Obs in First Data Page

is the number of observations in the first page for noncompressed files.

Note:   First Data Page, Max Obs per Page, and Obs in First Data Page are provided only by the CONTENTS procedure for a noncompressed data set. These values have little meaning for a compressed data set because each observation could be a different size.
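
As a rough illustration of how these values relate to one another, the following sketch estimates how many pages a noncompressed data set would need at a given page size. The page size and Max Obs per Page values are taken from the sample output for Work.Oranges, and the observation count is hypothetical. The calculation is an approximation: it ignores descriptor information, and it ignores the fact that the first data page might hold fewer observations than the maximum.

```sas
data _null_;
   page_size    = 8192;     /* Data Set Page Size from the sample output, in bytes */
   max_obs_page = 203;      /* Max Obs per Page from the sample output */
   n_obs        = 100000;   /* hypothetical number of observations */

   est_pages = ceil(n_obs / max_obs_page);  /* pages needed to hold all observations */
   est_bytes = est_pages * page_size;       /* approximate space used by those pages */

   put est_pages= est_bytes=;               /* write the estimate to the SAS log */
run;
```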

The following values change based on the number of observations in the data set:

For a single page size, the other values do not change.

Experimenting with Buffer Size to Set an Optimal Page Size

Using the CONTENTS procedure information, you can experiment with the default page size and various BUFSIZE= values to select an optimal page size, that is, one that optimizes your most valuable resource. For example, if you want to minimize the number of I/Os performed on a data set, increase the BUFSIZE= value. This increases the Max Obs per Page value given by the CONTENTS procedure. However, increasing the buffer size does not make the most efficient use of disk space and is probably useful only for large data sets, where performance is an important issue.

For smaller data sets, you might want to optimize your use of disk space rather than the number of I/Os performed. For example, if you run the DATA step used earlier with a BUFSIZE= value of 512, the data set takes up only 6 disk blocks instead of the 17 reported as File Size (blocks) in CONTENTS Procedure Output. For a small data set, this is more efficient because the number of I/Os is not a significant factor.
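
One way to run this experiment is to specify BUFSIZE= as a data set option on the DATA statement and then rerun PROC CONTENTS. The following sketch repeats the earlier program with that one change. The results that you see depend on your SAS release and host.

```sas
/* Same program as before, with a 512-byte page size requested. */
data oranges(bufsize=512);
   input variety $ flavor texture looks;
   total=flavor+texture+looks;
   datalines;
navel 9 8 6
;
proc contents data=oranges;
run;
```

Compare the Data Set Page Size, Max Obs per Page, and File Size (blocks) values in the new output with the values from the default page size.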

When varying the BUFSIZE= value, decide which computer resources you want to optimize first, then experiment until you get the result you want.
