DATASETS Procedure

CONTENTS Statement

Describes the contents of one or more SAS data sets and prints the directory of the SAS library.
Restriction: You cannot use the WHERE option to affect the output because PROC CONTENTS does not process any observations.
Tip: You can use data set options with the DATA=, OUT=, and OUT2= options. You can use any global statements as well.
Example: Describing a SAS Data Set

Syntax

Summary of Optional Arguments

CENTILES
prints centiles information for indexed variables.
DATA=
specifies the input data set.
DETAILS | NODETAILS
includes information in the output about the number of observations, number of variables, number of indexes, and data set labels.
DIRECTORY
prints a list of the SAS files in the SAS library.
FMTLEN
prints the length of a variable's informat or format.
MEMTYPE=
restricts processing to one or more types of SAS files.
NODETAILS
see the description of DETAILS | NODETAILS.
NODS
suppresses the printing of individual files.
NOPRINT
suppresses the printing of the output.
ORDER=
prints a list of variables in a specified order.
OUT=
specifies the name for an output data set.
OUT2=
specifies the name of an output data set to contain information about indexes and integrity constraints.
SHORT
prints abbreviated output.
VARNUM
prints a list of the variables by their position in the data set. By default, the CONTENTS statement lists the variables alphabetically.

Optional Arguments

CENTILES
prints centiles information for indexed variables.
The following additional fields are printed in the default report of PROC CONTENTS when the CENTILES option is selected and an index exists on the data set. Note that the additional fields depend on whether the index is simple or complex.
#
number of the index on the data set.
Index
name of the index.
Update Centiles
percentage of the data values that must be changed before the CENTILES for the indexed variables are automatically updated.
Current Update Percentage
percentage of index updated since CENTILES were refreshed.
# of Unique Values
number of unique indexed values.
Variables
names of the variables used to make up the index. Centile information is listed below the variables.
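The following is a minimal sketch of how the CENTILES fields described above might be produced. It assumes that SASHELP.CLASS is available in your installation; the data set name WORK.CLASS_IDX and the index name SEXAGE are hypothetical names chosen for illustration.

```sas
/* Assumption: SASHELP.CLASS exists; WORK.CLASS_IDX is a hypothetical copy. */
data work.class_idx;
   set sashelp.class;
run;

proc datasets library=work nolist;
   modify class_idx;
      index create name;               /* index on a single variable      */
      index create sexage=(sex age);   /* index built from two variables  */
   run;
   contents data=class_idx centiles;   /* request centiles for the indexes */
   run;
quit;
```

Because the data set has one index on a single variable and one on two variables, the report should let you compare how the additional fields differ between the two kinds of index.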
DATA=SAS-file-specification
specifies an entire library or a specific SAS data set within a library. SAS-file-specification can take one of the following forms:
<libref.>SAS-data-set
names one SAS data set to process. The default for libref is the libref of the procedure input library. For example, to obtain the contents of the SAS data set HTWT from the procedure input library, use the following CONTENTS statement:
contents data=HtWt;
To obtain the contents of a specific version from a generation group, use the GENNUM= data set option as shown in the following CONTENTS statement:
contents data=HtWt(gennum=3);
<libref.>_ALL_
gives you information about all SAS data sets that have the type or types specified by the MEMTYPE= option. libref refers to the SAS library. The default for libref is the libref of the procedure input library.
  • If you are using the _ALL_ keyword, you need Read access to all read-protected SAS data sets in the SAS library.
  • DATA=_ALL_ automatically prints a listing of the SAS files that are contained in the SAS library. Note that for SAS views, all librefs that are associated with the views must be assigned in the current session in order for them to be processed for the listing.
Default: most recently created data set in your job or session, from any SAS library.
Tip: If you specify a read-protected data set in the DATA= option but do not give the Read password, by default the procedure looks in the PROC DATASETS statement for the Read password. However, if you do not specify the DATA= option and the default data set (last one created in the session) is read protected, the procedure does not look in the PROC DATASETS statement for the Read password.
DETAILS | NODETAILS
DETAILS includes additional columns of information in the output (the number of observations, number of variables, number of indexes, and data set labels), but only if DIRECTORY is also specified.
Default: If neither DETAILS nor NODETAILS is specified, the default depends on the context. For the CONTENTS procedure, the default is the system option setting, which is NODETAILS. For the CONTENTS statement, the default is whatever is specified in the PROC DATASETS statement, which also defaults to the system option setting.
See: the description of the additional columns in the Optional Arguments section of PROC DATASETS Statement
DIRECTORY
prints a list of all SAS files in the specified SAS library. If DETAILS is also specified, using DIRECTORY causes the additional columns described in DETAILS | NODETAILS to be printed.
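As a hedged illustration of combining DIRECTORY with DETAILS, the sketch below copies SASHELP.CLASS (assumed to be available) into a hypothetical data set WORK.DIR_DEMO and requests the directory with the additional columns.

```sas
/* Assumption: SASHELP.CLASS exists; WORK.DIR_DEMO is a hypothetical copy. */
data work.dir_demo;
   set sashelp.class;
run;

proc datasets library=work nolist;
   /* DIRECTORY prints the library directory; DETAILS adds the extra columns */
   contents data=dir_demo directory details;
run;
quit;
```

Running the same CONTENTS statement without DETAILS is a simple way to see which columns the option adds.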
FMTLEN
prints the length of the informat or format. If you do not specify a length for the informat or format when you associate it with a variable, the length does not appear in the output of the CONTENTS statement unless you use the FMTLEN option. The length also appears in the FORMATL or INFORML variable in the output data set.
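A small sketch of the FMTLEN behavior described above, assuming a hypothetical data set WORK.FMT_DEMO whose variable is given a format with no explicit length:

```sas
/* WORK.FMT_DEMO and AMOUNT are hypothetical names used for illustration. */
data work.fmt_demo;
   amount = 1234.5;
   format amount comma.;            /* format associated without a length */
run;

proc datasets library=work nolist;
   contents data=fmt_demo;          /* report without the format length */
   contents data=fmt_demo fmtlen;   /* report with the format length    */
run;
quit;
```

Comparing the Format column of the two reports should show the effect of the option.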
MEMTYPE=(mtype-1 <...mtype-n>)
restricts processing to one or more member types. The CONTENTS statement produces output only for member types DATA, VIEW, and ALL, which includes DATA and VIEW.
MEMTYPE= in the CONTENTS statement differs from MEMTYPE= in most of the other statements in the DATASETS procedure in the following ways:
  • A slash does not precede the option.
  • You cannot enclose the MEMTYPE= option in parentheses to limit its effect to only the SAS file immediately preceding it.
MEMTYPE= results in a directory of the library in which the DATA= member is located. However, MEMTYPE= does not limit the types of members whose contents are displayed unless the _ALL_ keyword is used in the DATA= option. For example, the following statements produce the contents of only the SAS data sets with the member type DATA:
proc datasets memtype=data;    
   contents data=_all_; 
run;
Alias: MT=, MTYPE=
Default: DATA
NODS
suppresses printing the contents of individual files when you specify _ALL_ in the DATA= option. The CONTENTS statement prints only the SAS library directory. You cannot use the NODS option when you specify only one SAS data set in the DATA= option.
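For instance, a directory-only listing might be requested as follows. This is a sketch that assumes the SASHELP library is assigned in your session; any other libref could be substituted.

```sas
/* Assumption: the SASHELP library is assigned in the current session. */
proc datasets library=sashelp nolist;
   contents data=_all_ nods;   /* directory only, no per-file contents */
run;
quit;
```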
NODETAILS
See DETAILS | NODETAILS.
NOPRINT
suppresses printing the output of the CONTENTS statement.
ORDER= COLLATE | CASECOLLATE | IGNORECASE | VARNUM
COLLATE
prints a list of variables in alphabetical order beginning with uppercase and then lowercase names.
CASECOLLATE
prints a list of variables in alphabetical order even if they include mixed-case names and numerics.
IGNORECASE
prints a list of variables in alphabetical order ignoring the case of the letters.
VARNUM
is the same as the VARNUM option.
See: VARNUM
Note: The ORDER= option does not affect the order of the OUT= and OUT2= data sets.
Example: See Using the ORDER= Option with the CONTENTS Statement to compare the default and the four options for ORDER=.
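The ORDER= values can also be compared side by side on a small data set with mixed-case variable names. The sketch below uses a hypothetical data set WORK.ORDER_DEMO; the variable names are invented for illustration.

```sas
/* WORK.ORDER_DEMO and its variable names are hypothetical. */
data work.order_demo;
   Zeta  = 1;
   alpha = 2;
   Beta  = 3;
   gamma = 4;
run;

proc datasets library=work nolist;
   contents data=order_demo order=collate;      /* uppercase names first  */
   contents data=order_demo order=casecollate;
   contents data=order_demo order=ignorecase;   /* case is ignored        */
   contents data=order_demo order=varnum;       /* position in data set   */
run;
quit;
```

Each CONTENTS statement produces its own report, so the variable lists can be read one after another.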
OUT=SAS-data-set
names an output SAS data set.
Tip: OUT= does not suppress the printed output from the statement. If you want to suppress the printed output, you must use the NOPRINT option.
See: The OUT= Data Set for a description of the variables in the OUT= data set, and ODS Output for an example of how to get the CONTENTS output into an ODS data set for processing.
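As a sketch of the OUT= and NOPRINT combination described in the tip, the following writes the variable metadata for SASHELP.CLASS (assumed to be available) to a hypothetical data set WORK.CLASS_VARS and then prints a few of its columns.

```sas
/* Assumption: SASHELP.CLASS exists; WORK.CLASS_VARS is a hypothetical name. */
proc datasets library=sashelp nolist;
   contents data=class out=work.class_vars noprint;   /* no printed report */
run;
quit;

/* One observation per variable of the described data set */
proc print data=work.class_vars(keep=name type length varnum);
run;
```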
OUT2=SAS-data-set
names the output data set to contain information about indexes and integrity constraints.
Tips: If UPDATECENTILES was not specified in the index definition, then the default value of 5 is used in the RECREATE variable of the OUT2 data set.

OUT2= does not suppress the printed output from the statement. To suppress the printed output, use the NOPRINT option.

See: The OUT2= Data Set for a description of the variables in the OUT2= data set.
SHORT
prints only the list of variable names, the index information, and the sort information for the SAS data set.
Restriction: If the list of variables is more than 32,767 characters, the list is truncated and a WARNING is written to the SAS log. To get a complete list of the variables, request an alphabetical listing of the variables.
VARNUM
prints a list of the variable names in the order of their logical position in the data set. By default, the CONTENTS statement lists the variables alphabetically. The physical position of the variable in the data set is engine-dependent.
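A brief sketch of VARNUM, alone and combined with SHORT, assuming SASHELP.CLASS is available:

```sas
/* Assumption: SASHELP.CLASS exists in the current installation. */
proc datasets library=sashelp nolist;
   contents data=class varnum;         /* full report, variables by position */
   contents data=class short varnum;   /* abbreviated list, same ordering    */
run;
quit;
```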

Details

Printing Variables

The CONTENTS statement prints an alphabetical listing of the variables by default, except for variables in the form of a numbered range list. Numbered range lists, such as x1–x100, are printed in incrementing numeric order (x1, x2, ..., x100) rather than in strict alphabetical order. For more information, see Alphabetic List of Variables and Attributes.
Note: If a label is changed after a view is created from a data set with variable labels, the CONTENTS or DATASETS procedure output shows the original labels. The view must be recompiled in order for the CONTENTS or DATASETS procedure output to reflect the new variable labels.

Using the CONTENTS Procedure Instead of the CONTENTS Statement

The only difference between the CONTENTS procedure and the CONTENTS statement in PROC DATASETS is the default for libref in the DATA= option. For PROC CONTENTS, the default is WORK. For the CONTENTS statement, the default is the libref of the procedure input library.
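To illustrate, the two steps below are a sketch of equivalent requests for the same data set, assuming SASHELP.CLASS is available. In the first, the libref is given explicitly in DATA=; in the second, it comes from the procedure input library.

```sas
/* Assumption: SASHELP.CLASS exists in the current installation. */

/* CONTENTS procedure: the libref is written in the DATA= option */
proc contents data=sashelp.class;
run;

/* CONTENTS statement: the libref defaults to the LIBRARY= library */
proc datasets library=sashelp nolist;
   contents data=class;
run;
quit;
```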

Observation Length, Alignment, and Padding for a SAS Data Set

There are three different cases for alignment.
  • Observations within a SAS data set are aligned on 8-byte (double-precision) boundaries whenever possible. As a result, 8-byte and 4-byte numeric variables are positioned at 8-byte boundaries at the front of the observation, followed by character variables in the order in which they are encountered. If the data set contains only 4-byte numeric data, the alignment is based on 4-byte boundaries. Because numeric doubles can be operated on directly rather than being moved and aligned before comparisons or increments, this alignment improves performance.
    Because many observations are contained within a given disk data page buffer, there might be padding between observations so that each observation is aligned on an 8-byte boundary. See the following example:
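    data a;
       /* Explicit lengths: AA, BB, and EE are short numerics, CC is character */
       length aa 7 bb 6 cc $10 dd 8 ee 3;
       aa = 1;
       bb = 2;
       cc = 'abc';
       dd = 3;
       ee = 4;
       ff = 5;   /* no LENGTH given, so FF uses the default numeric length of 8 */
       output;
    run;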
    
    proc contents data=a out=a1; 
    run;
    
    proc print data=a1(keep=name length varnum npos); 
    run;
    
    
                     The SAS System
       
        OBS    NAME    LENGTH    VARNUM    NPOS
       
         1      aa        7        1        16 
         2      bb        6        2        23 
         3      cc       10        3        32 
         4      dd        8        4         0 
         5      ee        3        5        29 
         6      ff        8        6         8 
    
    
    PROC CONTENTS shows an observation length of 48. PROC PRINT displays the internal layout of the variables within the observation where NPOS is the zero-based offset for each variable.
    Variables DD and FF, the only true numeric doubles, are at offsets 0 and 8, respectively, so they are automatically aligned. The rest of the observation contains the remaining numeric variables and then character variables.
    The last physical variable in this layout is CC with an offset of 32 and a length of 10. This gives you an internal length of 42, even though PROC CONTENTS reports the observation length as 48. The difference is the 6 bytes of padding that are added so that the next observation is aligned on an 8-byte boundary within the disk page buffer.
  • No alignment is done when the observation does not contain 8-byte numeric variables as demonstrated in the next example, which gives you an observation length of 7 and no padding between observations within disk page buffers:
    data b;
       length aa 6 cc $1;
       aa = 1;
       cc = 'x';
       output;
    run;
    
    proc contents data=b out=b1;
    run;
    
    proc print data=b1(keep=name length varnum npos); 
    run;
    
    
     The SAS System  
    
            OBS    NAME    LENGTH    VARNUM    NPOS
    
             1      aa       6         1        0 
             2      cc       1         2        6 
    
    
  • Observations for compressed data sets are not aligned within the disk page buffer, but the same algorithm is used for positioning the variables within the observations. Compressed observations must be uncompressed and moved into a work buffer. The 8-byte numeric values will be aligned and ready for use immediately after uncompressing. The observation length in the PROC CONTENTS output might be larger due to operating system-specific overhead.
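The arithmetic in the first case can be checked from the OUT= data set. The sketch below recreates data set A from the first example and computes the internal length as the largest NPOS + LENGTH value. Rounding that value up to the next multiple of 8 is an assumption that applies only to the aligned, uncompressed case described above; the column names INTERNAL_LENGTH and PADDED_LENGTH are hypothetical.

```sas
/* Recreates data set A from the first alignment example. */
data a;
   length aa 7 bb 6 cc $10 dd 8 ee 3;
   aa = 1;
   bb = 2;
   cc = 'abc';
   dd = 3;
   ee = 4;
   ff = 5;
run;

proc contents data=a out=a1 noprint;
run;

proc sql;
   /* Internal length: end position of the last physical variable.      */
   /* Padded length: assumed round-up to the next 8-byte boundary.      */
   select max(npos + length)               as internal_length,
          ceil(max(npos + length) / 8) * 8 as padded_length
      from a1;
quit;
```

With the NPOS and LENGTH values shown in the first example, the two results work out to 42 and 48, which matches the internal length and the observation length discussed there.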