Understanding Generation Data Sets

Definition of Generation Data Set

A generation data set is an archived version of a SAS data set that is stored as part of a generation group. A generation data set is created each time the file is replaced. Each generation data set in a generation group has the same root member name, but each has a different version number. The most recent version of the generation data set is called the base version.
You can request generations for a SAS data file only. You cannot request generations for a SAS view.
Note: Generation data sets provide historical versions of a data set; they do not track observation updates for an individual data set. To log each time an observation is added, deleted, or updated, see Understanding an Audit Trail.

Terminology for Generation Data Sets

The following terms are relevant to generation data sets:
base version
is the most recently created version of a data set. Its name does not have the four-character suffix for the generation number.
generation group
is a group of data sets that represent a series of replacements to the original data set. The generation group consists of the base version and a set of historical versions.
generation number
is a monotonically increasing number that identifies one of the historical versions in a generation group. For example, the data set named AIR#272 has a generation number of 272.
GENMAX=
is an output data set option that requests generations for a data set and specifies the maximum number of versions (including the base version and all historical versions) to keep for a given data set. The default is GENMAX=0, which means that the generation data sets feature is not in effect.
GENNUM=
is an input data set option that references a specific version from a generation group. Positive numbers are absolute references to a historical version by its generation number. Negative numbers are relative references to historical versions, counted back from the base version. For example, GENNUM=-1 refers to the youngest version.
historical versions
are the older copies of the base version of a data set. Names for historical versions have a four-character suffix for the generation number, such as #003.
oldest version
is the oldest version in a generation group.
rolling over
specifies the process of the version number moving from 999 to 000. When the generation number reaches 999, its next value is 000.
youngest version
is the version that is chronologically closest to the base version.

Invoking Generation Data Sets

To invoke generation data sets and to specify the maximum number of versions to maintain, include the output data set option GENMAX= when creating or replacing a data set. For example, the following DATA step creates a new data set and requests that up to four copies be kept (one base version and three historical versions):
data a (genmax=4);
   x=1;
   output;
run;
Once the GENMAX= data set option is in effect, the data set member name is limited to 28 characters (rather than 32), because the last four characters are reserved for a version number. When the GENMAX= data set option is not in effect, the member name can be up to 32 characters. See the GENMAX= data set option in SAS Data Set Options: Reference.

Understanding How a Generation Group Is Maintained

The first time a data set with generations in effect is replaced, SAS keeps the replaced data set, and appends a four-character version number to its member name, which includes # and a three-digit number. That is, for a data set named A, the replaced data set becomes A#001. When the data set is replaced for the second time, the replaced data set becomes A#002. That is, A#002 is the version that is chronologically closest to the base version. After three replacements, the result is:
A
base (current) version
A#003
most recent (youngest) historical version
A#002
second most recent historical version
A#001
oldest historical version
With GENMAX=4, a fourth replacement deletes the oldest version, which is A#001. As replacements occur, SAS always keeps four copies. For example, after ten replacements, the result is:
A
base (current) version
A#010
most recent (youngest) historical version
A#009
second most recent historical version
A#008
oldest historical version
The limit for version numbers that SAS can append is #999. After 999 replacements, the youngest version is #999. After 1,000 replacements, SAS rolls over the youngest version number to #000. After 1,001 replacements, the youngest version number is #001. For example, using data set A with GENMAX=4, the results would be:
999 replacements
  • A (current)
  • A#999 (most recent)
  • A#998 (2nd most recent)
  • A#997 (oldest)
1,000 replacements
  • A (current)
  • A#000 (most recent)
  • A#999 (2nd most recent)
  • A#998 (oldest)
1,001 replacements
  • A (current)
  • A#001 (most recent)
  • A#000 (2nd most recent)
  • A#999 (oldest)
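The rollover can be pictured as modulo-1000 arithmetic on the number of replacements. The following DATA step is only an illustrative sketch of that arithmetic for the three cases listed above; the variable names REPLACEMENTS and YOUNGEST are hypothetical, and the step does not read or create any generation data sets.

```sas
data _null_;
   do replacements = 999, 1000, 1001;
      /* Suffix of the youngest historical version:           */
      /* replacement count modulo 1000, zero-padded to 3 digits */
      youngest = cats('A#', put(mod(replacements, 1000), z3.));
      put replacements= youngest=;
   end;
run;
```

The step should write one line per case to the SAS log, matching the "most recent" entries in the lists above.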
The following table shows how names are assigned to a generation group. Each data set name is followed in parentheses by its GENNUM= absolute reference and its GENNUM= relative reference:
Naming Generation Group Data Sets
Time 1
  • SAS code: data air (genmax=3);
  • Data set names: AIR (1, 0)
  • Explanation: The AIR data set is created, and three generations are requested.
Time 2
  • SAS code: data air;
  • Data set names: AIR (2, 0), AIR#001 (1, -1)
  • Explanation: AIR is replaced. AIR from time 1 is renamed AIR#001.
Time 3
  • SAS code: data air;
  • Data set names: AIR (3, 0), AIR#002 (2, -1), AIR#001 (1, -2)
  • Explanation: AIR is replaced. AIR from time 2 is renamed AIR#002.
Time 4
  • SAS code: data air;
  • Data set names: AIR (4, 0), AIR#003 (3, -1), AIR#002 (2, -2)
  • Explanation: AIR is replaced. AIR from time 3 is renamed AIR#003. AIR#001 from time 1, which is the oldest, is deleted.
Time 5
  • SAS code: data air (genmax=2);
  • Data set names: AIR (5, 0), AIR#004 (4, -1)
  • Explanation: AIR is replaced, and the number of generations is changed to two. AIR from time 4 is renamed AIR#004. The two oldest versions are deleted.
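The AIR sequence above can be reproduced with a short program. This is a minimal sketch under stated assumptions: the libref MYLIB and its placeholder path are illustrative, and the variable FLIGHT is a hypothetical marker added only so that each version can be told apart when printed.

```sas
/* Assumption: MYLIB is a writable SAS library; the path is a placeholder. */
libname mylib 'SAS-library';

/* Time 1: create AIR and request three generations. */
data mylib.air (genmax=3);
   flight = 1;
run;

/* Times 2 through 4: replace AIR. Each replacement turns the */
/* previous base version into a historical version.           */
data mylib.air;
   flight = 2;
run;

data mylib.air;
   flight = 3;
run;

data mylib.air;
   flight = 4;
run;

/* Time 5: replace AIR and reduce the number of generations to two. */
data mylib.air (genmax=2);
   flight = 5;
run;

/* Absolute reference: per the sequence above, this is AIR#004. */
proc print data=mylib.air (gennum=4);
run;

/* Relative reference: the youngest historical version, also AIR#004. */
proc print data=mylib.air (gennum=-1);
run;
```

If the sequence behaves as described, both PROC PRINT steps show the observation written at time 4.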

Processing Specific Versions of a Generation Group

When a generation group exists, SAS processes the base version by default. For example, the following PRINT procedure prints the base version:
proc print data=a;
run; 
To request a specific version from a generation group, use the GENNUM= input data set option. There are two methods that you can use:
  • A positive integer (excluding zero) references a specific historical version number. For example, the following statement prints the historical version #003:
    proc print data=a(gennum=3);
    run;
    Note: After 1,000 replacements, if you want historical version #000, specify GENNUM=1000.
  • A negative integer is a relative reference to a version in relation to the base version, from the youngest predecessor to the oldest. For example, GENNUM=-1 refers to the youngest version. The following statement prints the data set that is three versions previous to the base version:
    proc print data=a(gennum=-3);
    run;
Requesting Specific Generation Data Sets
Each SAS statement is followed by its result:
  • proc print data=air (gennum=0);
    proc print data=air;
    Prints the current (base) version of the AIR data set.
  • proc print data=air (gennum=-2);
    Prints the version two generations back from the current version.
  • proc print data=air (gennum=3);
    Prints the file AIR#003.
  • proc print data=air (gennum=1000);
    After 1,000 replacements, prints the file AIR#000, which is the file that is created after AIR#999.
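Because GENNUM= is an input data set option, it can be used wherever a procedure reads a data set, not only with PROC PRINT. As one possible illustration, the following sketch uses the COMPARE procedure to report what changed between the youngest historical version and the base version. It assumes that a generation group named A already exists and has at least one historical version.

```sas
/* Compare the youngest historical version (GENNUM=-1)      */
/* against the base version, which SAS reads by default.    */
proc compare base=a (gennum=-1) compare=a;
run;
```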

Managing Generation Groups

Introduction

The DATASETS procedure provides a variety of statements for managing generation groups. Note that for the DATASETS procedure, GENNUM= has the following additional functionality:
  • For the PROC DATASETS and DELETE statements, GENNUM= supports the additional values ALL, HIST, and REVERT.
  • For the CHANGE statement, GENNUM= supports the additional value ALL.
  • For the CHANGE statement, specifying GENNUM=0 refers to all versions rather than just the base version.

Displaying Data Set Information

A variety of statements in the DATASETS procedure can process a specific historical version. For example, you can display data set version numbers for historical copies using the CONTENTS statement in PROC DATASETS:
proc datasets library=myfiles;
   contents data=test (gennum=2);
run;

Copying Generation Groups

You can use the COPY statement in the DATASETS procedure or the COPY procedure to copy a generation group. However, you cannot copy an individual version.
For example, the following DATASETS procedure uses the COPY statement to copy a generation group for data set MYGEN1 from library MYLIB1 to library MYLIB2.
libname mylib1 'SAS-library-1';
libname mylib2 'SAS-library-2';

proc datasets;
  copy in=mylib1 out=mylib2;
  select mygen1;
run;

Appending Generation Groups

You can use the GENNUM= data set option to append a specific historical version. For example, the following DATASETS procedure uses the APPEND statement to append a historical version of data set B to data set A. Note that by default, SAS uses the base version for the BASE= data set.
proc datasets; 
   append base=a data=b(gennum=2);
run;

Modifying the Number of Versions

When you modify the attributes of a data set, you can increase or decrease the number of versions for an existing generation group.
For example, the following MODIFY statement in the DATASETS procedure changes the number of generations for data set MYLIB.AIR to 4:
libname mylib 'SAS-library';

proc datasets library=mylib;
   modify air(genmax=4);
run;
CAUTION:
You cannot increase the number of versions after a generation group has rolled over.
To increase the number of versions for a generation group that has rolled over, save the generation group, and then create a new generation data set. Specify the desired maximum number of versions to maintain.
CAUTION:
If you decrease the number of versions, SAS deletes the oldest version or versions so as not to exceed the new maximum.
For example, the following MODIFY statement decreases the number of versions for MYLIB.AIR from 4 to 0. This decrease causes SAS to automatically delete the three historical versions:
proc datasets library=mylib;
   modify air (genmax=0);
run;

Deleting Versions in a Generation Group

When you delete data sets, you can delete a single version or an entire generation group. The following table shows the types of delete operations and their effects when you delete versions of a generation group.
The following examples assume that the base version of AIR and two historical versions (AIR#001 and AIR#002) exist for each command.
Deleting Generation Data Sets
Each SAS statement in PROC DATASETS is followed by its result:
  • delete air;
    delete air(gennum=0);
    Deletes the base version and shifts up historical versions. AIR#002 is renamed to AIR and becomes the new base version.
  • delete air(gennum=2);
    Deletes historical version AIR#002.
  • delete air(gennum=-2);
    Deletes the second youngest historical version (AIR#001).
  • delete air(gennum=all);
    Deletes all data sets in the generation group, including the base version.
  • delete air(gennum=hist);
    Deletes all data sets in the generation group, except the base version.
Note: Both an absolute reference and a relative reference refer to a specific version. A relative reference does not skip deleted versions. Therefore, when you are working with a generation group that includes one or more deleted versions, using a relative reference results in an error if the referenced version has been deleted. For example, if you have the base version AIR and three historical versions (AIR#001, AIR#002, and AIR#003) and you delete AIR#002, the following statements return an error, because AIR#002 does not exist. SAS does not assume that you mean AIR#003:
proc print data=air (gennum=-2);
run;

Renaming Versions in a Generation Group

When you rename a data set, you can rename an entire generation group:
proc datasets;
   change a=newa;
run;
You can also rename a single version by including GENNUM=:
proc datasets;
   change a(gennum=2)=newa;
run;
Note: For the CHANGE statement in PROC DATASETS, specifying GENNUM=0 refers to the entire generation group.

Using Passwords in a Generation Group

Passwords for versions in a generation group are maintained as follows:
  • If you assign a password to the base version, the password is maintained in subsequent historical versions. However, the password is not applied to any existing historical versions.
  • If you assign a password to a historical version, the password applies to that individual data set only.