
SAS Data Files

Understanding Generation Data Sets


Definition of Generation Data Set

A generation data set is an archived version of a SAS data set that is stored as part of a generation group. A generation data set is created each time the file is replaced. Each generation data set in a generation group has the same root member name, but each has a different version number. The most recent version of the generation data set is called the base version.

You can request generations for a SAS data file only. You cannot request generations for a SAS view.

Note: Generation data sets provide historical versions of a data set; they do not track observation updates for an individual data set. To log each time an observation is added, deleted, or updated, see Understanding an Audit Trail.


Terminology for Generation Data Sets

The following terms are relevant to generation data sets:

base version

is the most recently created version of a data set. Its name does not have the four-character suffix for the generation number.

generation group

is a group of data sets that represent a series of replacements to the original data set. The generation group consists of the base version and a set of historical versions.

generation number

is a monotonically increasing number that identifies one of the historical versions in a generation group. For example, the data set named AIR#272 has a generation number of 272.

GENMAX=

is an output data set option that requests generations for a data set and specifies the maximum number of versions (including the base version and all historical versions) to keep for a given data set. The default is GENMAX=0, which means that the generation data sets feature is not in effect.

GENNUM=

is an input data set option that references a specific version from a generation group. Positive numbers are absolute references to a historical version by its generation number. Negative numbers are relative references to historical versions, counted back from the base version. For example, GENNUM=-1 refers to the youngest version.

historical versions

are the older copies of the base version of a data set. Names for historical versions have a four-character suffix for the generation number, such as #003.

oldest version

is the oldest version in a generation group.

rolling over

is the process in which the version number moves from 999 to 000. When the generation number reaches 999, its next value is 000.

youngest version

is the version that is chronologically closest to the base version.


Invoking Generation Data Sets

To invoke generation data sets and to specify the maximum number of versions to maintain, include the output data set option GENMAX= when creating or replacing a data set. For example, the following DATA step creates a new data set and requests that up to four copies be kept (one base version and three historical versions):

data a (genmax=4);
   x=1;
   output;
run;

Once the GENMAX= data set option is in effect, the data set member name is limited to 28 characters (rather than 32), because the last four characters are reserved for a version number. When the GENMAX= data set option is not in effect, the member name can be up to 32 characters. See the GENMAX= data set option in SAS Language Reference: Dictionary.


Understanding How a Generation Group Is Maintained

The first time a data set with generations in effect is replaced, SAS keeps the replaced data set and appends a four-character version number to its member name, which includes # and a three-digit number. That is, for a data set named A, the replaced data set becomes A#001. When the data set is replaced for the second time, the replaced data set becomes A#002; that is, A#002 is the version that is chronologically closest to the base version. After three replacements, the result is:

A

base (current) version

A#003

most recent (youngest) historical version

A#002

second most recent historical version

A#001

oldest historical version.
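The sequence above can be reproduced with a short sketch. It assumes that the WORK library is used and that a variable x holds a different value in each version; both are illustrative choices and are not part of the original example. The comments state which version each step is expected to reference.

```sas
/* Create data set A and request up to four versions. */
data a (genmax=4);
   x=1;
run;

/* Replace A three times. The generation setting stays in effect, */
/* so GENMAX= is not repeated on each replacement.                */
data a; x=2; run;   /* previous base version becomes A#001 */
data a; x=3; run;   /* previous base version becomes A#002 */
data a; x=4; run;   /* previous base version becomes A#003 */

/* Base version: expected to contain x=4. */
proc print data=a;
run;

/* Oldest historical version, A#001: expected to contain x=1. */
proc print data=a (gennum=1);
run;

/* Youngest historical version, A#003: expected to contain x=3. */
proc print data=a (gennum=3);
run;
```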

With GENMAX=4, a fourth replacement deletes the oldest version, which is A#001. As replacements occur, SAS always keeps four copies. For example, after ten replacements, the result is:

A

base (current) version

A#010

most recent (youngest) historical version

A#009

second most recent historical version

A#008

oldest historical version

The limit for version numbers that SAS can append is #999. After 999 replacements, the youngest version is #999. After 1,000 replacements, SAS rolls over the youngest version number to #000. After 1,001 replacements, the youngest version number is #001. For example, using data set A with GENMAX=4, the results would be:

999 replacements

A (current)

A#999 (most recent)

A#998 (2nd most recent)

A#997 (oldest)

1,000 replacements

A (current)

A#000 (most recent)

A#999 (2nd most recent)

A#998 (oldest)

1,001 replacements

A (current)

A#001 (most recent)

A#000 (2nd most recent)

A#999 (oldest)
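The rollover pattern can be modeled with simple arithmetic: the suffix of the youngest version is the replacement count modulo 1,000. The DATA step below is only a model of the naming rule described above, not a description of how SAS implements it. The variable names n and suffix are illustrative.

```sas
data _null_;
   /* n is the number of times the data set has been replaced */
   do n = 998 to 1001;
      /* wrap after 999 and pad the result to three digits */
      suffix = cats('#', put(mod(n, 1000), z3.));
      put n= suffix=;
   end;
run;
```

The log should show the suffixes #998, #999, #000, and #001 for the four replacement counts, which matches the lists above.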

The following table shows how names are assigned to a generation group:

Naming Generation Group Data Sets
| Time | SAS Code | Data Set Names | GENNUM= Absolute Reference | GENNUM= Relative Reference | Explanation |
|---|---|---|---|---|---|
| 1 | data air (genmax=3); | AIR | 1 | 0 | The AIR data set is created, and three generations are requested. |
| 2 | data air; | AIR | 2 | 0 | AIR is replaced. AIR from time 1 is renamed AIR#001. |
| | | AIR#001 | 1 | -1 | |
| 3 | data air; | AIR | 3 | 0 | AIR is replaced. AIR from time 2 is renamed AIR#002. |
| | | AIR#002 | 2 | -1 | |
| | | AIR#001 | 1 | -2 | |
| 4 | data air; | AIR | 4 | 0 | AIR is replaced. AIR from time 3 is renamed AIR#003. AIR#001 from time 1, which is the oldest, is deleted. |
| | | AIR#003 | 3 | -1 | |
| | | AIR#002 | 2 | -2 | |
| 5 | data air (genmax=2); | AIR | 5 | 0 | AIR is replaced, and the number of generations is changed to two. AIR from time 4 is renamed AIR#004. The two oldest versions are deleted. |
| | | AIR#004 | 4 | -1 | |


Processing Specific Versions of a Generation Group

When a generation group exists, SAS processes the base version by default. For example, the following PRINT procedure prints the base version:

proc print data=a;
run; 

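To request a specific version from a generation group, use the GENNUM= input data set option. You can reference a version in two ways: with a positive number, which is an absolute reference to a historical version by its generation number, or with a negative number, which is a relative reference counted back from the base version.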

Requesting Specific Generation Data Sets

| SAS statement | Result |
|---|---|
| proc print data=air (gennum=0); | Prints the current (base) version of the AIR data set. |
| proc print data=air; | Prints the current (base) version of the AIR data set. |
| proc print data=air (gennum=-2); | Prints the version two generations back from the current version. |
| proc print data=air (gennum=3); | Prints the file AIR#003. |
| proc print data=air (gennum=1000); | After 1,000 replacements, prints the file AIR#000, which is the file that is created after AIR#999. |
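Because GENNUM= is an input data set option, it should also be usable wherever a data set is read, not only in PROC PRINT. The following sketch assumes a small illustrative AIR data set with a hypothetical variable named flights. It reads the youngest historical version in a DATA step and then compares it with the base version by using PROC COMPARE.

```sas
/* Illustrative data: AIR with one historical version. */
data air (genmax=3);
   flights=100;
run;

data air;            /* replaces AIR; the previous copy becomes AIR#001 */
   flights=120;
run;

/* Read the youngest historical version into a new data set. */
/* AIR_PREVIOUS is a hypothetical name.                      */
data air_previous;
   set air (gennum=-1);
run;

/* Compare the youngest historical version with the base version. */
proc compare base=air (gennum=-1) compare=air;
run;
```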


Managing Generation Groups


Introduction

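The DATASETS procedure provides a variety of statements for managing generation groups. Note that for the DATASETS procedure, GENNUM= has additional functionality. For example, the DELETE statement accepts the keywords ALL and HIST, and in the CHANGE statement GENNUM=0 refers to the entire generation group.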


Displaying Data Set Information

A variety of statements in the DATASETS procedure can process a specific historical version. For example, you can display data set version numbers for historical copies using the CONTENTS statement in PROC DATASETS:

proc datasets library=myfiles;
   contents data=test (gennum=2);
run;


Copying Generation Groups

You can use the COPY statement in the DATASETS procedure or the COPY procedure to copy a generation group. However, you cannot copy an individual version.

For example, the following DATASETS procedure uses the COPY statement to copy a generation group for data set MYGEN1 from library MYLIB1 to library MYLIB2.

libname mylib1 'SAS-library-1';
libname mylib2 'SAS-library-2';

proc datasets;
  copy in=mylib1 out=mylib2;
  select mygen1;
run;
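The same copy can also be written with the COPY procedure, which the text above mentions as an alternative. This is a minimal sketch that reuses the placeholder library paths from the preceding example:

```sas
libname mylib1 'SAS-library-1';
libname mylib2 'SAS-library-2';

/* Copy the MYGEN1 generation group from MYLIB1 to MYLIB2. */
proc copy in=mylib1 out=mylib2;
   select mygen1;
run;
```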


Appending Generation Groups

You can use the GENNUM= data set option to append a specific historical version. For example, the following DATASETS procedure uses the APPEND statement to append a historical version of data set B to data set A. Note that by default, SAS uses the base version for the BASE= data set.

proc datasets; 
   append base=a data=b(gennum=2);
run;


Modifying the Number of Versions

When you modify the attributes of a data set, you can increase or decrease the number of versions for an existing generation group.

For example, the following MODIFY statement in the DATASETS procedure changes the number of generations for data set MYLIB.AIR to 4:

libname mylib 'SAS-library';

proc datasets library=mylib;
   modify air(genmax=4);
run;

CAUTION:
You cannot increase the number of versions after a generation group has rolled over.

To increase the number of versions for a generation group that has rolled over, save the generation group, and then create a new generation data set and specify the desired maximum number of versions to maintain.

CAUTION:
If you decrease the number of versions, SAS deletes the oldest version or versions so as not to exceed the new maximum.

For example, the following MODIFY statement decreases the number of versions for MYLIB.AIR from 4 to 0, which causes SAS to automatically delete the three historical versions:

proc datasets library=mylib;
   modify air (genmax=0);
run;



Deleting Versions in a Generation Group

When you delete data sets, you can delete a single version or an entire generation group. The following table shows the types of delete operations and their effects when you delete versions of a generation group.

The following examples assume that the base version of AIR and two historical versions (AIR#001 and AIR#002) exist for each command.

| SAS statement in PROC DATASETS | Results |
|---|---|
| delete air; | Deletes the base version and shifts up historical versions. AIR#002 is renamed to AIR and becomes the new base version. |
| delete air(gennum=0); | Same result as delete air; |
| delete air(gennum=2); | Deletes historical version AIR#002. |
| delete air(gennum=-2); | Deletes the second youngest historical version (AIR#001). |
| delete air(gennum=all); | Deletes all data sets in the generation group, including the base version. |
| delete air(gennum=hist); | Deletes all data sets in the generation group, except the base version. |

Note: Both an absolute reference and a relative reference refer to a specific version. A relative reference does not skip deleted versions. Therefore, when you are working with a generation group that includes one or more deleted versions, using a relative reference results in an error if the referenced version has been deleted. For example, if you have the base version AIR and three historical versions (AIR#001, AIR#002, and AIR#003) and you delete AIR#002, the following statements return an error, because AIR#002 does not exist. SAS does not assume that you mean AIR#003:

proc print data=air (gennum= -2);
run;

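The situation described in the note can be set up end to end with the following sketch. It assumes the WORK library and a hypothetical variable v that records which replacement produced each version. The final step is expected to fail, which is the point of the example.

```sas
/* Build AIR with three historical versions: AIR#001 to AIR#003. */
data air (genmax=4); v=1; run;
data air; v=2; run;
data air; v=3; run;
data air; v=4; run;

/* Delete the middle historical version, AIR#002. */
proc datasets library=work nolist;
   delete air (gennum=2);
run;
quit;

/* One generation back still resolves to AIR#003. */
proc print data=air (gennum=-1);
run;

/* Two generations back points at the deleted AIR#002, so this step */
/* is expected to return an error rather than fall back to AIR#001. */
proc print data=air (gennum=-2);
run;
```

An absolute reference such as gennum=1 or gennum=3 avoids the problem, because it names a version that still exists.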


Renaming Versions in a Generation Group

When you rename a data set, you can rename an entire generation group:

proc datasets;
   change a=newa;
run;

You can also rename a single version by including GENNUM=:

proc datasets;
   change a(gennum=2)=newa;
run;

Note: For the CHANGE statement in PROC DATASETS, specifying GENNUM=0 refers to the entire generation group.


Using Passwords in a Generation Group

Passwords for versions in a generation group are maintained as follows:
