SAS Data Files
Definition of Generation Data Set
A generation data set is an archived version of a SAS data set that is stored as part of a generation group. A generation data set is created each time the file is replaced. Each generation data set in a generation group has the same root member name, but each has a different version number. The most recent version of the generation data set is called the base version.
You can request generations for a SAS data file only. You cannot request generations for a SAS view.
Note: Generation data sets provide historical versions of a data set; they do not track observation updates for an individual data set. To log each time an observation is added, deleted, or updated, see Understanding an Audit Trail.
Terminology for Generation Data Sets
The following terms are relevant to generation data sets:
base version
is the most recently created version of a data set. Its name does not have the four-character suffix for the generation number.
generation group
is a group of data sets that represent a series of replacements to the original data set. The generation group consists of the base version and a set of historical versions.
generation number
is a monotonically increasing number (until it rolls over) that identifies one of the historical versions in a generation group. For example, the data set named AIR#272 has a generation number of 272.
GENMAX=
is an output data set option that requests generations for a data set and specifies the maximum number of versions (including the base version and all historical versions) to keep for a given data set. The default is GENMAX=0, which means that the generation data sets feature is not in effect.
GENNUM=
is an input data set option that references a specific version from a generation group. Positive numbers are absolute references to a historical version by its generation number. Negative numbers are relative references to historical versions, counted back from the base version, and zero refers to the base version. For example, GENNUM=-1 refers to the youngest version.
historical versions
are the older copies of the base version of a data set. Names for historical versions have a four-character suffix for the generation number, such as #003.
rolling over
specifies the process of the version number moving from 999 to 000. When the generation number reaches 999, its next value is 000.
youngest version
is the version that is chronologically closest to the base version.
Invoking Generation Data Sets
To invoke generation data sets and to specify the maximum number of versions to maintain, include the output data set option GENMAX= when creating or replacing a data set. For example, the following DATA step creates a new data set and requests that up to four copies be kept (one base version and three historical versions):
data a (genmax=4);
   x=1;
   output;
run;
Once the GENMAX= data set option is in effect, the data set member name is limited to 28 characters (rather than 32), because the last four characters are reserved for a version number. When the GENMAX= data set option is not in effect, the member name can be up to 32 characters. See the GENMAX= data set option in SAS Language Reference: Dictionary.
Understanding How a Generation Group Is Maintained
The first time a data set with generations in effect is replaced, SAS keeps the replaced data set and appends a four-character suffix to its member name: the # character followed by a three-digit version number. That is, for a data set named A, the replaced data set becomes A#001. When the data set is replaced for the second time, the replaced data set becomes A#002; that is, A#002 is the version that is chronologically closest to the base version. After three replacements, the result is:
A
A#003
A#002
A#001
With GENMAX=4, a fourth replacement deletes the oldest version, which is A#001. As replacements occur, SAS always keeps four copies. For example, after ten replacements, the result is:
A
A#010
A#009
A#008
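The following sketch shows this rotation in a new SAS session. It creates WORK.A with GENMAX=4, replaces it ten times with a hypothetical macro named %REPLACE_A, and then writes to the log whether historical versions 7 through 10 still exist. It assumes that your release of SAS supports the optional generation argument of the EXIST function.

```sas
/* Create the base version of WORK.A and keep up to four copies in total. */
data work.a(genmax=4);
  x = 0;
run;

/* Hypothetical helper macro: replace WORK.A &N times.               */
/* GENMAX=4 is stored with the data set, so it is not repeated here. */
%macro replace_a(n);
  %local i;
  %do i = 1 %to &n;
    data work.a;
      x = &i;
    run;
  %end;
%mend replace_a;

%replace_a(10)

/* Write to the log whether historical versions 7 through 10 exist.  */
/* Assumes that EXIST accepts a generation number as its 3rd argument. */
data _null_;
  do gen = 7 to 10;
    found = exist('work.a', 'DATA', gen);
    put gen= found=;
  end;
run;
```

Given the rules above, only versions 8, 9, and 10 should be reported as present.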
The limit for version numbers that SAS can append is #999. After 999 replacements, the youngest version is #999. After 1,000 replacements, SAS rolls over the youngest version number to #000. After 1,001 replacements, the youngest version number is #001. For example, using data set A with GENMAX=4, the results would be:
999 replacements: A, A#999, A#998, A#997
1,000 replacements: A, A#000, A#999, A#998
1,001 replacements: A, A#001, A#000, A#999
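The rollover rule amounts to counting replacements modulo 1,000. The DATA step below is only a model of the naming rule, not a description of how SAS assigns names internally: for a hypothetical group A with GENMAX=4, it writes to the log the base version and the three youngest historical versions after 999, 1,000, and 1,001 replacements.

```sas
/* Model of the rollover rule for a GENMAX=4 group named A: the base */
/* version plus the three youngest historical versions.              */
data _null_;
  length names $40;
  do n = 999, 1000, 1001;            /* number of replacements so far */
    names = 'A';
    do k = 0 to 2;                   /* youngest, then older versions */
      names = catx(' ', names, cats('A#', put(mod(n - k, 1000), z3.)));
    end;
    put n= names=;
  end;
run;
```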
The following table shows how names are assigned to a generation group:
Time | SAS Code | Data Set Names | GENNUM= Absolute Reference | GENNUM= Relative Reference | Explanation |
---|---|---|---|---|---|
1 | data air (genmax=3); | AIR | 1 | 0 | The AIR data set is created, and three generations are requested. |
2 | data air; | AIR, AIR#001 | 2, 1 | 0, -1 | AIR is replaced. AIR from time 1 is renamed AIR#001. |
3 | data air; | AIR, AIR#002, AIR#001 | 3, 2, 1 | 0, -1, -2 | AIR is replaced. AIR from time 2 is renamed AIR#002. |
4 | data air; | AIR, AIR#003, AIR#002 | 4, 3, 2 | 0, -1, -2 | AIR is replaced. AIR from time 3 is renamed AIR#003. AIR#001 from time 1, which is the oldest, is deleted. |
5 | data air (genmax=2); | AIR, AIR#004 | 5, 4 | 0, -1 | AIR is replaced, and the number of generations is changed to two. AIR from time 4 is renamed AIR#004. The two oldest versions, AIR#002 and AIR#003, are deleted. |
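A sketch that reproduces the five steps in the table, assuming a new SAS session and the WORK library. The final PROC PRINT uses a relative reference, so it should print AIR#004, the version created at time 4.

```sas
/* Time 1: create AIR and keep up to three copies in total. */
data work.air(genmax=3);
  time = 1;
run;

/* Times 2 to 4: each replacement renames the previous base version. */
data work.air; time = 2; run;
data work.air; time = 3; run;
data work.air; time = 4; run;

/* Time 5: replace AIR again and reduce the group to two copies. */
data work.air(genmax=2);
  time = 5;
run;

/* The youngest historical version should be AIR#004 (TIME=4). */
proc print data=work.air(gennum=-1);
run;
```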
Processing Specific Versions of a Generation Group
When a generation group exists, SAS processes the base version by default. For example, the following PRINT procedure prints the base version:
proc print data=a; run;
To request a specific version from a generation group, use the GENNUM= input data set option. There are two methods that you can use:
A positive integer (excluding zero) references a specific historical version number. For example, the following statement prints the historical version #003:
proc print data=a(gennum=3); run;
Note: After 1,000 replacements, if you want historical version #000, specify GENNUM=1000.
A negative integer is a relative reference to a version in relation to the base version, from the youngest predecessor to the oldest. For example, GENNUM=-1 refers to the youngest version. The following statement prints the data set that is three versions previous to the base version:
proc print data=a(gennum=-3); run;
SAS statement | Result |
---|---|
proc print data=air (gennum=0); or proc print data=air; | Prints the current (base) version of the AIR data set. |
proc print data=air (gennum=-2); | Prints the version two generations back from the current version. |
proc print data=air (gennum=3); | Prints the file AIR#003. |
proc print data=air (gennum=1000); | After 1,000 replacements, prints the file AIR#000, which is the file that is created after AIR#999. |
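GENNUM= is an ordinary input data set option, so it also works in a DATA step. The sketch below, which assumes a new SAS session, builds a small AIR group in WORK and reads the base version and the youngest historical version in one SET statement. The IN= variables and the SOURCE variable are illustrative names only.

```sas
/* Build a base version (TIME=3) and two historical versions (TIME=1, 2). */
data work.air(genmax=3);
  time = 1;
run;
data work.air; time = 2; run;
data work.air; time = 3; run;

/* Stack the base version on top of the youngest historical version. */
data work.compare;
  length source $10;
  set work.air(gennum=0  in=from_base)
      work.air(gennum=-1 in=from_prev);
  if from_base then source = 'base';
  else if from_prev then source = 'gennum=-1';
run;

proc print data=work.compare;
run;
```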
Managing Generation Groups
The DATASETS procedure provides a variety of statements for managing generation groups. Note that for the DATASETS procedure, GENNUM= has the following additional functionality:
For the PROC DATASETS and DELETE statements, GENNUM= supports the additional values ALL, HIST, and REVERT.
For the CHANGE statement, GENNUM= supports the additional value ALL.
For the CHANGE statement, specifying GENNUM=0 refers to all versions rather than just the base version.
A variety of statements in the DATASETS procedure can process a specific historical version. For example, the following CONTENTS statement in PROC DATASETS displays information about historical version #002 of data set TEST:
proc datasets library=myfiles;
   contents data=test (gennum=2);
run;
quit;
You can use the COPY statement in the DATASETS procedure or the COPY procedure to copy a generation group. However, you cannot copy an individual version.
For example, the following DATASETS procedure uses the COPY statement to copy a generation group for data set MYGEN1 from library MYLIB1 to library MYLIB2.
libname mylib1 'SAS-library-1';
libname mylib2 'SAS-library-2';
proc datasets;
   copy in=mylib1 out=mylib2;
   select mygen1;
run;
quit;
You can use the GENNUM= data set option to append a specific historical version. For example, the following DATASETS procedure uses the APPEND statement to append a historical version of data set B to data set A. Note that by default, SAS uses the base version for the BASE= data set.
proc datasets;
   append base=a data=b(gennum=2);
run;
quit;
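The APPEND procedure accepts the same input data set option. In this sketch, which assumes a new SAS session, A is an ordinary data set and B is a generation group whose historical version B#002 holds X=2; only that version is appended to the base version of A.

```sas
/* A is an ordinary data set with one observation. */
data work.a;
  x = 0;
run;

/* B keeps up to three copies: after two replacements, B#001 has X=1, */
/* B#002 has X=2, and the base version has X=3.                        */
data work.b(genmax=3); x = 1; run;
data work.b;           x = 2; run;
data work.b;           x = 3; run;

/* Append historical version B#002 to the base version of A. */
proc append base=work.a data=work.b(gennum=2);
run;
```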
When you modify the attributes of a data set, you can increase or decrease the number of versions for an existing generation group.
For example, the following MODIFY statement in the DATASETS procedure changes the number of generations for data set MYLIB.AIR to 4:
libname mylib 'SAS-library';
proc datasets library=mylib;
   modify air(genmax=4);
run;
quit;
To increase the number of versions for a generation group that has rolled over, save the generation group, and then create a new generation data set and specify the desired maximum number of versions to maintain.
For example, the following MODIFY statement decreases the number of versions for MYLIB.AIR from 4 to 0, which causes SAS to automatically delete the three historical versions:
proc datasets library=mylib;
   modify air (genmax=0);
run;
quit;
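The following sketch, assuming a new SAS session and the WORK library in place of MYLIB, builds AIR with three historical versions, sets GENMAX=0, and then writes to the log whether the base version and the former historical versions exist. The check assumes that EXIST accepts a generation number as its third argument.

```sas
/* Base version (X=4) plus historical versions AIR#001 to AIR#003. */
data work.air(genmax=4); x = 1; run;
data work.air;           x = 2; run;
data work.air;           x = 3; run;
data work.air;           x = 4; run;

/* Turn generations off; SAS deletes the three historical versions. */
proc datasets library=work nolist;
  modify air(genmax=0);
run;
quit;

/* Log whether the base version and each former historical version exist. */
data _null_;
  base_found = exist('work.air');
  put base_found=;
  do gen = 1 to 3;
    found = exist('work.air', 'DATA', gen);
    put gen= found=;
  end;
run;
```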
When you delete data sets, you can specify a specific version or an entire generation group to delete. The following table shows the types of delete operations and their effects when you delete versions of a generation group.
The following examples assume that the base version of AIR and two historical versions (AIR#001 and AIR#002) exist for each command.
Note: Both an absolute reference and a relative reference refer to a specific version. A relative reference does not skip deleted versions. Therefore, when you are working with a generation group that includes one or more deleted versions, using a relative reference results in an error if the referenced version has been deleted. For example, if you have the base version AIR and three historical versions (AIR#001, AIR#002, and AIR#003) and you delete AIR#002, the following statement returns an error, because AIR#002 does not exist. SAS does not assume that you mean the next older version, AIR#001:
proc print data=air (gennum= -2); run;
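The sketch below, which assumes a new SAS session, builds the situation described in the note: it deletes only AIR#002 with the DELETE statement and then shows that the relative references around the gap keep their positions. GENNUM=-1 should still resolve to AIR#003 and GENNUM=-3 to AIR#001; GENNUM=-2 is the reference that fails.

```sas
/* Base version AIR (X=4) plus historical versions AIR#001 to AIR#003. */
data work.air(genmax=4); x = 1; run;
data work.air;           x = 2; run;
data work.air;           x = 3; run;
data work.air;           x = 4; run;

/* Delete only historical version AIR#002. */
proc datasets library=work nolist;
  delete air(gennum=2);
run;
quit;

/* Relative references do not skip the deleted version: */
proc print data=work.air(gennum=-1); run;   /* AIR#003, X=3 */
proc print data=work.air(gennum=-3); run;   /* AIR#001, X=1 */
```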
When you rename a data set, you can rename an entire generation group:
proc datasets;
   change a=newa;
run;
quit;
You can also rename a single version by including GENNUM=:
proc datasets;
   change a(gennum=2)=newa;
run;
quit;
Note: For the CHANGE statement in PROC DATASETS, specifying GENNUM=0 refers to the entire generation group.
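A sketch of renaming a whole generation group, assuming a new SAS session: after the CHANGE statement, the historical version should be found under the new name NEWA rather than A.

```sas
/* A small generation group: base version A (X=2), historical A#001 (X=1). */
data work.a(genmax=3); x = 1; run;
data work.a;           x = 2; run;

/* Rename the base version and its historical versions together. */
proc datasets library=work nolist;
  change a=newa;
run;
quit;

/* The former A#001 should now be readable as historical version 1 of NEWA. */
proc print data=work.newa(gennum=1);
run;
```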
Passwords for versions in a generation group are maintained as follows:
If you assign a password to the base version, the password is maintained in subsequent historical versions. However, the password is not applied to any existing historical versions.
If you assign a password to a historical version, the password applies to that individual data set only.
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.