Transformations in a SAS
Data Integration Studio job can produce the following types of intermediate
files:
- procedure utility files that are created by the SORT and SUMMARY
  procedures when these procedures are used in the transformation
- transformation temporary files that are created by the transformation
  as it is working
- transformation output tables that are created by the transformation
  when it produces its result; the output for a transformation becomes
  the input to the next transformation in the flow
By default, procedure
utility files, transformation temporary files, and transformation
output tables are created in the WORK library. You can use the -WORK
invocation option to force all intermediate files to a specified location.
You can use the -UTILLOC invocation option to force only utility files
to a separate location.
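To confirm where intermediate files will be written in a given session,
the current settings can be checked from SAS code. The following is a
minimal sketch rather than a required step; the directory paths shown in
the comment are hypothetical placeholders.

```sas
/* Hypothetical invocation (paths are placeholders, not defaults):      */
/*    sas -work /sastemp/work -utilloc /sastemp/util                    */

/* Write the current values of the WORK and UTILLOC options to the log. */
%put NOTE: WORK option    = %sysfunc(getoption(work));
%put NOTE: UTILLOC option = %sysfunc(getoption(utilloc));

/* Physical path of the WORK library for this session. */
%put NOTE: WORK path      = %sysfunc(pathname(work));
```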
Knowledge of intermediate
files helps you to perform the following tasks:
- View or analyze the output tables for a transformation and verify
  that the output is correct.
- Estimate the disk space that is needed for intermediate files.
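One way to gauge how much space intermediate tables currently occupy is
to query the DICTIONARY.TABLES view for the WORK library. This is a rough
sketch: it reports only the data sets that exist at the moment the query
runs, and it would not be expected to include procedure utility files,
which are not library members.

```sas
/* List WORK data sets with row counts and file sizes, largest first. */
proc sql;
   select memname,
          nobs,
          filesize format=comma20. label="Size (bytes)"
      from dictionary.tables
      where libname = "WORK" and memtype = "DATA"
      order by filesize desc;
quit;
```

Running the query at several points in a flow gives a picture of peak
usage rather than a single snapshot.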
These intermediate files
are usually deleted after they have served their purpose. However,
it is possible that some intermediate files might be retained longer
than desired in a particular process flow. For example, some user-written
transformations might not delete the temporary files that they create.
Utility files are deleted
by the SAS procedure that created them. Transformation temporary files
are deleted by the transformation that created them. When a SAS Data
Integration Studio job is executed in batch, transformation output
tables are deleted when the process flow ends or the current server
session ends.
When a job is executed
interactively in SAS Data Integration Studio, transformation output
tables are retained until the Job Editor window is closed or the current
server session is ended in some other way (for example, by selecting
Actions > Stop from the menu). As long as you keep the job open in the
Job Editor window, the output tables remain in the WORK library on the
SAS Workspace Server that executed the job. If this is not what you
want, you can manually delete the output tables, or you can close the
Job Editor window and open it again, which deletes all intermediate
files. For information about how transformation output tables can be
used to debug the transformations in a job, see
Reviewing Temporary Output Tables.
Here is a post-processing
macro that can be incorporated into a process flow. It uses the DATASETS
procedure to delete all data sets in the Work library, including any
intermediate files that have been saved to the Work library.
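%macro clear_work;
   %local work_members;

   /* Collect the names of all data sets in WORK as a comma-separated list. */
   proc sql noprint;
      select memname
         into :work_members separated by ","
         from dictionary.tables
         where libname = "WORK" and
               memtype = "DATA";
   quit;

   /* Store each name in its own macro variable: member1, member2, ... */
   data _null_;
      /* Without LENGTH, a variable assigned from SYMGET defaults to 200 */
      /* characters, which would truncate a long member list.            */
      length work_members $32767 this_member $32;
      work_members = symget("work_members");
      num_members = input(symget("sqlobs"), best.);
      do n = 1 to num_members;
         this_member = scan(work_members, n, ",");
         call symput("member"||trim(left(put(n,best.))), trim(this_member));
      end;
      call symput("num_members", trim(left(put(num_members,best.))));
   run;

   /* Delete the data sets, one DELETE statement per member. */
   %if &num_members gt 0 %then %do;
      proc datasets library = work nolist;
         %do n=1 %to &num_members;
            delete &&member&n;
         %end;
      quit;
   %end;
%mend clear_work;
%clear_work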
Note: The previous macro deletes
all data sets in the Work library.
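If the goal is simply to empty the library, the KILL option of the
DATASETS procedure offers a shorter route. This sketch is an alternative
to the macro above, not a replacement prescribed by the product;
MEMTYPE=DATA restricts the deletion to data sets, which matches the
macro's scope.

```sas
/* Delete every data set in WORK in a single step. */
proc datasets library=work kill memtype=data nolist;
quit;
```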
The transformation output
tables for a process flow remain until the SAS session that is associated
with the flow is terminated. Analyze the process flow and determine
whether there are output tables that are not being used (especially
if these tables are large). If so, you can add transformations to
the flow that delete these output tables and free up valuable disk
space and memory. For example, you can add a generated transformation
that deletes output tables at a certain point in the flow. For details
about generated transformations, see
Creating and Using a Generated Transformation.
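The code inside such a transformation could be as small as the following
sketch. The macro name drop_work_tables and the table names are
hypothetical; in a real job, the names would be those of the output
tables that the flow no longer needs.

```sas
/* Hypothetical helper: delete the named tables from the WORK library. */
%macro drop_work_tables(tables);
   proc datasets library=work nolist;
      delete &tables;
   quit;
%mend drop_work_tables;

/* Example call with placeholder table names, separated by spaces. */
%drop_work_tables(stage_customers stage_orders)
```

Unlike a macro that clears the whole library, this approach removes only
the tables that are named, so outputs that later transformations still
read are left in place.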