Data Set Options for Relational Databases |
Default value: | NO |
Valid in: | DATA and PROC steps (when creating and appending to DBMS tables using SAS/ACCESS software) |
DBMS support: | Teradata |
Syntax | |
Syntax Description | |
Details | |
Bulk Loading | |
Data Buffers | |
Temporary Tables | |
Restarting MultiLoad | |
Examples | |
See Also |
Syntax |
MULTILOAD=YES | NO |
uses the Teradata MultiLoad utility, if available, to load Teradata tables.
sends inserts to Teradata tables one row at a time.
Details |
The SAS/ACCESS MultiLoad facility provides a bulk-loading method of loading both empty and existing Teradata tables. Unlike FastLoad, MultiLoad can append data to existing tables.
To determine whether threaded reads are actually generated, turn on SAS tracing by setting OPTIONS SASTRACE=",,,d" in your program.
Two data set options are available for tuning the number and the size of data buffers that are used for transferring data from SAS to Teradata. Data is transferred from SAS to Teradata using shared memory. The default shared memory buffer size is 64K. The default number of shared memory buffers used for the transfer is 2. You can use BUFFERS= to vary the number of buffers for data transfer from 1 to 8. You can use MBUFSIZE= to vary the size of the shared memory buffers from the size of each data row up to 1MB.
The Teradata MultiLoad utility uses four different temporary tables when it performs the bulk-load operation. It uses a log table to track restart information, two error tables to track errors, and a work table to hold data before the insert operation is made.
By default, the SAS/ACCESS MultiLoad facility generates names for these temporary tables, where randnum represents a random number. To specify a different name for these tables, use ML_RESTART=, ML_ERROR1=, ML_ERROR2=, and ML_WORK=, respectively.
Temporary Table | Table Name |
---|---|
Restart table | SAS_ML_RS_randnum |
Acquisition error table | SAS_ML_ET_randnum |
Application error table | SAS_ML_UT_randnum |
Work table | SAS_ML_WT_randnum |
You can use ML_LOG= to specify a prefix for the temporary table names that MultiLoad uses.
Here is the order that is used for naming the temporary tables that MultiLoad uses:
If you set ML_LOG=, the prefix that you specified is used when naming temporary tables for MultiLoad.
If you do not specify ML_LOG=, the values that you specified for ML_ERROR1, ML_ERROR2, ML_WORK, ML_RESTART are used.
If you do not specify any table naming options, temporary table names are generated by default.
Note: You cannot use ML_LOG with any of these options: ML_ERROR1, ML_ERROR2, ML_WORK, and ML_RESTART.
The MultiLoad bulk-load operation (or MultiLoad job) works in phases. The first is the acquisition phase, during which data is transferred from SAS to Teradata work tables. The second is the application phase, during which data is applied to the target table.
If the MultiLoad job fails during the acquisition phase, you can restart the job from the last successful checkpoint. The exact observation from which the MultiLoad job must be restarted displays in the SAS log. If the MultiLoad job fails in the application phase--when data is loaded onto the target tables from the work table--restart the MultiLoad job outside of SAS. The MultiLoad restart script displays in the SAS log. You can run the generated MultiLoad script outside of SAS to complete the load.
You can use ML_CHECKPOINT= to specify the checkpoint rate. Specify a value for ML_CHECKPOINT= if you want restart capability. If checkpoint tracking is not used and the MultiLoad fails in the acquisition phase, the load needs to be restarted from the beginning. ML_CHECKPOINT=0 is the default, and no checkpoints are recoded if you use the default.
If ML_CHECKPOINT is between 1 and 59 inclusive, checkpoints are recorded at the specified interval in minutes. If ML_CHECKPOINT is greater than or equal to 60, then a checkpoint occurs after a multiple of the specified rows are loaded.
ML_CHECKPOINT= functions very much like the Teradata MultiLoad utility checkpoint, but it differs from the DBCOMMIT= data set option.
These restrictions apply when you restart a failed MultiLoad job.
The failed MultiLoad job must have specified a checkpoint rate other than 0 using the ML_CHECKPOINT= data set option. Otherwise, restarting begins from the first record of the source data.
Checkpoints are relevant only to the acquisition phase of MultiLoad. Even if ML_CHECKPOINT=0 is specified, a checkpoint takes place at the end of the acquisition phase. If the job fails after that (in the application phase) you must restart the job outside of SAS using the MultiLoad script written to the SAS log.
For example, this MultiLoad job takes a checkpoint every 1000 records.
libname trlib teradata user=testuser pw=XXXXXX server=dbc; /* Create data to MultiLoad */ data work.testdata; do x=1 to 50000; output; end; end; data trlib.mlfloat(MultiLoad=yes ML_CHECKPOINT=1000); set work.testdata; run;
You must restart the failed MultiLoad job as an append process because the target table already exists. It is also necessary to identify the work tables, restart table, and the error tables used in the original job.
For example, suppose that the DATA step shown above failed with this error message in the SAS log:
ERROR: MultiLoad failed with DBS error 2644 after a checkpoint was taken for 13000 records. Correct error and restart as an append process with data set options ML_RESTART=SAS_ML_RS_1436199780, ML_ERROR1=SAS_ML_ET_1436199780, ML_ERROR2=SAS_ML_UT_1436199780, and ML_WORK=SAS_ML_WT_1436199780. If the first run used FIRSTOBS=n, then use the value (7278+n-1) for FIRSTOBS in the restart. Otherwise use FIRSTOBS=7278. Sometimes the FIRSTOBS value used on the restart can be an earlier position than the last checkpoint because restart is block-oriented and not record-oriented.
After you fix the error, you must restart the job as an append process and you must specify the same work, error, and restart tables as you used in the earlier run. You use a FIRSTOBS= value on the source table to specify the record from which to restart.
/* Restart a MultiLoad job that failed in the acquisition phase after correcting the error */ proc append data=work.testdata(FIRSTOBS=7278) base=trmlib.mlfloat(MultiLoad=YES ML_RESTART=SAS_ML_RS_1436199780 ML_ERROR1=SAS_ML_ET_1436199780 ML_ERROR2=SAS_ML_UT_1436199780 ML_WORK=SAS_ML_WT_1436199780 ML_CHECKPOINT=1000); run;
If you used ML_LOG= in the run that failed, you can specify the same value for ML_LOG= on restart. Therefore, you need not specify four data set options to identify the temporary tables that MultiLoad uses.
For example, assume that this is how the original run used ML_LOG=:
data trlib.mlfloat(MultiLoad=yes ML_CHECKPOINT=1000 ML_LOG=MY_ERRORS); set work.testdata; run;
If this DATA step fails with this error, the restart capability needs only ML_LOG= to identify all necessary tables.
ERROR: MultiLoad failed with DBS error 2644 after a checkpoint was taken for 13000 records. Correct error and restart as an append process with data set options ML_RESTART=SAS_ML_RS_1436199780, ML_ERROR1=SAS_ML_ET_1436199780, ML_ERROR2=SAS_ML_UT_1436199780, and ML_WORK=SAS_ML_WT_1436199780. If the first run used FIRSTOBS=n, then use the value (7278+n-1) for FIRSTOBS in the restart. Otherwise use FIRSTOBS=7278. Sometimes the FIRSTOBS value used on the restart can be an earlier position than the last checkpoint because restart is block-oriented and not record-oriented.
proc append data=work.testdata(FIRSTOBS=7278) base=trlib.mlfloat(MultiLoad=YES ML_LOG=MY_ERRORS ML_CHECKPOINT=1000); run;
If the MultiLoad process fails in the application phase, SAS has already transferred all data to be loaded to Teradata. You must restart a MultiLoad job outside of SAS using the script that is written to the SAS log. See your Teradata documentation on the MultiLoad utility for instructions on how to run MultiLoad scripts. Here is an example of a script that is written in the SAS log.
=-=-= MultiLoad restart script starts here =-=-= .LOGTABLE MY_ERRORS_RS; .LOGON boom/mloaduser,********; .begin import mload tables "mlfloat" CHECKPOINT 0 WORKTABLES MY_ERRORS_WT ERRORTABLES MY_ERRORS_ET MY_ERRORS_UT /*TIFY HIGH EXIT SASMLNE.DLL TEXT '2180*/; .layout saslayout indicators; .FIELD "x" * FLOAT; .DML Label SASDML; insert into "mlfloat".*; .IMPORT INFILE DUMMY /*SMOD SASMLAM.DLL '2180 2180 2180 */ FORMAT UNFORMAT LAYOUT SASLAYOUT APPLY SASDML; .END MLOAD; .LOGOFF; =-=-= MultiLoad restart script ends here =-=-= ERROR: MultiLoad failed with DBS error 2644 in the application phase. Run the MultiLoad restart script listed above outside of SAS to restart the job.
If the original run used a value for FIRSTOBS= for the source data, use the formula from the SAS log error message to calculate the value for FIRSTOBS= upon restart. These examples show how to do this.
/* Create data to MultiLoad */ data work.testdata; do x=1 to 50000; output; end; run; libname trlib teradata user=testuser pw=testpass server=boom; /* Load 40,000 rows to the Teradata table */ data trlib.mlfloat(MultiLoad=yes ML_CHECKPOINT=1000 ML_LOG=MY_ERRORS); set work.testdata(FIRSTOBS=10001); run;Assume that the DATA step shown above failed with this error message:
ERROR: MultiLoad failed with DBS error 2644 after a checkpoint was taken for 13000 records. Correct the error and restart the load as an append process with data set option ML_LOG=MY_ERRORS. If the first run used FIRSTOBS=n, then use the value (7278+n-1) for FIRSTOBS in the restart. Otherwise use FIRSTOBS=7278. Sometimes the FIRSTOBS value specified on the restart can be an earlier position than the last checkpoint because MultiLoad restart is block-oriented and not record-oriented.
The FIRSTOBS for the restart step can be calculated using the formula provided--that is, FIRSTOBS=7278+100001-1=17278. Use FIRSTOBS=17278 on the source data.
proc append data=work.testdata(FIRSTOBS=17278) base=trlib.mlfloat(MultiLoad=YES ML_LOG=MY_ERRORS ML_CHECKPOINT=1000); run;
Please keep these considerations in mind.
DBCOMMIT= is disabled for MultiLoad in order to prevent any conflict with ML_CHECKPOINT=.
ERRLIMIT= is not available for MultiLoad because the number of errors are known only at the end of each load phase.
For restart to work correctly, the data source must return data in the same order. If the order of data that is read varies from one run to another and the load job fails in the application phase, delete temporary tables and restart the load as a new process. If the job fails in the application phase, restart the job outside of SAS as usual since the data needed to complete the load has already been transferred.
The restart capability in MultiLoad is block-oriented, not record-oriented. For example, if a checkpoint was taken at 5000 records, you might need to restart from an earlier record, such as record 4000, because the block of data containing record 5001 might have started at record 4000. The exact record where restart should occur displays in the SAS log.
Examples |
This example uses MultiLoad to load SAS data to an alternate database. Note that it specifies database=mloaduser in the LIBNAME statement.
libname trlib teradata user=testuser pw=testpass server=dbc database=mloaduser; /*MultiLoad 20000 observations into alternate database mloaduser */ data trlib.trmload14(DBCREATE_TABLE_OPTS="PRIMARY INDEX(IDNUM)" MultiLoad=yes ML_LOG=TRMLOAD14 ML_CHECKPOINT=5000); set permdata.BIG1MIL(drop=year obs=20000); run;
This example extracts data from one table using FastExport and loads data into another table using MultiLoad.
libname trlib teradata user=testuser pw=testpass server=dbc; /* Create data to load */ data trlib.trodd(DBCREATE_TABLE_OPTS="PRIMARY INDEX(IDNUM)" MultiLoad=yes); set permdata.BIG1MIL(drop=year obs=10000); where mod(IDNUM,2)=1; run; /* FastExport from one table and MultiLoad into another */ proc append data=trlib.treven(dbsliceparm=all) base=trlib.trall(MultiLOAD=YES); run;
See Also |
For information about specifying how long to wait before retrying a logon operation, see the SLEEP= Data Set Option.
For information about specifying how many hours to continue to retry a logon operation, see the TENACITY= Data Set Option
For information about specifying a prefix for the temporary table names that MultiLoad uses, see the ML_LOG= Data Set Option.
Maximizing Teradata Load Performance
ML_CHECKPOINT= Data Set Option
Using MultiLoad
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.