Problem Note 63127: PROC DS2 MERGE and SET statements with a BY statement do not run inside Apache Hadoop
The DS2 procedure can run thread programs inside Hadoop when one of the following settings is used:
- The DS2ACCEL system option is set to ANY.
- The DS2ACCEL= option is set to YES in the PROC DS2 statement.
However, Apache Hive 0.13 and later releases no longer allow periods (".") in quoted column names. This restriction prevents PROC DS2 from using a period as the delimiter in the FIRST.variable and LAST.variable flags. As a result, any MERGE statement or multitable SET statement that uses a BY statement fails when it is executed in Hadoop. Single-table SET statements are not affected.
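For readers who are less familiar with the flags mentioned above, the following is a minimal sketch of how FIRST.variable and LAST.variable are referenced in a DS2 program. It runs in SAS rather than in Hadoop. The table name work.cars and its column make are hypothetical and are used only for illustration.

```sas
/* Minimal sketch: work.cars and the column make are hypothetical. */
proc ds2;
  data _null_;
    method run();
      set work.cars;
      by make;
      /* first.make and last.make flag the first and last row of each BY group. */
      /* The period in these names is the delimiter that Hive 0.13 and later    */
      /* reject inside a quoted column name.                                    */
      if first.make then put 'First row of BY group:' make;
      if last.make then put 'Last row of BY group:' make;
    end;
  enddata;
  run;
quit;
```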
When this issue occurs, the SAS® log can display a combination of messages that are similar to the following:
ERROR: Compilation error.
NOTE: Created thread thread-name in data set data-set-name.
ERROR: Line 40: Unable to finalize merge statement (rc=0x80fff802U).
ERROR: Line 62: Thread thread-name is not defined.
ERROR: Line 62: The thread type thread-type, used in declaring t, is not defined.
NOTE: Running THREAD program in-database
NOTE: Running DATA program in-database
ERROR: General error org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10036]: Duplicate column name: make
ERROR: Failed to run DS2INDB.
ERROR: Error returned from tkedsPubINDBDS2.
This issue applies to the following Hadoop distributions:
- MapR Converged Data Platform 5.2 or 6.0
- Hortonworks Data Platform 3.0
- Cloudera Data Platform 5.2 and newer
The issue can also occur with Amazon Web Services when a Hive version of 0.13 or later is selected.
Click the Hot Fix tab in this note to access the hot fix for this issue.
Operating System and Release Information
Product Family | Product |
SAS System | Base SAS |
System | Reported Release | Fixed Release* |
z/OS | 9.4 TS1M4 | 9.4 TS1M6 |
z/OS 64-bit | 9.4 TS1M4 | 9.4 TS1M6 |
Microsoft® Windows® for x64 | 9.4 TS1M4 | 9.4 TS1M6 |
Microsoft Windows 8 Enterprise 32-bit | 9.4 TS1M4 | 9.4 TS1M6 |
Microsoft Windows 8 Enterprise x64 | 9.4 TS1M4 | 9.4 TS1M6 |
Microsoft Windows 8 Pro 32-bit | 9.4 TS1M4 | 9.4 TS1M6 |
Microsoft Windows 8 Pro x64 | 9.4 TS1M4 | 9.4 TS1M6 |
Microsoft Windows 8.1 Enterprise 32-bit | 9.4 TS1M4 | 9.4 TS1M6 |
Microsoft Windows 8.1 Enterprise x64 | 9.4 TS1M4 | 9.4 TS1M6 |
Microsoft Windows 8.1 Pro 32-bit | 9.4 TS1M4 | 9.4 TS1M6 |
Microsoft Windows 8.1 Pro x64 | 9.4 TS1M4 | 9.4 TS1M6 |
Microsoft Windows 10 | 9.4 TS1M4 | 9.4 TS1M6 |
Microsoft Windows Server 2008 | 9.4 TS1M4 | |
Microsoft Windows Server 2008 R2 | 9.4 TS1M4 | |
Microsoft Windows Server 2008 for x64 | 9.4 TS1M4 | |
Microsoft Windows Server 2012 Datacenter | 9.4 TS1M4 | 9.4 TS1M6 |
Microsoft Windows Server 2012 R2 Datacenter | 9.4 TS1M4 | 9.4 TS1M6 |
Microsoft Windows Server 2012 R2 Std | 9.4 TS1M4 | 9.4 TS1M6 |
Microsoft Windows Server 2012 Std | 9.4 TS1M4 | 9.4 TS1M6 |
Windows 7 Enterprise 32 bit | 9.4 TS1M4 | 9.4 TS1M6 |
Windows 7 Enterprise x64 | 9.4 TS1M4 | 9.4 TS1M6 |
Windows 7 Home Premium 32 bit | 9.4 TS1M4 | 9.4 TS1M6 |
Windows 7 Home Premium x64 | 9.4 TS1M4 | 9.4 TS1M6 |
Windows 7 Professional 32 bit | 9.4 TS1M4 | 9.4 TS1M6 |
Windows 7 Professional x64 | 9.4 TS1M4 | 9.4 TS1M6 |
Windows 7 Ultimate 32 bit | 9.4 TS1M4 | 9.4 TS1M6 |
Windows 7 Ultimate x64 | 9.4 TS1M4 | 9.4 TS1M6 |
64-bit Enabled AIX | 9.4 TS1M4 | 9.4 TS1M6 |
64-bit Enabled Solaris | 9.4 TS1M4 | 9.4 TS1M6 |
HP-UX IPF | 9.4 TS1M4 | 9.4 TS1M6 |
Linux for x64 | 9.4 TS1M4 | 9.4 TS1M6 |
Solaris for x64 | 9.4 TS1M4 | 9.4 TS1M6 |
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
The following code shows two ways to specify the DS2ACCEL option so that the thread program runs inside Hadoop. The DS2 program contains a thread program that uses a MERGE statement with a BY statement.
/* You can specify the DS2ACCEL option on an OPTIONS statement */
options ds2accel=any;
/* specify the appropriate path and options for your Hadoop connection */
libname myhdp hadoop server='server123.unx.yours.com';
/* You can specify the DS2ACCEL option on the PROC statement */
proc ds2 ds2accel=yes;
  thread t_pgm / overwrite=yes;
    method run();
      merge myhdp.a myhdp.b;
      by x y;
    end;
  endthread;
  run;
  data myhdp.new / overwrite=yes;
    dcl thread t_pgm t;
    method run();
      set from t;
    end;
  enddata;
  run;
quit;
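For comparison with the program above, the following sketch turns off in-database processing so that the MERGE statement with a BY statement is evaluated by SAS rather than inside Hadoop. This is an assumption-based illustration and is not the documented fix for this note. It assumes that DS2ACCEL=NO in the PROC DS2 statement keeps the thread program out of Hive, so the period in the FIRST.variable and LAST.variable flags is never passed to Hive. The LIBNAME path and the table names are the same placeholders that are used above.

```sas
/* Sketch only: the server name and table names are placeholders. */
libname myhdp hadoop server='server123.unx.yours.com';

/* DS2ACCEL=NO requests that the thread program not run in-database. */
proc ds2 ds2accel=no;
  thread t_pgm / overwrite=yes;
    method run();
      merge myhdp.a myhdp.b;
      by x y;
    end;
  endthread;
  run;

  data myhdp.new / overwrite=yes;
    dcl thread t_pgm t;
    method run();
      set from t;
    end;
  enddata;
  run;
quit;
```

Running the program this way moves the data out of Hadoop for processing, so it is likely to be slower on large tables than in-database execution.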
Type: | Problem Note |
Priority: | high |
Topic: | SAS Reference ==> Procedures ==> DS2 |
Date Modified: | 2019-05-24 15:04:39 |
Date Created: | 2018-10-25 13:13:00 |