Considerations and Limitations
- The SAS In-Database Code Accelerator is available only for Greenplum, Hadoop, and Teradata.
- When you use the SAS In-Database Code Accelerator for Greenplum, only the thread program runs inside the database.
- When you use the SAS In-Database Code Accelerator for Hadoop and Teradata, both the data and thread programs run inside the database if the output table from the data program resides in Hadoop or Teradata. You can use different LIBNAME statements for the input and output tables if the input and output librefs meet the following conditions:
  - The librefs are on the same Hadoop cluster or in the same Teradata database.
  - For Hadoop, both files must be accessible by Hive, or both files must be accessible in HDFS by means of an HDMD file.
  - When the connection strings are compared, they must be identical in value and case except for these values:
  If the output table from the data program does not reside in Hadoop or Teradata, only the thread program is run inside the database.
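As a sketch of librefs that satisfy these conditions, the following LIBNAME statements point at the same Hadoop cluster; the server, port, and schema values are hypothetical:

```sas
/* Hypothetical connection values -- both librefs reference the same
   Hadoop cluster, so input and output tables read and written through
   them can keep the data and thread programs running in-database. */
libname indata  hadoop server="hdp01.example.com" port=10000 schema=staging;
libname outdata hadoop server="hdp01.example.com" port=10000 schema=results;
```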
- If the thread program is run inside the database, the number of threads is set by the SAS In-Database Code Accelerator. When this occurs, the THREADS= argument in the SET FROM statement in the data program has no effect.
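The following sketch shows where THREADS= appears in a typical data program; the libref and table names are hypothetical. When the thread program runs inside the database, the accelerator chooses the thread count and the THREADS=4 shown here is ignored:

```sas
proc ds2 ds2accel=yes;             /* request in-database execution */
   thread work.compute / overwrite=yes;
      method run();
         set db.intab;             /* hypothetical input table */
         /* per-row computation goes here */
      end;
   endthread;

   data db.outtab (overwrite=yes); /* hypothetical output table */
      dcl thread work.compute t;
      method run();
         set from t threads=4;     /* no effect when run in-database */
         output;
      end;
   enddata;
run;
quit;
```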
- When a matrix is declared in a thread program, each thread program has its own instance of the matrix. The DS2 matrix package does not support partitioning data between nodes or threads to perform parallel matrix operations. Instead, each thread performs the matrix operations on its own instance of the matrix.
- The DS2 program fails if you try to use an empty format that you defined with PROC FORMAT.
- Only one SET statement is allowed when using the SAS In-Database Code Accelerator. If more than one SET statement is used in the thread program, the thread program is not run inside the database. Instead, the thread program runs on the client.
- Thread and data programs that use packages are supported. However, using a HASH, HTTP, or SQLSTMT package causes the thread program to run on the client and not inside the database.
- In-database processing does not occur when the following methods are used to load data. Instead, the data and thread programs run on the client.
  - Using a SET statement with embedded SQL code
  - Using an initialized hash package
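For instance, embedding SQL in the SET statement, as sketched below with a hypothetical table, prevents in-database execution:

```sas
thread work.t / overwrite=yes;
   method run();
      /* Embedded SQL in the SET statement forces the data and
         thread programs to run on the client. */
      set {select id, amount from db.sales where amount > 0};
   end;
endthread;
```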
- Only one input table is allowed in the SET statement. If more than one input table is used in the SET statement, the thread program is not run inside the database. Instead, the thread program runs on the client.
- Using an unrecognized catalog in the SET statement causes the thread program to run on the client.
- If you use a HAVING clause to format output column data, the format is not applied to the output column data when the data is written back to a file. The format is specified in the output column metadata. The SAS/ACCESS Engine for Hadoop is currently unable to interpret column formats. Therefore, PROC PRINT and PROC CONTENTS do not print or display the contents with the format specified in the column's metadata.
- A Hive STRING data type is always converted to a VARCHAR data type using the following rules:
  - STRING + SASFMT:CHAR(n) -> VARCHAR(n)
  - STRING + SASFMT:VARCHAR(n) -> VARCHAR(n)
  - STRING + DBMAX_TEXT -> VARCHAR(DBMAX_TEXT)
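The SASFMT annotations in these rules are Hive table properties. As a sketch (the connection values, table, and column names are hypothetical), such a property could be attached through explicit SQL pass-through:

```sas
proc sql;
   connect to hadoop (server="hdp01.example.com" port=10000);
   /* Tag the Hive STRING column so SAS reads it as VARCHAR(32) */
   execute (
      alter table sales set tblproperties ('SASFMT:region'='VARCHAR(32)')
   ) by hadoop;
   disconnect from hadoop;
quit;
```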
- When using the SAS Code Accelerator for Hadoop, the Hive user needs Read and Write access to the TempDir and the Destination Warehouse directories. In addition, the MapReduce user needs Read and Write permission.
Copyright © SAS Institute Inc. All rights reserved.