Considerations and Limitations

  • The SAS In-Database Code Accelerator is available only for Greenplum, Hadoop, and Teradata.
  • When you use the SAS In-Database Code Accelerator for Greenplum, only the thread program runs inside the database.
  • When you use the SAS In-Database Code Accelerator for Hadoop or Teradata, both the data program and the thread program run inside the database if the output table from the data program resides in Hadoop or Teradata. You can use different LIBNAME statements for the input and output tables (an example follows this item) if the input and output librefs meet the following conditions:
    • The librefs are on the same Hadoop cluster or in the same Teradata database.
    • For Hadoop, both files must be accessible by Hive, or both files must be accessible in HDFS by means of an HDMD file.
    • When the connection strings are compared, they must match exactly, in both value and case, except for these values:
      • CATALOG (Teradata)
      • SCHEMA
      • HDFS_METADIR (Hadoop)
      • HDFS_TEMPDIR (Hadoop)
      • HDFS_PERMDIR (Hadoop)
    If the output table from the data program does not reside in Hadoop or Teradata, only the thread program is run inside the database.
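    For example, in the following sketch the two LIBNAME statements point to the same Hadoop cluster and differ only in the SCHEMA= value, one of the values that is allowed to differ, so the thread program can read through one libref and the data program can write through the other while both run inside the database. The server, user, and schema values are hypothetical.

      /* Hypothetical connection values; the librefs differ only in SCHEMA= */
      libname hdp_in  hadoop server='hive.example.com' user=sasuser schema=staging;
      libname hdp_out hadoop server='hive.example.com' user=sasuser schema=results;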
  • If the thread program is run inside the database, the SAS In-Database Code Accelerator sets the number of threads, and the THREADS= argument in the SET FROM statement in the data program has no effect.
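    As a sketch of the typical thread and data program pattern (table, thread, and column names are hypothetical), the THREADS=2 request below is honored when the thread program runs on the client but is overridden when it runs inside the database:

      proc ds2 ds2accel=yes;            /* request in-database processing */
         thread work.calc / overwrite=yes;
            dcl double total;
            method run();
               set hdp_in.sales;        /* one SET statement, one input table */
               total = qty * price;
            end;
         endthread;

         data hdp_out.sales_totals (overwrite=yes);
            dcl thread work.calc t;
            method run();
               set from t threads=2;    /* ignored when run inside the database */
            end;
         enddata;
      run;
      quit;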
  • When a matrix is declared in a thread program, each thread has its own instance of the matrix. The DS2 matrix package does not support partitioning data between nodes or threads to perform parallel matrix operations. Instead, each thread performs the matrix operations on its own instance of the matrix.
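    A minimal sketch, assuming the (rows, columns) constructor of the DS2 matrix package and hypothetical table names: every thread that executes this program constructs and works on its own 3x3 matrix.

      thread work.matwork / overwrite=yes;
         /* Each running thread gets a private instance of this matrix;
            its data is not partitioned or shared across threads. */
         dcl package matrix m(3, 3);
         method run();
            set hdp_in.obs;
            /* per-thread matrix operations act only on this thread's copy */
         end;
      endthread;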
  • The DS2 program fails if you try to use an empty format that you defined with PROC FORMAT.
  • Only one SET statement is allowed when using the SAS In-Database Code Accelerator. If more than one SET statement is used in the thread program, the thread program is not run inside the database. Instead, the thread program runs on the client.
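    For example, a thread program like this hypothetical sketch compiles, but the second SET statement causes it to run on the client rather than inside the database:

      thread work.two_sets / overwrite=yes;
         method run();
            set hdp_in.sales;
            set hdp_in.returns;   /* second SET statement forces client execution */
         end;
      endthread;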
  • Thread and data programs that use packages are supported. However, using a HASH, HTTP, or SQLSTMT package causes the thread program to run on the client and not inside the database.
  • In-database processing does not occur when any of the following methods is used to load data. Instead, the data and thread programs run on the client (an example follows this list).
    • Using a SET statement with embedded SQL code
    • Using an SQLSTMT package
    • Using an initialized hash package
    • Using an HTTP package
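    For instance, embedding FedSQL code in the SET statement of a thread program, as in this hypothetical sketch, prevents in-database execution:

      thread work.sql_set / overwrite=yes;
         method run();
            /* embedded SQL in the SET statement runs this program on the client */
            set {select id, amount from hdp_in.sales where amount > 0};
         end;
      endthread;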
  • Only one input table is allowed in the SET statement. If more than one input table is used in the SET statement, the thread program is not run inside the database. Instead, the thread program runs on the client.
  • Using an unrecognized catalog in the SET statement causes the thread program to run on the client.
  • If you use a HAVING clause to format output column data, the format is not applied to the data when it is written back to a file; the format is recorded only in the output column metadata. The SAS/ACCESS Engine for Hadoop currently cannot interpret column formats, so neither PROC PRINT nor PROC CONTENTS displays the data with the format specified in the column's metadata.
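    As a sketch with hypothetical names, the DOLLAR12.2 format attached through the HAVING clause below is recorded in the output column metadata but is not applied to the values written back to the file:

      thread work.fmt / overwrite=yes;
         dcl double revenue having format dollar12.2;  /* metadata only */
         method run();
            set hdp_in.sales;
            revenue = qty * price;
         end;
      endthread;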
  • A Hive STRING data type is always converted to a VARCHAR data type using the following rules:
    • STRING -> VARCHAR(65355)
    • STRING + SASFMT:CHAR(n) -> VARCHAR(n)
    • STRING + SASFMT:VARCHAR(n) -> VARCHAR(n)
    • STRING + DBMAX_TEXT -> VARCHAR(DBMAX_TEXT)
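    The SASFMT values above come from Hive table properties such as TBLPROPERTIES ('SASFMT:mycol'='CHAR(20)'). For the DBMAX_TEXT rule, a hypothetical LIBNAME statement shows how the option caps the resulting VARCHAR length:

      /* Hypothetical connection values; Hive STRING columns read through
         this libref are converted to VARCHAR(500). */
      libname hdp hadoop server='hive.example.com' user=sasuser dbmax_text=500;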
  • When you use the SAS In-Database Code Accelerator for Hadoop, the Hive user needs Read and Write access to the TempDir and the Destination Warehouse directories. In addition, the MapReduce user needs Read and Write permission to these directories.