Bulk loading to the
Impala server can be accomplished in two ways: you can use the WebHDFS
interface to Hadoop to push data to HDFS, or you can use a required
set of Hadoop JAR files. Both approaches require that the Hadoop configuration
files that SAS needs reside in a single location that is accessible to the client
machine. To use WebHDFS, you must also set the SAS_HADOOP_RESTFUL
environment variable to 1. To use the Hadoop JAR files, you must make the JAR
file location known to the client machine and ensure that the SAS_HADOOP_RESTFUL
environment variable is not set to 1 (or TRUE or YES).
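As a sketch, these environment variables can be set from within a SAS session
by using the OPTIONS SET= statement. The example below assumes the standard
SAS_HADOOP_CONFIG_PATH and SAS_HADOOP_JAR_PATH environment variables; the
path values are hypothetical and depend on your site's Hadoop installation:

   /* WebHDFS approach: point SAS at the Hadoop configuration
      files (hypothetical path) and enable the RESTful interface */
   options set=SAS_HADOOP_CONFIG_PATH "/u/hadoop/conf";
   options set=SAS_HADOOP_RESTFUL 1;

   /* Java approach: also supply the Hadoop JAR file location
      (hypothetical path) and leave SAS_HADOOP_RESTFUL unset */
   options set=SAS_HADOOP_CONFIG_PATH "/u/hadoop/conf";
   options set=SAS_HADOOP_JAR_PATH "/u/hadoop/jars";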
Specifying BULKLOAD=YES
causes two CREATE TABLE statements to be issued to the Impala server:
one creates the target Impala table, and the other creates a temporary
table. SAS then uploads the table data to the HDFS /tmp directory, using
either WebHDFS or the Hadoop JAR files, depending on how the client is
configured. The resulting file is a UTF-8 delimited text file. SAS issues
a LOAD DATA statement to move the data file from the /tmp directory into
the temporary table, and then issues an INSERT INTO statement that copies
and transforms the text data from the temporary table into the target
table. Finally, the temporary table is deleted from HDFS.
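Here is a minimal sketch of a bulk load that triggers this sequence. The
server and schema values in the LIBNAME statement, and the table names, are
hypothetical placeholders for your own connection information:

   /* connect to Impala (hypothetical server and schema) */
   libname impl impala server="impserv01" schema="sales_db";

   /* BULKLOAD=YES on the output table triggers the two CREATE TABLE
      statements, the HDFS upload, and the LOAD DATA and INSERT INTO
      steps described above */
   data impl.sales_fact (bulkload=yes);
      set work.sales_fact;
   run;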