LIBNAME Statement Specifics for Hadoop

Overview

This section describes the LIBNAME statement that SAS/ACCESS Interface to Hadoop supports and includes examples. For general information about this feature, see LIBNAME statement .

Here is the LIBNAME statement syntax for accessing Hadoop.

LIBNAME libref hadoop <connection-options> <LIBNAME-options>

Security Limitations

SAS/ACCESS uses the Hadoop Hive Server to access Hadoop data. SAS/ACCESS creates a JDBC connection and passes the user ID and password that you specify to Hive. Through Hive 8, JDBC ignores these credentials, instead associating Hive permissions with the UNIX user ID that started the Hive Service.

SAS/ACCESS creates new Hive tables using the Hadoop Distributed File System (HDFS) Streaming API. The API requires a valid user ID and password. When you create or append to a table, the user ID and password that you provide in the LIBNAME statement is passed to this streaming method. The user ID must be valid on the Hadoop cluster and needs Write access to the Hadoop /tmp and the Hive warehouse directories. Therefore, the Hadoop system administrator should ensure Write access to the HDFS /tmp and Hive warehouse directories for all user IDs that are permitted to create or append to Hive tables with SAS/ACCESS.

To optimize large data reads, SAS/ACCESS can use the API instead of JDBC to read Hive result sets. Similar requirements apply: a valid Hadoop user ID and password, and Write access to Hadoop /tmp.

Future Hive editions will address JDBC security limitations. While USER= and PASSWORD= are not yet fully enabled, use both USER= and PASSWORD= on LIBNAME and CONNECT statements to prepare for future Hive releases.

Arguments

libref

specifies any SAS name that serves as an alias to associate SAS with a database, schema, server, or group of tables and views.

hadoop

specifies the SAS/ACCESS engine name for the Hadoop interface.

connection-options

provide connection information and control how SAS manages the timing and concurrence of the connection to Hadoop. Here is how these options are defined.

USER=<'>user-name<'>

lets you connect to a Hadoop database with a user ID that is different from the default ID. USER= is required for creating tables but is ignored for reading them.

PASSWORD=<'>password<'>

specifies the Hadoop password that is associated with your user ID. If it contains spaces or nonalphanumeric characters, you must enclose it in quotation marks. If you do not want to enter your Hadoop password in uncoded text on this statement, see PROC PWENCODE in the Base SAS Procedures Guide for a method to encode it.

Alias: PASS=, PWD=, PW=

SERVER=<'>server-name<'>

specifies the Hadoop server name that runs the Hive Server. If the server name contains spaces or nonalphanumeric characters, you must enclose it in quotation marks.

Alias: HOST=

PORT=port

specifies the port number that is used to connect to the specified Hive Server.

Default: 10000

SCHEMA=Hive-schema

specifies the Hive schema.

Default: SCHEMA=default

Alias: DATABASE=, DB=

CONFIG=config-file

specifies the Hadoop configuration file for SAS/ACCESS to use. URL components in the configuration file should match the values in the SERVER=, PORT=, and BL_PORT= LIBNAME options. Specify CONFIG= only if you are connecting to a Hadoop system that is not based on HDFS, such as the MapR Hadoop Distribution.

LIBNAME-options

define how SAS processes DBMS objects. The following table describes the LIBNAME options for SAS/ACCESS Interface to Hadoop, with the applicable default values. For details about these options, see LIBNAME Options for Relational Databases.

SAS/ACCESS LIBNAME Options for Hadoop

Option	Default Value
ACCESS=	none
AUTHDOMAIN=	none
BL_DELIMITER=	\001 (Ctrl-A)
BL_PORT=	8020 (required only if Hadoop HDFS streaming is running on a nondefault port)
BULKLOAD=	none
DBCREATE_TABLE_EXTERNAL=	NO
DBCREATE_TABLE_OPTS=	none
DBGEN_NAME=	DBMS
DBSASLABEL=	COMPAT
DIRECT_SQL=	YES
SPOOL=	YES
SQL_FUNCTIONS=	none

Hadoop LIBNAME Statement Examples

This example uses the default Hive port and schema.

libname hdp hadoop server=hxpduped 
   user=hadoop_usr password=hadoop_pwd;

This example explicitly specifies the default Hive port and schema.

libname hdp hadoop server=hxpduped port=10000 schema=default
   user=hadoop_usr password=hadoop_pwd;