SAS Data Loader enables you to easily access, transform, and manage data stored in Hadoop. You can use the Hadoop Configuration panel of the Configuration window to specify a connection to a Hadoop cluster. You can also use it to change the default storage locations for various types of files.
The values for Host, Port, User ID, Password, and Oozie URL are entered in the vApp during the initial setup of SAS Data Loader, as described in the SAS Data Loader for Hadoop: vApp Deployment Guide. Typically, you do not change these values except in consultation with your Hadoop administrator. The default values in the storage location fields work in many cases, but you can change one or more of these locations for your site.
Note: To reconfigure SAS Data Loader for a different Hadoop cluster, you must copy a new set of configuration files and JAR files into the shared folder for the vApp. Then you can update these configuration settings for the new cluster. For more information about configuring a new version of Hadoop, see SAS Data Loader for Hadoop: vApp Deployment Guide.
The fields in the Hadoop Configuration panel are as follows:
Host
the fully qualified
host name for HiveServer2 on your Hadoop cluster.
Port
the port number for
HiveServer2 on your Hadoop cluster.
User ID
the name of the user
account that is used to connect to the Hadoop cluster. If this field
is editable, you can specify an ID that is provided by your Hadoop
administrator.
The User
ID is not editable if Kerberos security has been specified
in the vApp, as described in SAS Data Loader for Hadoop: vApp Deployment Guide.
When your cluster uses
a MapR distribution of Hadoop, the User ID field
is populated from a configuration file when you start the vApp. To
change the User ID field, first enter the
new value in the file vApp-home\shared-folder\hadoop\conf\mapr-users.json.
Next, restart the vApp to read the new value. Finally, open the Hadoop
Configuration panel and enter the new user ID.
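The edit-then-restart sequence above can be sketched as a small script. This is an illustration only: the "user_name" key is a hypothetical example, and the actual structure of mapr-users.json on your system may differ, so inspect the file before editing it.

```python
# Sketch: rewrite a user-name entry in mapr-users.json before restarting
# the vApp. The "user_name" key is a hypothetical example; check the
# actual file on your system for its real structure.
import json
from pathlib import Path

def set_mapr_user(conf_file: str, new_user: str, key: str = "user_name") -> None:
    """Replace the value stored under `key` and write the file back."""
    path = Path(conf_file)
    data = json.loads(path.read_text())
    data[key] = new_user
    path.write_text(json.dumps(data, indent=2))
```

After updating the file in the vApp shared folder, restart the vApp and then enter the matching user ID in the Hadoop Configuration panel.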
Password
the password for the
user account that is used to connect to the Hadoop cluster. If your
system administrator indicates that the cluster uses LDAP authentication,
a password is required. Enter the password that is provided by the
administrator. If the Hadoop cluster does not require a password for
authentication, leave this field blank.
The Password is
not editable if Kerberos security has been specified in the vApp.
Oozie URL
the URL to the Oozie
Web Console, which is an interface to the Oozie server. Oozie is a
workflow scheduler system that is used to manage Hadoop jobs. SAS
Data Loader uses the SQOOP and Oozie components installed with the
Hadoop cluster to move data to and from a DBMS.
- URL format: http://host_name:port_number/oozie/
- URL example (using the default port number): http://my.example.com:11000/oozie/
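As a quick sanity check of the format above, the URL can be assembled from the host name and port number. The host name shown is the placeholder from the example; substitute the values for your cluster.

```python
# Assemble an Oozie Web Console URL in the documented format.
# host_name and port_number are placeholders for your cluster's values.
def oozie_url(host_name: str, port_number: int = 11000) -> str:
    """Return http://host_name:port_number/oozie/ (11000 is the default port)."""
    return f"http://{host_name}:{port_number}/oozie/"

print(oozie_url("my.example.com"))  # http://my.example.com:11000/oozie/
```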
Schema for temporary file storage
enables you to specify
an alternative schema in Hive for the temporary files that are generated
by some directives. The default schema for these files is default.
Any alternative schema must exist in Hive.
Hive storage location
enables you to specify a location on the Hadoop file system, other than the default storage location, in which to store your content. You must have appropriate permissions for this location in order to use it.
For more information,
see Overriding the Hive Storage Location for Target Tables.
SAS HDFS temporary storage location
enables you to specify
an alternative location on the Hadoop file system to read and write
temporary files when using features specific to SAS. You must have
appropriate permissions to this location in order to use it.
If the default temporary
storage directory for SAS Data Loader is not appropriate for some
reason, you can change that directory. For example, some SAS Data
Loader directives might fail to run if they cannot write to the temporary
directory. If that happens, ask your Hadoop administrator if the sticky
bit has been set on the default temporary directory (typically /tmp).
If that is the case, specify an alternate location in the SAS
HDFS temporary storage location field. Your Hadoop administrator
will tell you the location of this alternate directory. The administrator
must grant you Read and Write access to this directory.
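One way to check for the sticky bit yourself is to look at the permission string that `hdfs dfs -ls -d /tmp` prints: a trailing `t` (or `T`) in a mode such as `drwxrwxrwt` means the sticky bit is set. A minimal helper to interpret that string (the directory and mode values are examples):

```python
# Return True if an HDFS permission string (the first column of
# `hdfs dfs -ls -d <dir>` output, e.g. "drwxrwxrwt") has the sticky
# bit set. A trailing "t" or "T" in the mode marks the sticky bit.
def sticky_bit_set(mode: str) -> bool:
    return mode.endswith(("t", "T"))

print(sticky_bit_set("drwxrwxrwt"))  # True -> use an alternate directory
print(sticky_bit_set("drwxr-xr-x"))  # False
```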