Use the Hadoop
Configuration panel of the Configuration window
to specify credentials, Hive and Impala server connections, and preferences
for the SQL environment and the run-time target.
A Reload button
enables you to load a predetermined Hadoop configuration from a configuration
file.
The values for Hive
server, Impala server, and Oozie
URL are often populated when SAS Data Loader is first
initialized. Review these settings and contact your Hadoop administrator
as needed.
Specify the appropriate User
ID. If you are using LDAP authentication, enter a Password.
To reconfigure SAS Data Loader
for a different Hadoop cluster, you must copy a new set of configuration
files and JAR files into the shared folder of the vApp. For more information
about configuring a new version of Hadoop, see SAS Data Loader for Hadoop: vApp Deployment Guide.
The fields and controls
in the
Hadoop Configuration panel are defined
as follows:
User ID
The name of the user
account that is used to connect to the Hadoop cluster. If this field
can be edited, specify the name that is provided by your Hadoop administrator.
CAUTION:
Enter a
user ID only when using LDAP authentication.
Entering a user ID
in any other environment disables the use of the Cloudera Impala SQL
environment. If you are not using LDAP authentication, and a User
ID value is displayed, click the trash can icon to remove
that value.
When your cluster uses
a MapR distribution of Hadoop without Kerberos authentication, the User
ID field is populated from a configuration file when
you start the vApp. To change the User ID field,
first enter the new value in the file vApp-home\shared-folder\hadoop\conf\mapr-user.json.
Next, restart the vApp to read the new value. Finally, open the Hadoop
Configuration panel and enter the new user ID.
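If you want to confirm the value that the vApp reads at startup, you can inspect the JSON file directly in the shared folder. The following sketch is a minimal illustration that assumes Python is available on the machine that hosts the shared folder; the keys inside the file are described in SAS Data Loader for Hadoop: vApp Deployment Guide.

    import json

    # Path to the MapR user file in the vApp shared folder; replace
    # vApp-home and shared-folder with the locations in your deployment.
    path = r"vApp-home\shared-folder\hadoop\conf\mapr-user.json"

    # Print the current contents so that the user ID can be verified
    # before the vApp is restarted.
    with open(path) as f:
        print(json.dumps(json.load(f), indent=2))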
Password
The password for the
user account that is used to connect to the Hadoop cluster. If your
system administrator indicates that the cluster uses LDAP authentication,
a password is required. Enter the password that is provided by the
administrator.
CAUTION:
Enter a
password only when using LDAP authentication.
Entering a password
in any other environment disables the use of the Cloudera Impala SQL
environment.
The Password field is
not editable if Kerberos security has been specified in the vApp.
Host (Hive server)
The fully qualified
host name for the Hive server on your Hadoop cluster. SAS Data Loader
for Hadoop requires a continuously operational connection to the Hive
server. This value is always required.
Port (Hive server)
The port number on
the Hive server that receives client connection requests. This value
is always required.
Test Connection (Hive server)
Click this button to
validate your Host and Port values,
and to verify that the Hive server is operational.
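If Test Connection fails and you want to rule out problems outside the vApp, you can also check the Hive server from any machine that can reach the cluster. The following sketch is one way to do so; it assumes the third-party PyHive package (not part of SAS Data Loader) and an unsecured HiveServer2. Substitute your own host, port, and user ID.

    from pyhive import hive  # third-party package: pip install "pyhive[hive]"

    # Connect to HiveServer2 with the same host and port that are entered
    # in the Hadoop Configuration panel (example values shown).
    conn = hive.Connection(host="my.example.com", port=10000, username="myuser")

    # A simple query confirms that the server accepts connections and responds.
    cursor = conn.cursor()
    cursor.execute("SHOW DATABASES")
    print(cursor.fetchall())
    cursor.close()
    conn.close()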
Host (Impala server)
The fully qualified
host name for the Cloudera Impala server on your Hadoop cluster. This
value is required when the value of SQL environment is Impala.
This value is optional when the value of SQL environment is Hive.
Port (Impala server)
The port number on
the Cloudera Impala server that receives client connection requests.
This value is required when the value of SQL environment is Impala.
This value is optional when the value of SQL environment is Hive.
Test Connection (Impala server)
Click this button to
validate your Host and Port values,
and to verify that the Impala server is operational.
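As with the Hive server, you can check the Impala server from outside the vApp. The following sketch assumes the third-party impyla package (not part of SAS Data Loader) and Impala's default client port of 21050; substitute your own host and port values.

    from impala.dbapi import connect  # third-party package: pip install impyla

    # Connect to the Impala server with the host and port that are entered
    # in the Hadoop Configuration panel (example values shown).
    conn = connect(host="my.example.com", port=21050)

    # Listing the databases verifies that the Impala server is operational.
    cursor = conn.cursor()
    cursor.execute("SHOW DATABASES")
    print(cursor.fetchall())
    cursor.close()
    conn.close()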
SQL environment
Choose the
Impala value
to specify Cloudera Impala as the default environment for new directives,
and to enable job execution in that environment. This value applies
only to the set of directives that support Impala, as listed
in Enable Support for Impala and Spark.
Directives that do
not support Impala continue to run in the HiveQL environment as usual.
Individual instances
of the supporting directives can be configured to override the default
value.
Specify the Hive value
in the SQL environment field to establish
Hive as the default SQL environment for new directives.
Note: Changing this value does
not change the SQL environment of saved directives.
Preferred runtime target
Select the value Hadoop
Spark to enable new instances of the supporting directives
to run with Apache Spark by default. Apache Spark must be installed
and fully configured on the Hadoop cluster. If Apache Spark was detected
on the Hadoop cluster during the installation of SAS In-Database
Technologies for Hadoop, then the Hadoop Spark value
is set by default.
Select the value MapReduce to
enable new directives to run with the MapReduce run-time target by
default.
Individual instances
of the supporting directives can be configured to override this default
value.
Oozie URL
Specify the HTTP address
of the Oozie web console, which is an interface to the Oozie server.
Oozie is a workflow
scheduler in Hadoop that manages the execution of jobs. SAS Data
Loader uses Oozie to copy data to and from databases such as Oracle
and Teradata, and to execute directives in the Spark run-time environment.
- URL format: http://host_name:port_number/oozie/
- URL example (using default port number): http://my.example.com:11000/oozie/
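A quick way to confirm that the Oozie URL is correct is to request the server status through the Oozie web services API. The following sketch assumes the third-party requests package and the standard v1/admin/status endpoint; substitute the URL that is shown in the Hadoop Configuration panel.

    import requests  # third-party package: pip install requests

    # The Oozie URL from the Hadoop Configuration panel (example value shown).
    oozie_url = "http://my.example.com:11000/oozie/"

    # The web services API reports the server mode; "NORMAL" indicates that
    # the Oozie server is up and accepting jobs.
    response = requests.get(oozie_url.rstrip("/") + "/v1/admin/status")
    response.raise_for_status()
    print(response.json())  # for example: {"systemMode": "NORMAL"}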