Collect Files

About Collecting Files

Certain files must be collected from the Hadoop cluster and made available to the vApp user.
Complete the configuration of SAS/ACCESS Interface to Hadoop, as described in the SAS Hadoop Configuration Guide for Base SAS and SAS/ACCESS. This process collects the necessary files into the following folders on the Hadoop cluster:
installation_path/conf
installation_path/lib
The conf folder contains the required XML and JSON files for the vApp client. The lib folder contains the required JAR files.

Configure the Files

Copying

The conf and lib folders must be copied to a directory on a Windows server to which all vApp users have READ access.
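The copy step can be sketched as follows. This is a minimal stand-in, not the actual deployment paths: the mktemp directories substitute for installation_path on the Hadoop cluster and for the shared directory on the Windows server.

```shell
# Sketch only: stage the conf and lib folders into a shared directory.
# The mktemp paths are stand-ins for the real locations.
set -e
SRC=$(mktemp -d)    # stands in for installation_path on the Hadoop cluster
DEST=$(mktemp -d)   # stands in for the shared directory on the Windows server
mkdir -p "$SRC/conf" "$SRC/lib"
cp -r "$SRC/conf" "$SRC/lib" "$DEST/"
ls "$DEST"
```

On the actual Windows server, grant all vApp users READ access to the destination directory.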

Edit inventory.json

If the Oozie, Spark, or Impala services are running on the Hadoop cluster, you must edit the corresponding sections of the conf/inventory.json file on the Windows server to reflect this. For any service that is available, set its "available" parameter to "true". In addition, the Impala service must specify its hosts and port, and the Oozie service must specify a URL.
The following example specifies all three services as available:
"impala":{
  "available":"true",
  "port": "21050",
  "hosts":["machine1.domain.com","machine2.domain.com"]
},
"spark":{
  "available":"true"
},
"oozie":{
  "available":"true",
  "url":"http://machine1.domain.com:11000/oozie"
},
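Because inventory.json is edited by hand, a quick syntax check before distributing the file can catch mistakes such as missing commas or quotation marks. A minimal sketch, assuming a Python interpreter is available; the file content mirrors the example above, and the temporary file location is hypothetical:

```shell
# Sketch: syntax-check a hand-edited inventory.json before distributing it.
F=$(mktemp)
cat > "$F" <<'EOF'
{
  "impala": { "available": "true",
              "port": "21050",
              "hosts": ["machine1.domain.com", "machine2.domain.com"] },
  "spark":  { "available": "true" },
  "oozie":  { "available": "true",
              "url": "http://machine1.domain.com:11000/oozie" }
}
EOF
python3 -m json.tool "$F" > /dev/null && echo "inventory.json: valid JSON"
```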

For MapR Users

For MapR deployments only, you must manually create a file named mapr-user.json (case-sensitive). This file specifies the user information that the SAS Data Loader for Hadoop vApp requires in order to interact with the Hadoop cluster. You must supply a user name, user ID, and group ID in this file, and the user name must be a valid user on the MapR cluster.
Note: You must add this file to the conf directory that was copied to the Windows server.
To configure user IDs, follow these steps:
  1. Create one user ID for each vApp user.
  2. Create UNIX user IDs on all nodes of the cluster and assign them to a group.
  3. Create the mapr-user.json file containing the user ID information. You can obtain this information by logging on to a cluster node and running the id command. You might create a file similar to the following:
    {
      "user_name"      : "myuser",
      "user_id"        : "2133",
      "user_group_id"  : "2133",
      "take_ownership" : "true"
    }
  4. Copy mapr-user.json to the conf directory on the Windows server from which the vApp users copy the conf and lib directories.
    Note: To log on to the MapR Hadoop cluster with a different valid user ID, you must edit the information in the mapr-user.json file and in the User ID field of the SAS Data Loader for Hadoop Configuration dialog box. See User ID.
  5. Create a user home directory and a Hadoop staging directory in MapR-FS. The user home directory is /user/myuser. The Hadoop staging directory is controlled by the yarn.app.mapreduce.am.staging-dir setting in mapred-site.xml and defaults to /user/myuser.
  6. Change the permissions and owner of /user/myuser to match the UNIX user.
    Note: The user ID must have at least the following permissions:
    • Read, Write, and Delete permission for files in the MapR-FS directory (used for Oozie jobs)
    • Read, Write, and Delete permission for tables in Hive
  7. SAS Data Loader for Hadoop uses HiveServer2 as its source of tabular data. Ensure that the UNIX user has appropriate permissions on MapR-FS for the locations of the Hive tables on which the user is permitted to operate.
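Step 3 above can be sketched as follows. For illustration, this builds mapr-user.json from the id command for the local account; on a real deployment, you would run id while logged on to a cluster node as the vApp user. The cluster-side directory and ownership commands (steps 5 and 6) are shown as comments because they require a live cluster, and the myuser and mygroup names are the example values from the steps above.

```shell
# Sketch: generate mapr-user.json from the id command (step 3).
# Here it reads the local account; on a cluster, run as the vApp user.
USER_NAME=$(id -un)
USER_ID=$(id -u)
GROUP_ID=$(id -g)
cat > mapr-user.json <<EOF
{
  "user_name"      : "$USER_NAME",
  "user_id"        : "$USER_ID",
  "user_group_id"  : "$GROUP_ID",
  "take_ownership" : "true"
}
EOF
echo "wrote mapr-user.json for $USER_NAME"
# Steps 5-6, run on the cluster (require the hadoop CLI; shown for reference):
#   hadoop fs -mkdir -p /user/myuser
#   hadoop fs -chown myuser:mygroup /user/myuser
#   hadoop fs -chmod 700 /user/myuser
```

Remember to copy the generated file into the conf directory on the Windows server, as described in step 4.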

Inform the vApp Users

Inform the vApp users that they can copy the conf and lib folders from the Windows server to the shared folder SASWorkspace\hadoop on all active instances of the vApp client. These folders are required for the vApp to connect to Hadoop successfully.