Using SAS Data in HDFS Libraries

Default Library

When your deployment includes SAS High-Performance Deployment of Hadoop, the SAS Deployment Wizard registers a library for it. This library is available for use in the SAS Folders tree and is located in /Products/SAS Visual Analytics High-Performance Configuration/Visual Analytics HDFS.

Staging Library

You can specify a SAS Data in HDFS library as a staging library. This is a common use because the rows for the output table are distributed among the machines in the cluster. A SAS LASR Analytic Server instance can then read the data in parallel when it loads the table to memory.
You must specify a SAS LASR Analytic Server library for the output library when you use a SAS Data in HDFS library for staging.

Output Library

You can specify a SAS Data in HDFS library as an output library. The engine distributes the rows for the table to the machines in the cluster. Afterward, you can select the table from the SAS Folders tree, right-click, and select Load a Table. This action loads the table from HDFS to memory on a SAS LASR Analytic Server instance.
You can also partition SAS Data in HDFS tables when they are used in an output library. To do so, select a column from the Partition by menu. Partitioning ensures that all rows with the same formatted value of the selected column are distributed to one machine in the cluster and placed in the same HDFS block. When you load a partitioned table to memory, the partitioning information is retained, and the result is a partitioned in-memory table.
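The effect of partitioning can be illustrated with a small Python sketch. It is a conceptual model only, not SAS code or the engine's actual placement algorithm: the function names, the hash-based assignment rule, and the sample rows are all assumptions made for illustration. The one property it demonstrates is the one described above: rows that share a formatted value of the partition column end up on the same machine.

```python
import zlib
from collections import defaultdict


def assign_machine(formatted_value, n_machines):
    """Hypothetical placement rule: a stable hash of the formatted value.

    The real engine's rule is not described in this document; any rule that
    depends only on the formatted value gives the same grouping guarantee.
    """
    return zlib.crc32(formatted_value.encode("utf-8")) % n_machines


def distribute(rows, partition_column, n_machines, fmt=str):
    """Group rows by machine so that equal formatted keys share one machine."""
    machines = defaultdict(list)
    for row in rows:
        key = fmt(row[partition_column])  # the formatted value, not the raw one
        machines[assign_machine(key, n_machines)].append(row)
    return dict(machines)


# Made-up sample data for illustration.
rows = [
    {"region": "East", "sales": 10},
    {"region": "West", "sales": 7},
    {"region": "East", "sales": 3},
    {"region": "North", "sales": 5},
]

placement = distribute(rows, "region", n_machines=3)
for machine, machine_rows in sorted(placement.items()):
    print(machine, machine_rows)
```

Because the assignment depends only on the formatted value, both "East" rows always land on the same machine, whichever machine that turns out to be. Two different values can still share a machine; partitioning guarantees that a value is never split, not that each machine holds a single value.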

Restrictions

The following restrictions apply to using SAS Data in HDFS libraries with SAS Visual Data Builder:
  • You cannot specify a SAS Data in HDFS library as an input library because the SAS Data in HDFS engine is a write-only engine.
  • The Append data check box on the Query Properties panel is disabled because the SAS Data in HDFS engine does not support appending data.
  • If you specify a SAS Data in HDFS library as an output library, you cannot view the results on the Results view because the engine is a write-only engine.
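The library rules in this section can be summarized as a small Python check. This is a hypothetical model for reasoning about a query's configuration, not part of any SAS API: the class, field, and function names are invented for illustration, and it assumes that the append restriction applies when the output library is a SAS Data in HDFS library.

```python
from dataclasses import dataclass
from typing import List, Optional

# Library type labels used only inside this sketch.
HDFS = "SAS Data in HDFS"
LASR = "SAS LASR Analytic Server"


@dataclass
class QueryLibraries:
    """Hypothetical description of the libraries chosen for one query."""
    input_type: str
    output_type: str
    staging_type: Optional[str] = None
    append_data: bool = False


def check(query: QueryLibraries) -> List[str]:
    """Return the rules from this section that the configuration violates."""
    problems = []
    if query.input_type == HDFS:
        problems.append("input library: the engine is write-only")
    if query.staging_type == HDFS and query.output_type != LASR:
        problems.append("staging library: output must be a SAS LASR Analytic Server library")
    if query.output_type == HDFS and query.append_data:
        problems.append("append: the engine does not support appending data")
    return problems


# Staging in HDFS with a LASR output library satisfies every rule.
ok = QueryLibraries(input_type="Base SAS", output_type=LASR, staging_type=HDFS)
# Reading from HDFS and appending to an HDFS output library violates two rules.
bad = QueryLibraries(input_type=HDFS, output_type=HDFS, append_data=True)

print(check(ok))
print(check(bad))
```

In SAS Visual Data Builder itself these rules are enforced through the interface (for example, by disabling the Append data check box), so a check like this is useful only as a compact reference.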