Using SAS Data in HDFS Libraries

Default Library

When your deployment uses Hadoop as a co-located data provider, the SAS Deployment Wizard registers a predefined library for it. The library is available in the SAS Folders tree at /Shared Data/SAS Visual Analytics/Public/Visual Analytics Public HDFS.

Staging Library

You can specify a SAS Data in HDFS library as a staging library. This is a common use because the engine distributes the rows of the output table among the machines in the cluster, so the server can read the data in parallel when it loads the table into memory.
When you use a SAS Data in HDFS library for staging, you must specify a SAS LASR Analytic Server library as the output library.
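To make the staging benefit concrete, the following is a minimal conceptual sketch in Python, not SAS code and not the engine's actual algorithm. It assumes round-robin row placement across three hypothetical machines (node1, node2, node3) and uses threads as a stand-in for each machine reading its own share of the rows in parallel. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute_round_robin(rows, machines):
    """Assign each row to a machine in turn (assumed placement scheme)."""
    blocks = {m: [] for m in machines}
    for i, row in enumerate(rows):
        blocks[machines[i % len(machines)]].append(row)
    return blocks


def load_block(block):
    # Stand-in for one machine reading its local rows into memory.
    return len(block)


def parallel_load(blocks):
    # Each machine's share is read concurrently; results are combined at the end.
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        counts = list(pool.map(load_block, blocks.values()))
    return sum(counts)


machines = ["node1", "node2", "node3"]  # hypothetical host names
rows = [{"id": n} for n in range(10)]
blocks = distribute_round_robin(rows, machines)
print({m: len(b) for m, b in blocks.items()})  # {'node1': 4, 'node2': 3, 'node3': 3}
print(parallel_load(blocks))  # 10
```

Because no single machine holds the whole table, the load step can proceed on every machine at once instead of streaming all rows through one reader.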

Output Library

You can specify a SAS Data in HDFS library as an output library. The engine distributes the rows of the table to the machines in the cluster. Afterward, you can right-click the table in the SAS Folders tree and select Load a Table. This menu option loads the table from HDFS into memory on a SAS LASR Analytic Server.
You can partition SAS Data in HDFS tables when they are used in an output library. Select the column to use from the Partition by menu. Partitioning the table ensures that all of the rows with the same formatted value of the selected column are distributed to one machine in the cluster. The rows are also placed in the same HDFS block. When you load a partitioned table into memory, the partitioning information is retained, and the result is a partitioned in-memory table.
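The key point about partitioning is that grouping uses the formatted value, not the raw value. The Python sketch below illustrates only that guarantee; it is not how SAS assigns partitions. It assumes a hypothetical year-only format (similar in spirit to a YEAR4. format) and picks a machine by hashing the formatted value with zlib.crc32, a deterministic hash chosen here for illustration. It models machine assignment only, not HDFS block placement. All names are hypothetical.

```python
import zlib
from collections import defaultdict


def fmt_year(date_str):
    # Hypothetical format: keep only the year of an ISO date string.
    return date_str[:4]


def partition_rows(rows, column, fmt, machines):
    """Send every row with the same formatted value of `column` to one machine."""
    placement = defaultdict(list)
    for row in rows:
        key = fmt(row[column])
        # Deterministic hash of the formatted value selects the machine,
        # so equal formatted values always map to the same machine.
        machine = machines[zlib.crc32(key.encode("utf-8")) % len(machines)]
        placement[machine].append(row)
    return dict(placement)


rows = [
    {"order_date": "2023-01-05", "amount": 10},
    {"order_date": "2023-11-30", "amount": 7},  # different raw value, same formatted value
    {"order_date": "2024-02-14", "amount": 3},
]
placement = partition_rows(rows, "order_date", fmt_year, ["node1", "node2", "node3"])
```

Both 2023 rows land on the same machine even though their raw dates differ, because they share the formatted value "2023".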

Restrictions

The following restrictions apply to using SAS Data in HDFS libraries with SAS Visual Data Builder:
  • You cannot specify a SAS Data in HDFS library as an input library because the SAS Data in HDFS engine is a write-only engine.
  • The Append data check box on the Properties tab is disabled. The SAS Data in HDFS engine does not support appending data.
  • If you specify a SAS Data in HDFS library as an output library, you cannot view the results on the Results tab because the SAS Data in HDFS engine is a write-only engine.
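The library rules on this page can be summarized as a small checklist. The Python sketch below is a hypothetical validator, not part of SAS Visual Data Builder; the library-type labels, the JobLibraries class, and the function names are assumptions made for illustration. It encodes the restrictions above plus the staging rule (HDFS staging requires a SAS LASR Analytic Server output library).

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical labels for library types.
HDFS = "SAS Data in HDFS"
LASR = "SAS LASR Analytic Server"


@dataclass
class JobLibraries:
    inputs: List[str]
    output: str
    staging: Optional[str] = None
    append: bool = False


def check_hdfs_rules(job: JobLibraries) -> List[str]:
    """Return the rule violations for a job's library choices."""
    problems = []
    if HDFS in job.inputs:
        problems.append("HDFS library cannot be an input library (write-only engine)")
    if job.staging == HDFS and job.output != LASR:
        problems.append("HDFS staging requires a LASR output library")
    if job.output == HDFS and job.append:
        problems.append("appending is not supported for HDFS output")
    return problems


def results_viewable(job: JobLibraries) -> bool:
    # The Results tab cannot display output written by the write-only engine.
    return job.output != HDFS
```

For example, a job with a "Base" input, HDFS staging, and LASR output passes with no problems, while the same job with a "Base" output reports the staging rule.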