About Adding Data to Co-located Storage

Data Output for Co-located Storage

A distinguishing feature of SAS Visual Analytics Administrator is that it enables administrators to add data to a data provider that is co-located with the SAS LASR Analytic Server. SAS Visual Analytics Hadoop is a co-located data provider that is available from SAS. When it is used, the Hadoop Distributed File System (HDFS) stores the data output. The benefit of adding data to a co-located data provider is that the server can read the data from it in parallel, which makes loading very fast.
In addition to SAS Visual Analytics Hadoop, Teradata Enterprise Data Warehouse and EMC Greenplum Data Computing Appliance are supported co-located data providers. When a third-party vendor appliance is used, the SAS/ACCESS interface for the database must be licensed and configured.

Output to SAS Visual Analytics Hadoop

In addition to the performance advantage of co-located data, SAS Visual Analytics Hadoop provides data redundancy. By default, two copies of the data are stored in HDFS. If a machine in the cluster becomes unavailable, another machine in the cluster retrieves the data from a redundant block and loads the data into memory.
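The effect of keeping two copies can be illustrated with a toy model. The sketch below is not how HDFS or SAS software places blocks. The round-robin placement rule and the names `place_replicas` and `surviving_hosts` are assumptions made for illustration only. It shows that when each block has two copies on different machines, losing one machine still leaves every block readable.

```python
def place_replicas(num_blocks, machines, copies=2):
    """Toy placement (assumption): copy k of block b goes to machine (b + k) mod n."""
    n = len(machines)
    if copies > n:
        raise ValueError("need at least as many machines as copies")
    return {
        b: [machines[(b + k) % n] for k in range(copies)]
        for b in range(num_blocks)
    }


def surviving_hosts(placement, failed):
    """For each block, list the machines that still hold a copy after failures."""
    return {
        b: [m for m in hosts if m not in failed]
        for b, hosts in placement.items()
    }


placement = place_replicas(6, ["m1", "m2", "m3"])
after_failure = surviving_hosts(placement, failed={"m2"})
# Every block still has at least one machine that can serve it.
all_readable = all(len(hosts) >= 1 for hosts in after_failure.values())
```

With a single copy per block, the same failure would leave every block that was placed on `m2` unreadable until that machine returned.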
SAS Visual Analytics Administrator distributes blocks evenly to the machines in the cluster so that all the machines acting as the server have an even workload. The block size is also optimized based on the number of machines in the cluster and the size of the data that is being stored. Before the data is transferred to HDFS, SAS software determines the number of machines in the cluster, row length, and the number of rows in the data. Using this information, SAS software calculates an optimal block size to provide an even distribution of data. However, the block size is bounded by a minimum size of 1 KB and a maximum size of 64 MB.
For very small data sets, the data is not distributed evenly. It is transferred to the root node of the cluster and then inserted into SAS Visual Analytics Hadoop. SAS Visual Analytics Hadoop then distributes the data in blocks according to the default block distribution algorithm.
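The block-size bounds described above can be sketched as a clamp. The exact calculation that SAS software performs is not documented here, so the heuristic below (an even share of the total bytes per machine) is an assumption, as are the function name `block_size` and the use of binary units (1 KB = 1024 bytes). Only the inputs (number of machines, row length, number of rows) and the 1 KB and 64 MB bounds come from the text.

```python
KB = 1024                # assumption: binary kilobyte
MB = 1024 * KB
MIN_BLOCK = 1 * KB       # lower bound stated in the text
MAX_BLOCK = 64 * MB      # upper bound stated in the text


def block_size(num_machines, row_length, num_rows):
    """Hypothetical heuristic: even share of bytes per machine, clamped to the bounds."""
    if num_machines < 1 or row_length < 1 or num_rows < 0:
        raise ValueError("invalid cluster or table dimensions")
    total_bytes = row_length * num_rows
    # Ceiling division so that the shares cover all of the data.
    share = -(-total_bytes // num_machines)
    return max(MIN_BLOCK, min(MAX_BLOCK, share))


tiny = block_size(num_machines=4, row_length=100, num_rows=10)
medium = block_size(num_machines=8, row_length=512, num_rows=1_000_000)
huge = block_size(num_machines=2, row_length=1024, num_rows=10_000_000)
```

In this sketch, the tiny table hits the 1 KB floor and the huge table hits the 64 MB ceiling, so only the medium table receives a block size equal to its even per-machine share.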

Registered Tables

SAS Visual Analytics Administrator can load data from registered tables in metadata or SASHDAT files stored in HDFS to the SAS LASR Analytic Server. The following table describes the performance considerations for both methods:
Performance Considerations

Data Source: Registered table (from a library that does not use a co-located data provider)
  Advantages: Provides a rapid method for loading tables.
  Disadvantages: Appropriate for smaller data sets because the data must be transferred over the network. If the table is unloaded or the server stops, the data must be transferred over the network again.

Data Source: Registered table from a library that uses a co-located data provider
  Advantages: Impressive performance for loading very large data sets in parallel. If a SAS LASR Analytic Server is stopped or the table is unloaded from the server, loading it again is also very fast.
  Disadvantages: Requires a separate step to add the data to HDFS or the co-located third-party vendor database and then load it before the data is available to SAS clients.

Add to HDFS Option

To open the Add Table window, select the Add to HDFS option for a table in the folder tree in the navigation pane.
Add Table Window
When the co-located data provider is SAS High-Performance Deployment of Hadoop, the following options are available in the Add Table window:
Fields in the Add to HDFS and Add to a Data Server Dialog Boxes

Source Table
  Name: Specifies the filename of the source table in the metadata.
  Library: Specifies the library in the metadata with which the source table is associated.

LASR Table
  Name: Enter the filename of the target (output) table that should be loaded to HDFS.
  Description: Optional. Enter a description of the target table. This description is displayed with the other information for the target table on the LASR Tables tab. It is also displayed in SAS Visual Analytics applications when users open the Open Data Source window, which lists data sources, including this table and the description provided for it.
  Location: Click Browse and navigate in the folder tree to select the folder where the target table should be placed in HDFS.
  Library: Click Browse and select the library that should be associated with the target table. If this is the first time you are loading this table, you can select the same library that is associated with the source table.