LASR Procedure

Example 3: Using the SAS Data in HDFS Engine

Details

The LASR procedure can load tables to memory from HDFS with the SAS Data in HDFS engine. This approach is similar to using the HDFS option with the procedure, but it has the advantage that you can use FORMAT statements and data set options.

Program

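option set=GRIDHOST="grid001.example.com";   /* 1 */
option set=GRIDINSTALLLOC="/opt/TKGrid";

libname grp1 sashdat path="/dept/grp1";   /* 2 */

/* Start a server on port 10010 that uses all nodes */
proc lasr create port=10010 noclass;
    performance nodes=all;
run;

/* Load Sales2012 and display both variables as currency */
proc lasr add data=grp1.sales2012 port=10010;
    format predict dollar20.   /* 3 */
           actual dollar20.;
run;

/* Load only the West region rows of Sales2013 */
proc lasr add data=grp1.sales2013(where=(region="West")) port=10010;   /* 4 */
run;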

Program Description

  1. The GRIDHOST= and GRIDINSTALLLOC= environment variables are used by the LASR procedure. The LIBNAME statement also uses the GRIDHOST= environment variable.
  2. The SAS Data in HDFS engine uses the GRIDHOST= environment variable to determine the host name for the NameNode. The PATH= option specifies the directory in HDFS.
  3. The FORMAT statement overrides the formats that are stored in HDFS for the Predict and Actual variables.
  4. The WHERE clause subsets the Sales2013 table. Only the rows with Region equal to "West" are read into memory. The WHERE clause is useful for subsetting data, but it does not take advantage of the memory efficiencies that are normally used with SASHDAT tables.

If the table in HDFS has variables that are associated with user-defined formats, then you must have the user-defined formats available in the format catalog search order.
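
One way to make user-defined formats available is sketched below. It stores a format in a permanent catalog and then adds that catalog to the search order with the FMTSEARCH= system option. The fmtlib libref, its path, and the $regfmt format with its values are hypothetical and are not part of the example above.

```sas
/* Hypothetical library that holds the format catalog */
libname fmtlib "/dept/grp1/formats";

/* Store a user-defined format in fmtlib.formats (illustrative values) */
proc format library=fmtlib;
    value $regfmt "W" = "West"
                  "E" = "East";
run;

/* Add the catalog to the format catalog search order */
options fmtsearch=(fmtlib);
```

In this sketch, the OPTIONS statement is assumed to be submitted before the PROC LASR ADD step so that the formats can be found when the table is loaded.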