Administering the SAS LASR Analytic Server

Administering a Distributed Server

Basic administration of a distributed SAS LASR Analytic Server can be performed with the LASR procedure from a SAS session. Server instances are started and stopped with the LASR procedure, which can also be used to load and unload tables from memory, though the SAS LASR Analytic Server engine provides that capability as well.
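For example, the following statements start a distributed server instance, load a table into memory, and then stop the instance. This is a minimal sketch: the host name, port number, signature-file path, and installation path are placeholders for site-specific values, and the table is an arbitrary example.
/* Start a distributed server instance on port 10010 */
proc lasr create port=10010 path="/tmp/lasr";
    performance host="grid001.example.com" install="/opt/TKGrid";
run;

/* Load a table into memory on the server instance */
proc lasr add data=sashelp.cars port=10010;
    performance host="grid001.example.com" install="/opt/TKGrid";
run;

/* Stop the server instance */
proc lasr stop port=10010;
    performance host="grid001.example.com" install="/opt/TKGrid";
run;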
The SASHDAT engine is used to add tables to, and delete tables from, the Hadoop Distributed File System (HDFS). The tables are stored in the SASHDAT file format. You can use the DATASETS procedure with the engine to display information about tables that are stored in HDFS.
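As a sketch, the following statements assign a SASHDAT libref and use the DATASETS procedure to list and delete tables in HDFS. The host name, installation path, HDFS directory, and table name are placeholder values.
/* Assign a libref to a directory in HDFS */
libname hdfs sashdat host="grid001.example.com"
    install="/opt/TKGrid" path="/hps";

/* Display information about the tables stored in the directory */
proc datasets lib=hdfs;
quit;

/* Delete a table from HDFS */
proc datasets lib=hdfs;
    delete mktdata;
quit;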
The HPDS2 procedure serves a specific purpose in SAS LASR Analytic Server deployments: it distributes data to the machines in an appliance. After the data are distributed, the SAS LASR Analytic Server can read the data in parallel from each of the machines in the appliance.

Administering a Non-Distributed Server

A non-distributed SAS LASR Analytic Server runs on a single machine. A non-distributed server is started and stopped with the SAS LASR Analytic Server engine. A server is started with the STARTSERVER= option in the LIBNAME statement (see the sketch later in this section). The server is stopped when one of the following occurs:
  • The libref is cleared (for example, libname lasrsvr clear;).
  • The SAS session that started the server ends. You can use the SERVERWAIT statement in the VASMP procedure to keep the SAS program (and the server) running.
  • The server receives a termination request from the SERVERTERM statement in the VASMP procedure.
A non-distributed deployment does not include a distributed computing environment. As a result, a non-distributed server does not support a co-located data provider. Tables are loaded into and unloaded from memory with the SAS LASR Analytic Server engine only.
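The following sketch starts a non-distributed server, loads a table through the engine, and keeps the server running with the SERVERWAIT statement. The port number, signature-file path, tag, and table names are placeholder values.
/* Start a non-distributed server instance on the local machine */
libname lasrsvr sasiola startserver=(path="/tmp/lasr")
    host="localhost" port=10010 tag=hps;

/* Load a table into memory through the engine */
data lasrsvr.mktdata;
    set work.mktdata;
run;

/* Keep the SAS program (and the server) running */
proc vasmp;
    serverwait port=10010;
quit;
Because SERVERWAIT blocks the submitting session, a second SAS session can stop the server with the SERVERTERM statement:
proc vasmp;
    serverterm host="localhost" port=10010;
quit;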

Common Administration Features

As described in the previous sections, the different architectures of distributed and non-distributed servers require different methods for starting and stopping servers and for managing tables. However, the IMSTAT procedure works with both distributed and non-distributed servers to provide administrators with information about server instances. The following statements provide information of interest to administrators:
  • SERVERINFO
  • TABLEINFO
Administrators might also be interested in the SERVERPARM statement. You can use this statement to adjust the number of requests that are processed concurrently. You might reduce the number of concurrent requests if the number of concurrent users causes the server to consume too many sockets from the operating system.
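For example, the following IMSTAT step requests server and table information and then lowers the concurrency setting. This is a sketch: the host name and port number are placeholders, and NACTIONS= is assumed to be the SERVERPARM option that controls the number of concurrently processed requests.
proc imstat;
    /* Display information about the server instance */
    serverinfo host="grid001.example.com" port=10010;

    /* Display information about the in-memory tables */
    tableinfo host="grid001.example.com" port=10010;

    /* Reduce concurrent requests; NACTIONS= is assumed
       to be the relevant option */
    serverparm host="grid001.example.com" port=10010 nactions=4;
quit;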

Features Available in SAS Visual Analytics Administrator

SAS LASR Analytic Server is an important part of SAS Visual Analytics. SAS Visual Analytics Administrator is a web application that provides an intuitive graphical interface for server management. You can use the application to start and stop server instances, as well as load and unload tables from the servers. Once a server is started, you can view information about the libraries and tables that are associated with it. The application also indicates whether a table is in memory or unloaded.
For deployments that are co-located with Hadoop, an HDFS explorer enables you to browse the tables that are stored in HDFS. Once tables are stored in HDFS, you can load them into memory in a server instance. Because SAS uses the special SASHDAT file format for the data that is stored in HDFS, the HDFS explorer also provides information about the columns, row count, and block distribution.

Understanding Server Run Time

By default, servers run indefinitely once they are started. However, to conserve the hardware resources in a distributed computing environment, server instances can be configured to exit after a period of inactivity. This feature applies to distributed SAS LASR Analytic Server deployments only. You specify the inactivity duration with the LIFETIME= option when you start the server.
When the LIFETIME= option is used, the run timer for the server is reset to zero each time the server is accessed, such as to view data or perform an analysis. For each second that the server is unused, the run timer increments to count the number of inactive seconds. If the run timer reaches the specified lifetime, the server exits, and all the hardware resources that it used become available to the remaining server instances.
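As a sketch, the following PROC LASR step starts a distributed server that exits after 3600 seconds (one hour) of inactivity. The host name, port number, and paths are placeholders for site-specific values.
proc lasr create port=10010 path="/tmp/lasr"
    lifetime=3600; /* exit after 3600 seconds of inactivity */
    performance host="grid001.example.com" install="/opt/TKGrid";
run;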

Distributing Data

SAS Plug-ins for Hadoop

SAS provides SAS Plug-ins for Hadoop, which you can use to configure a Hadoop cluster as a co-located data provider. The SAS LASR Analytic Server software and the plug-ins are installed on the same hosts in the cluster. The SASHDAT engine can be used to distribute data to HDFS.
For more information, see Using the SASHDAT Engine.
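As a sketch, the following DATA step distributes a data set to HDFS in the SASHDAT file format; the host name, installation path, HDFS directory, and table name are placeholder values.
libname hdfs sashdat host="grid001.example.com"
    install="/opt/TKGrid" path="/hps";

/* Distribute the table across the cluster in SASHDAT format */
data hdfs.mktdata;
    set work.mktdata;
run;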

PROC HPDS2 for Big Data

For deployments that use Greenplum or Teradata, the HPDS2 procedure can be used to distribute large data sets to the machines in the appliance. The procedure provides an easy-to-use and efficient method for transferring large data sets.
For deployments that use Greenplum, the procedure is more efficient than using a DATA step with the SAS/ACCESS Interface to Greenplum and is an alternative to using the gpfdist utility.
The SAS/ACCESS Interface for the database must be configured on the client machine. It is important to distribute the data as evenly as possible so that the SAS LASR Analytic Server has an even workload when the data is read into memory.
The following code sample shows a LIBNAME statement and an example of the HPDS2 procedure for adding tables to Greenplum.
libname source "/data/marketing/2012";

libname target greenplm
    server = "grid001.example.com"
    user = dbuser
    password = dbpass
    schema = public
    database = template1
    dbcommit=1000000;

proc hpds2 data = source.mktdata
    out = target.mktdata (distributed_by = 'distributed randomly'); 1

    performance host = "grid001.example.com" 
        install = "/opt/TKGrid";

    data DS2GTF.out;
        method run();
            set DS2GTF.in;
        end;
    enddata;
run;

proc hpds2 data = source.mkdata2
    out = target.mkdata2 (dbtype=(id='int') 
        distributed_by='distributed by (id)'); 2

    performance host = "grid001.example.com" 
        install = "/opt/TKGrid";

    data DS2GTF.out;
        method run();
            set DS2GTF.in;
        end;
    enddata;
run;
1 The rows of data from the input data set are distributed randomly to Greenplum.
2 The ID column in the input data set is identified as being an integer data type. The rows of data are distributed based on the value of the ID column.
For information about the HPDS2 procedure, see the Base SAS Procedures Guide: High-Performance Procedures. The procedure documentation is available from http://support.sas.com/documentation/cdl/en/prochp/66409/HTML/default/viewer.htm#prochp_hpds2_toc.htm.

Bulkload for Teradata

The SAS/ACCESS Interface to Teradata supports a bulk-load feature. With this feature, a DATA step is as efficient at transferring data as the HPDS2 procedure.
The following code sample shows a LIBNAME statement and two DATA steps for adding tables to Teradata.
libname tdlib teradata
    server="dbc.example.com"
    database=hps
    user=dbuser
    password=dbpass
    bulkload=yes; 1

data tdlib.order_fact;
  set work.order_fact;
run;

data tdlib.product_dim (dbtype=(partno='int') 2
    dbcreate_table_opts='primary index(partno)'); 3
  set work.product_dim;
run;

data tdlib.salecode(dbtype=(_day='int' fpop='varchar(2)') 
    bulkload=yes
    dbcreate_table_opts='primary index(_day,fpop)'); 4
    set work.salecode;
run;

data tdlib.automation(bulkload=yes
    dbcommit=1000000 5
    dbcreate_table_opts='unique primary index(obsnum)'); 6
    set automation;
    obsnum = _n_;
run;
1 Specify the BULKLOAD=YES option. This option is shown as a LIBNAME option, but you can also specify it as a data set option.
2 Specify a data type of int for the variable named partno.
3 Specify to use the variable named partno as the distribution key for the table.
4 Specify to use the variables that are named _day and fpop as a distribution key for the table that is named salecode.
5 Specify the DBCOMMIT= option when you are loading many rows. This option interacts with the BULKLOAD= option to perform checkpointing. Checkpointing provides known synchronization points if a failure occurs during the loading process.
6 Specify the UNIQUE keyword in the table options to indicate that the primary key is unique. This keyword can improve table loading performance.

Smaller Data Sets

You can use a DATA step to add smaller data sets to Greenplum or Teradata. For small data sets, transfer efficiency is not a major concern. The SAS/ACCESS Interface for the database must be configured on the client machine.
The following code sample shows a LIBNAME statement and DATA steps for adding tables to Greenplum.
libname gplib greenplm server="grid001.example.com"
    database=hps
    schema=public
    user=dbuser
    password=dbpass;

data gplib.automation(distributed_by='distributed randomly'); 1
    set work.automation;
run;

data gplib.results(dbtype=(rep='int') 2
    distributed_by='distributed by (rep)'); 3
    set work.results;
run;

data gplib.salecode(dbtype=(day='int' fpop='varchar(2)') 4
    distributed_by='distributed by (day,fpop)'); 5
    set work.salecode;
run;
1 Specify a random distribution of the data. This data set option is for the SAS/ACCESS Interface to Greenplum.
2 Specify a data type of int for the variable named rep.
3 Specify to use the variable named rep as the distribution key for the table that is named results.
4 Specify a data type of int for the variable named day and a data type of varchar(2) for the variable named fpop.
5 Specify to use the combination of variables day and fpop as the distribution key for the table that is named salecode.
The following code sample shows a LIBNAME statement and a DATA step for adding a table to Teradata.
libname tdlib teradata server="dbc.example.com"
    database=hps
    user=dbuser
    password=dbpass;

data tdlib.parts_dim;
    set work.parts_dim;
run;
For Teradata, the SAS statements are very similar to the syntax for bulk loading. For more information, see Bulkload for Teradata.