IMSTAT Procedure (Data and Server Management)

Concepts: IMSTAT Procedure

Partitioned Tables

A SAS LASR Analytic Server table can be partitioned, and it can also be ordered within each partition. A partition is a collection of observations that share the same key value and are located on the same worker node. The key value can be constructed from one or more variables. The partitioning keys are created according to the formatted values of the specified variables. Partitioning of data is an important tool in managing distributed data sources. The process of partitioning can consume computing and network resources as well as create greater imbalance in the data compared to a round-robin distribution. However, partitioned data can be accessed more quickly and subsequent procedure statements can execute more quickly.
You can achieve a partitioned table as follows:
  • load a table with the SAS LASR Analytic Server engine using the PARTITION= data set option
  • load a SASHDAT file that has been previously partitioned into HDFS by using the PARTITION= data set option of the SASHDAT engine
  • re-partition data from one table into a temporary table (you can then make it a regular table with the PROMOTE statement)
  • create a temporary table with a GROUPBY= option. The temporary table is partitioned by the formatted values of the GROUPBY= variables.
For more information, see Data Partitioning and Ordering.

RUN-Group Processing

The IMSTAT procedure supports RUN-group processing. RUN-group processing enables you to submit RUN groups without ending the procedure. This feature is particularly useful for running SAS interactively. You can start the procedure with the PROC IMSTAT statement and then execute statements like SUMMARY and FREQUENCY. Each statement runs when it reaches a RUN statement.
To use RUN-group processing, you start the procedure and then submit multiple RUN-groups. A RUN-group is a group of statements that contains at least one action statement and ends with a RUN statement. As long as you do not terminate the procedure, it remains active and you do not need to resubmit the PROC statement.
To end RUN-group processing, submit a RUN CANCEL statement. Statements that have not been submitted are terminated. To stop the procedure, submit a QUIT statement. Statements that have not been submitted are terminated as well.

WHERE Clause Processing

There are two important features for WHERE clause processing that are related to the IMSTAT procedure. The first is that the WHERE clause is applied to the data by the server. When you use a WHERE clause to subset data, the subsetting is performed by the server. Only the rows that meet the WHERE clause criteria are affected by subsequent operations.
The second important feature for WHERE clause processing is related to the RUN-group processing that the IMSTAT procedure supports. You can modify the WHERE clause between statements. Unless a WHERE clause is specified in a RUN block, no subsetting of rows occurs. In the following code example, the SUMMARY statement in the first RUN-group is not subject to a WHERE clause. The FREQUENCY statement in the second RUN-group applies only to observations for which Division='EDUCATION.'
proc imstat data=example.prdsale(tag=sashelp);
    summary actual predict / groupby=(region);
run;
    where division='EDUCATION';
    frequency prodtype;
run;
If you specify WHERE clauses in different RUN blocks, the clauses replace each other. A note is written to the SAS log to indicate the change. For example, the SUMMARY statement in the following code example applies to observations for which the Division='CONSUMER.' The FREQUENCY statement applies to observations for which Region='EAST.'
proc imstat data=example.prdsale(tag=sashelp);
    where division='CONSUMER';
    summary actual predict / groupby=(region);
run;
    where region='EAST';
    frequency prodtype;
run;
When the FREQUENCY statement runs, the following line is added to the SAS log.
NOTE: WHERE clause has been replaced.
WHERE clauses can remain active across RUN statements. The following example is the same as the previous example, except that the second WHERE clause is not submitted.
proc imstat data=example.prdsale(tag=sashelp);
    where division='CONSUMER';
    summary actual predict / groupby=(region);
run;
    /* where region='EAST'; */
    frequency prodtype;
run;
In this case, the SAS log includes the following note.
NOTE: A WHERE clause remains active from a previous RUN 
      block: '(division='CONSUMER')'.
You can clear a WHERE clause by submitting WHERE;.
Each time you access a different table with the TABLE statement, the WHERE clause is cleared. In following example, the second FREQUENCY statement is not restricted to observations for which Region='EAST' because the TABLE statement that accesses Prdsal2 clears the WHERE clause.
proc imstat;
    table example.prdsale(tag=sashelp);
    where region='EAST';
    frequency prodtype;
run;

    table example.prdsal2(tag=sashelp);
    frequency prodtype;
run;
The SAS log indicates that the WHERE clause is no longer applied.
NOTE: The WHERE statement is cleared when you open a LASR Analytic Server 
      table with the TABLE statement.

Temporary Tables

A temporary table is an in-memory table that contains the result set of a procedure statement. Instead of transferring the results to the client SAS session, the results remain in the server and only the name of the temporary table is transferred to the client. You can then use other procedure statements with the temporary table.
Temporary tables can be partitioned and the SUMMARY, CROSSTAB, DISTINCT, and PERCENTILE statements perform this action. For non-partitioned data, you can also generate temporary tables with the SUMMARY and CROSSTAB statements, provided that you request a GROUPBY analysis.
The following DATA step shows how to create a partitioned table on the variables Country and Region.
data lasr.prdsale(partition=(country region));
    set sashelp.prdsale;
run;
The following statements generate a summary analysis for variables Actual and Predict in each of the partitions.
proc imstat;
   table lasr.prdsale;
   summary actual predict / partition;
run;
The output for the previous statements is as follows:
Summary of actual and predict for PRDSALE.
As an alternative, you can leave the result set in an in-memory table by adding the TEMPTABLE option to the SUMMARY statement:
   summary actual predict / partition temptable;
run;
The previous SAS statements generate the following output in the SAS session.
Summary of actual and predict for PRDSALE stored to a temporary table.
The temporary table is assigned a name by the server. When the IMSTAT procedure ends, any temporary tables created during the procedure run are removed from the server. Since the generated name is not predictable, the procedure assigns the name of the most recently generated temporary table to the _TEMPLAST_ macro variable.
You can use the TABLE statement to switch the active table to the temporary table and perform analyses. Make sure that the statement that generated the temporary table is separated from the next statement with a RUN statement. Otherwise, you receive an error that the table specified in the TABLE statement does not exist. The temporary table does not exist at parse time, it is created at run time when the statement is executed.
The following statements retrieve information about the temporary table, the formatted values for (up to) the first twenty rows, and perform a summarization:
  table lasr.&_templast_
    tableinfo; 
    columninfo;
    fetch / from=1 to=20 format;
    summary;
quit;
The output for the TABLEINFO, COLUMNINFO, and FETCH statements is not shown. The results for the SUMMARY statement are as follows:
Summary statistics for a temporary table.

Creating Temporary Tables with GROUPBY Variables

Temporary tables play an important role in partitioning in the server. Temporary tables that are created by the DISTINCT, SUMMARY, CROSSTAB, or PARTITION statements are partitioned.
If the input table is not partitioned, you can still use temporary tables with the SUMMARY and CROSSTAB statements, provided that you perform a group-by analysis. The temporary table that is created by the server is automatically partitioned by the group-by variables. This potentially involves a redistribution of the groups in the result set to transfer all the result records for a particular group to the same worker node.

Creating Temporary Variables

You can use temporary variables in a table. For example, if a table has variables that are named x1, x2, and x3, you can calculate the summary statistics for variable d1 = x1 + x2 / 3*x3. One way is to declare d1 as a temporary variable of the table (with data set options for the input table). You can use the temporary variables (temporary for the duration of each statement) with DATA step expression scripts.
proc imstat data=lasrlib.table1(array=(d,1));
    summary d1 x1 -- x3 / tempnames=d1 tempexpress="d1 = x1 + x2 / 3*x3;";
run;
    summary d1 / tempnames=d1 tempexpress="d1 = mean(x1, x2);";
quit;
Because the temporary variable exists for the duration of the statement, its name can be reused in subsequent statements. The second SUMMARY statement uses the same name, d1, for the temporary variable but it has a different value.
You can also create temporary character variables. The following example creates a program that concatenates the first character of the Type variable and the first character of the Origin variable.
libname lasrlib sasiola host="hostname.example.com" port=10010 tag=hps;

data lasrlib.cars; set sashelp.cars; run;

data _null_;
    file 'concat.sas';
    put "c1 = substr(type,1,1) || substr(origin,1,1);";
run;

filename fref 'concat.sas';

proc imstat;
    table lasrlib.cars(tempnames=(c1 $));  1
    summary horsepower / groupby=c1 tn=(c1) te=fref;  2
run;
    crosstab c1*type / tn=(c1) te=fref;
run;
1 The TEMPNAMES= data set option for the SAS LASR Analytic Server engine reserves the variable name, c1, for the temporary variable that does not exist. The $ indicates that it is a character variable.
2 The temporary variable name can be used in a variety of expressions. The TEMPNAMES= and TEMPEXPRESS= options must be specified in every statement that uses the temporary variable.