IMSTAT Procedure

Concepts: IMSTAT Procedure

Partitioned Tables

A SAS LASR Analytic Server table can be partitioned, and it can also be ordered within each partition. A partition is a collection of observations that share the same key value and are located on the same worker node. The key value can be constructed from one or more variables. The partitioning keys are created according to the formatted values of the specified variables. Partitioning of data is an important tool in managing distributed data sources. The process of partitioning can consume computing and network resources as well as create greater imbalance in the data compared to a round-robin distribution. However, partitioned data can be accessed more quickly and subsequent procedure statements can execute more quickly.
You can achieve a partitioned table as follows:
  • load a table with the SAS LASR Analytic Server engine using the PARTITION= data set option
  • load a SASHDAT file that has been previously partitioned into HDFS by using the PARTITION= data set option of the SAS Data in HDFS engine
  • re-partition data from one table into a temporary table (you can then make it a regular table with the PROMOTE statement)
  • create a temporary table with a GROUPBY= option. The temporary table is partitioned by the formatted values of the GROUPBY= variables.
For more information, see Data Partitioning and Ordering.

RUN-Group Processing

The IMSTAT procedure supports RUN-group processing. RUN-group processing enables you to submit RUN groups without ending the procedure. This feature is particularly useful for running SAS interactively. You can start the procedure with the PROC IMSTAT statement and then execute statements like TABLE and PROMOTE. Each statement runs when it reaches a RUN statement.
To use RUN-group processing, you start the procedure and then submit multiple RUN-groups. A RUN-group is a group of statements that contains at least one action statement and ends with a RUN statement. As long as you do not terminate the procedure, it remains active and you do not need to resubmit the PROC statement.
To end RUN-group processing, submit a RUN CANCEL statement. Statements that have not been submitted are terminated. To stop the procedure, submit a QUIT statement. Statements that have not been submitted are terminated as well.

WHERE Clause Processing

There are two important features for WHERE clause processing that are related to the IMSTAT procedure. The first is that the WHERE clause is applied to the data by the server. When you use a WHERE clause to subset data, the subsetting is performed by the server and only the rows that meet the WHERE clause criteria are affected by subsequent operations.
The second important feature for WHERE clause processing is related to the RUN-group processing that the IMSTAT procedure supports. You can modify the WHERE clause between statements.
For an example of the IMSTAT procedure with a WHERE clause, see Deleting Rows and Saving a Table to HDFS.

Temporary Tables

A temporary table is an in-memory table that contains the result set of a procedure statement. Instead of transferring the results to the client SAS session, the results remain in the server and only the name of the temporary table is transferred to the client. You can then use other procedure statements with the temporary table.
Temporary tables are supported for the SUMMARY, CROSSTAB, DISTINCT, and PERCENTILE statements for partitioned data. For non-partitioned data, you can also generate temporary tables with the SUMMARY and CROSSTAB statements, provided that you request a GROUPBY analysis.
The following DATA step shows how to create a partitioned table on the variables country and region.
data lasr.prdsale(partition=(country region));
    set sashelp.prdsale;
run;
The following statements generate a summary analysis for variables actual and predict in each of the partitions.
proc imstat;
   table lasr.prdsale;
   summary actual predict / partition;
run;
The output for the previous statements is as follows:
Summary of actual and predict for PRDSALE.
As an alternative, you can leave the result set in an in-memory table by adding the TEMPTABLE option to the SUMMARY statement:
   summary actual predict / partition temptable;
run;
The previous SAS statements generate the following output in the SAS session.
Summary of actual and predict for PRDSALE stored to a temporary table.
The temporary table is assigned a name by the server. When the IMSTAT procedure ends, any temporary tables created during the procedure run are removed from the server. Since the generated name is not predictable, the procedure assigns the name of the most recently generated temporary table to the _TEMPLAST_ macro variable.
You can use the TABLE statement to switch the active table to the temporary table and perform analyses. Make sure that the statement that generated the temporary table is separated from the next statement with a RUN statement. Otherwise, you receive an error that the table specified in the TABLE statement does not exist. The temporary table does not exist at parse time, it is created at run time when the statement is executed.
The following statements retrieve information about the temporary table, the formatted values for (up to) the first twenty rows, and perform a summarization:
  table lasr.&_templast_
    tableinfo; 
    columninfo;
    fetch / from=1 to=20 format;
    summary;
quit;
The output for the TABLEINFO, COLUMNINFO, and FETCH statements is not shown. The results for the SUMMARY statement are as follows:
Summary statistics for a temporary table.

Creating Temporary Tables with GROUPBY Variables

If the input table is not partitioned, you can still use temporary tables with the SUMMARY and CROSSTAB statements, provided that you perform a group-by analysis. The temporary table that is created by the server is automatically partitioned by the group-by variables. This potentially involves a redistribution of the groups in the result set to transfer all the result records for a particular group to the same worker node.

Creating Temporary Variables

You can use temporary numeric variables in a table. For example, if a table has variables that are named x1, x2, and x3, you can calculate the summary statistics for variable d1 = x1 + x2 / 3*x3. One way is to declare d1 as a temporary variable of the table (with data set options for the input table). You can use the temporary variables (temporary for the duration of each statement) with DATA step expression scripts.
proc imstat data=lasrlib.table1(array=(d,1));
    summary d1 x1-x3 / tempnames=d1 tempexpress="d1 = x1 + x2 / 3*x3;";
run;
    summary d1 / tempnames=d1 tempexpress="d1 = mean(x1, x2);";
quit;
Because the temporary variable exists for the duration of the statement, it can be reused in subsequent statements. The second SUMMARY statement uses the d1 variable with a different expression.