Creating Summary Statistics for a Table

Overview

The Summary Statistics transformation provides an interface to the MEANS procedure. The MEANS procedure provides data summarization tools to perform the following tasks:
  • compute descriptive statistics for variables across all observations and within groups of observations
  • calculate descriptive statistics based on moments
  • estimate quantiles, which includes the median
  • calculate confidence limits for the mean
  • identify extreme values
  • perform a t test
By default, the MEANS procedure displays output. You can also use the OUTPUT statement to store the statistics in a SAS data set. You can use the MEANS procedure to generate a statistical summary. Data is sent to a target table and to the Output tab of the Job Editor. You can also create ODS output.
You can control many aspects of how the target table is created, including the following:
  • the type of analysis
  • analysis options
  • which columns are analyzed
The target table receives data only for the columns that are involved in the analysis. The target requires three columns that the Summary Statistics transformation populates:
_TYPE_
contains the type of statistic.
_FREQ_
contains the frequency.
_STAT_
contains the name of the statistic.
You can specify grouping columns in the Summary Statistics transformation. Doing so causes a SAS BY statement to order target rows according to the values in the grouping columns. The Summary Statistics transformation requires that grouping columns be sorted in ascending order in the source. If you specify grouping columns, you can sort those columns before the Summary Statistics transformation using a SAS Sort transformation.

Problem

You want to generate summary statistics for a table.

Solution

You can use the Summary Statistics transformation in a job that generates summary statistics and creates an ODS document that contains the results. This transformation uses the MEANS procedure to compute descriptive statistics for variables across all observations and within groups of observations. For example, you can create a job similar to the sample job featured in this topic. This sample job generates summary statistics from a source table that contains demographic data about a classroom of students. Note that the output for this job is sent to the Output tab in the Job Editor window and an ODS document that is configured in the job. The sample job includes the following tasks:

Tasks

Create and Populate the Job

Perform the following steps to create and populate the job:
  1. Create an empty SAS Data Integration Studio job.
  2. Select and drag a Summary Statistics transformation from the Analysis folder in the Transformations tree. Then, drop it in the empty job on the Diagram tab in the Job Editor window.
  3. Select and drag the source table out of the Inventory tree. Then, drop it before the Summary Statistics transformation on the Diagram tab.
  4. Drag the cursor from the source table to the input port of the Summary Statistics transformation. This action connects the source to the transformation.
  5. Right-click the Summary Statistics transformation, and click Add Output Port from the Ports option in the drop-down menu. This step enables you to add an output port to the transformation.
  6. Select and drag the source table from the Inventory tree. Then, drop it after the Summary Statistics transformation on the Diagram tab.
  7. Drag the cursor from the Summary Statistics transformation output port to the target table. This action connects the target to the transformation.
The following display shows a sample process flow diagram for a job that contains the Summary Statistics transformation.
Sample Process Flow
Sample Process Flow
Note that the source table for the sample job is named CAKE.

Configure Analytical Options

Use the Options tab in the properties window for the Summary Statistics transformation to configure the SAS tables that are generated in the job and shape the output of your analysis. Note that the Options tab is divided into two parts, with a list of categories on the left-hand side and the options for the selected category on the right-hand side. Perform the following steps to set the options that you need for your job:
  1. Open the properties window for the Summary Statistics transformation in the Diagram tab in the Job Editor window. Then, click the Options tab.
  2. Click Assign columns to access the Assign columns page. Use the column selection prompts to access the columns that you need in the SAS tables generated in your job. For example, you can click Column Selection for the Select analysis columns (VAR statement) to access the Select Data Source Items window, as shown in the following display.
    Sample Select Data Source Items Window
    Sample Select Data Source Items Window
    In the sample job, the VAR statement columns are PresentScore and TasteScore.
  3. Click Basic to access the Statistics > Basic page to set the basic statistical options for the analysis conducted in the job. In the sample job, the Number of observations (N), Mean (MEAN), Maximum (MAX), Minimum (MIN), Range (RANGE), and Standard deviation (STD) options are moved to the Selected field. The statistical options for the sample job are shown in the following display.
    Sample Basic Statistical Options
    Sample Basic Statistical Options
  4. Set additional analytical options as needed. For example, the sample job uses a field width of eight, which limits the output width. This setting is made in the Other PROC MEANS options field on the Additional Options page, as follows:
    fw=8

Configure Reporting Options

Use the remaining option pages to create and save a report based on the analysis conducted in the job. Perform the following steps to set the reporting options:
  1. Click Title and footnotes to access the Title and footnotes page and enter up to three headings and two footnotes.
  2. Click ODS options to access the ODS options page. You can choose between HTML, RTF, and PDF output and enter appropriate settings for each. The sample job uses PDF output. Therefore, a location, a set of keywords, the subject of the report, and code to enable ODS graphics are added to the fields that are displayed when Use PDF is selected in the ODS Result field. (The path specified in the Location field is relative to the SAS Application Server that executes the job.)
    Sample ODS Options
    Sample ODS Options
    Note: You can set additional reporting and formatting options in the Specify other options for OPTIONS statement field on the Other options page. For example, the following options are set for the sample job:
    options nodate pageno=1 linesize=80 pagesize=60
  3. Click OK to save the settings for the Options tab.

Run the Job and View the Output

Perform the following steps to run the job and view the output:
  1. Right-click on an empty area of the job, and click Run in the pop-up menu. SAS Data Integration Studio generates code for the job and submits it to the SAS Application Server for execution. The following display shows a successful run of a sample job.
    Successfully Completed Sample Job
    Successfully Completed Sample Job
  2. If error messages are displayed on the Status tab, read and respond to the messages as needed.
  3. To view the summary statistics, click the Output tab in the Job Editor window. The following display shows the analysis for the sample job.
    Sample Output
    Sample Output
  4. Open the PDF document that you created and saved earlier. The following display illustrates a sample report based on the summary statistics generated by the sample job.
    Sample PDF Output
    Sample PDF Output