TABULATE Procedure

Example 11: Reporting on Multiple-Choice Survey Data

Features:

TABLE statement: N statistic

Other features:

FORMAT procedure

TRANSPOSE procedure

Data set options: RENAME=

Data set: RADIO

Details

This report of listener preferences shows how many listeners select each type of programming during each of seven time periods on a typical weekday. The data was collected by a survey, and the results were stored in a SAS data set. Although this data set contains all the information needed for this report, the information is not arranged in a way that PROC TABULATE can use.
To make this crosstabulation of time of day and choice of radio programming, you must have a data set that contains a variable for time of day and a variable for programming preference. PROC TRANSPOSE reshapes the data into a new data set that contains these variables. Once the data are in the appropriate form, PROC TABULATE creates the report.

Collecting the Data

The following figure shows the survey form that is used to collect data.
Completed Survey Form
Completed Survey Form
An external file contains the raw data for the survey. Several lines from that file appear here.
967 32 f 5 3 5
7 5 5 5 7 0 0 0 8 7 0 0 8 0
781 30 f 2 3 5
5 0 0 0 5 0 0 0 4 7 5 0 0 0
859 39 f 1 0 5
1 0 0 0 1 0 0 0 0 0 0 0 0 0

 ... more data lines ...

859 32 m .25 .25 1
1 0 0 0 0 0 0 0 1 0 0 0 0 0

Program

data radio;
   infile 'input-file' missover;
   input /(Time1-Time7) ($1. +1);
   listener=_n_;
run;
proc format;
   value $timefmt 'Time1'='6-9 a.m.'
                  'Time2'='9 a.m. to noon'
                  'Time3'='noon to 1 p.m.'
                  'Time4'='1-4 p.m.'
                  'Time5'='4-6 p.m.'
                  'Time6'='6-10 p.m.'
                  'Time7'='10 p.m. to 2 a.m.'
                    other='*** Data Entry Error ***';
   value $pgmfmt      '0'="Don't Listen"
                  '1','2'='Rock and Top 40'
                      '3'='Country'
              '4','5','6'='Jazz, Classical, and Easy Listening'
                      '7'='News/ Information /Talk'
                      '8'='Other'
                    other='*** Data Entry Error ***';
run;
proc transpose data=radio
               out=radio_transposed(rename=(col1=Choice))
               name=Timespan;
   by listener;
   var time1-time7;
run;
proc tabulate data=radio_transposed format=12.;
format timespan $timefmt. choice $pgmfmt.;
   class timespan choice;
   table timespan='Time of Day',
         choice='Choice of Radio Program'*n='Number of Listeners';
   title 'Listening Preferences on Weekdays';
run;

Program Description

Create the RADIO data set and specify the input file.RADIO contains data from a survey of 336 listeners. The data set contains information about listeners and their preferences in radio programming. The INFILE statement specifies the external file that contains the data. MISSOVER prevents the input pointer from going to the next record if it fails to find values in the current line for all variables that are listed in the INPUT statement.
data radio;
   infile 'input-file' missover;
   input /(Time1-Time7) ($1. +1);
   listener=_n_;
run;
Create the $TIMEFMT. and $PGMFMT. formats.PROC FORMAT creates formats for the time of day and the choice of programming.
proc format;
   value $timefmt 'Time1'='6-9 a.m.'
                  'Time2'='9 a.m. to noon'
                  'Time3'='noon to 1 p.m.'
                  'Time4'='1-4 p.m.'
                  'Time5'='4-6 p.m.'
                  'Time6'='6-10 p.m.'
                  'Time7'='10 p.m. to 2 a.m.'
                    other='*** Data Entry Error ***';
   value $pgmfmt      '0'="Don't Listen"
                  '1','2'='Rock and Top 40'
                      '3'='Country'
              '4','5','6'='Jazz, Classical, and Easy Listening'
                      '7'='News/ Information /Talk'
                      '8'='Other'
                    other='*** Data Entry Error ***';
run;
Reshape the data by transposing the RADIO data set.PROC TRANSPOSE creates RADIO_TRANSPOSED. This data set contains the variable Listener from the original data set. It also contains two transposed variables: Timespan and Choice. Timespan contains the names of the variables (Time1-Time7) from the input data set that are transposed to form observations in the output data set. Choice contains the values of these variables. (See Details for a complete explanation of the PROC TRANSPOSE step.)
proc transpose data=radio
               out=radio_transposed(rename=(col1=Choice))
               name=Timespan;
   by listener;
   var time1-time7;
run;
Create the report and specify the table options.The FORMAT= option specifies the default format for the values in each table cell.
proc tabulate data=radio_transposed format=12.;
Format the transposed variables.The FORMAT statement permanently associates these formats with the variables in the output data set.
format timespan $timefmt. choice $pgmfmt.;
Specify subgroups for the analysis.The CLASS statement identifies Timespan and Choice as class variables.
   class timespan choice;
Define the table rows and columns.The TABLE statement creates a row for each formatted value of Timespan and a column for each formatted value of Choice. In each column are values for the N statistic. Text in quotation marks supplies headings for the corresponding rows or columns.
   table timespan='Time of Day',
         choice='Choice of Radio Program'*n='Number of Listeners';
Specify the title.
   title 'Listening Preferences on Weekdays';
run;

Output

Listening Preferences on Weekdays

Details

Reshape the Data

The original input data set has all the information that you need to make the crosstabular report, but PROC TABULATE cannot use the information in that form. PROC TRANSPOSE rearranges the data so that each observation in the new data set contains the variable Listener, a variable for time of day, and a variable for programming preference. The following figure illustrates the transposition. PROC TABULATE uses this new data set to create the crosstabular report.
PROC TRANSPOSE restructures data so that values that were stored in one observation are written to one variable. You can specify which variables you want to transpose. This section illustrates how PROC TRANSPOSE reshapes the data. The following section explains the PROC TRANSPOSE step in this example.
When you transpose with BY processing, as this example does, you create from each BY group one observation for each variable that you transpose. In this example, Listener is the BY variable. Each observation in the input data set is a BY group because the value of Listener is unique for each observation.
This example transposes seven variables, Time1 through Time7. Therefore, the output data set has seven observations from each BY group (each observation) in the input data set.
Transposing Two Observations
Transposing Two Observations

Understanding the PROC TRANSPOSE Step

Here is a detailed explanation of the PROC TRANSPOSE step that reshapes the data:
proc transpose data=radio   1
               out=radio_transposed(rename=(col1=Choice))   2

               name=Timespan;   3
   by listener;   4
   var time1-time7;    5
   format timespan $timefmt. choice $pgmfmt.;   6
run;
1 The DATA= option specifies the input data set.
2 The OUT= option specifies the output data set. The RENAME= data set option renames the transposed variable from COL1 (the default name) to Choice.
3 The NAME= option specifies the name for the variable in the output data set that contains the name of the variable that is being transposed to create the current observation. By default, the name of this variable is _NAME_.
4 The BY statement identifies Listener as the BY variable.
5 The VAR statement identifies Time1 through Time7 as the variables to transpose.
6 The FORMAT statement assigns formats to Timespan and Choice. The PROC TABULATE step that creates the report does not need to format Timespan and Choice because the formats are stored with these variables.