• Print  |
  • Feedback  |

Knowledge Base


TS-277

HOW TO RESTRUCTURE YOUR DATA FOR AN ANSWER COMPARISON TABLE

INPUT:

I want to display questionaire response data using the TABULATE procedure. My data are responses (1-4) in four categories (ROOM, STAFF, FOOD, COST) and are collected monthly for a hotel. The responses 1, 2, 3, and 4 represent responses of Great, Good, Fair, and Poor, respectively.

The responses for each category are stored as values of the categorical variables in a data set. The following is the first observation from our data set SURVEY:

MONTH ROOM STAFF FOOD COST

SEP 3 2 1 2

I want to produce a table where the rows contain the categories and the columns contain the responses as follows:

      +-----------+-----------------------+
      |           |       Response        |
      +-----------+-----+-----+-----+-----+
      |Category   |  1  |  2  |  3  |  4  |
      +-----------+-----+-----+-----+-----+
      |Room       |     |     |     |     |
      |Staff      |     |     |     |     |
      |Food       |     |     |     |     |
      |Cost       |     |     |     |     |
      +-----------+-----+-----+-----+-----+

However, with the responses stored in seperate variables, I am unable to generate this table with PROC TABULATE. Is there a way to get this table?

OUTPUT:

You can use the DATA step to restructure your data so you can produce the table you want with the categories independent of the scaled responses.

Step 1: Restructuring the Data Set

Use the following DATA step to change the structure of your data set so that you have one variable to store the categories and another to store the responses.

data survey2;

set survey;
array cat_vars{*} room staff food cost; do category=1 to 4;

       response=cat_vars{category};
       output;

end;
drop room staff food cost;
run;

This DATA step creates a unique observation for each categorical variable. Since there are four categorical variables in the input data set SURVEY, the output data set, SURVEY2, will have four times as many observations as SURVEY. The responses previously stored in the categorical variables will now be stored in a variable called RESPONSE. The names of the categorical variables will now be stored in a variable called CATEGORY as numbers ranging from 1 to 4. The numbers 1 to 4 correspond to the order in which the categorical variables appeared on the ARRAY statement. Output 1 shows the first ten observations in the restructured data set.

Output 1: Partial Listing of SURVEY2

MONTH CATEGORY RESPONSE

                SEP         1           3
                SEP         2           2
                SEP         3           1
                SEP         4           2
                AUG         1           1
                AUG         2           2
                AUG         3           3
                AUG         4           1
                AUG         1           1
                AUG         2           3

Note that we list the category variables ROOM, STAFF, FOOD, and COST in the ARRAY statement. The ARRAY statement groups these variables so that you can easily process them using a DO loop. You can also use a double hyphen (--) to list a range of variables in the ARRAY statement. To do this, list the beginning variable of the range, then the double hyphen, and finally, the ending variable of the range.

array cat_vars{*} room--cost;

If you use the double hyphen, remember that the order of your variables is determined by their order in the original data set.

Step 2: Generating the Table:

The following procedures summarize the SURVEY2 data set created in Step 1 and produce a table with titles and labels. The SAS system options used here enable you to control the page and line size as well as the header information. PROC FORMAT enables you to label the values of CATEGORY, RESPONSE, and MONTH. The PROC TABULATE options used here control column widths and the default label for missing table cell values. Labels for the variables and a title for the table are also included.

options pagesize=60 linesize=64 nodate nonumber nocenter;

proc format;

value $monf 'AUG'='August' 'SEP'='September'; value respf 1='Great' 2='Good' 3='Fair' 4='Poor'; value catf 1='Room' 2='Staff' 3='Food' 4='Cost'; run;

proc tabulate data=survey2 format=9.;

class month category response;
table (month all)*category, response

/ rts=22 misstext='0';

title 'Hotel Survey Results For The Fall'; format month $monf. response respf. category catf.; label month='Month' response='Response'

category='Category';
keylabel all='All' n=' ';
run;

Output 2 shows the final table with formats, labels, and options applied:

Output 2: Final Table

Hotel Survey Results For The Fall


   |                    |          Response           |
   |                    |-----------------------------|
   |                    |  Great  |  Good   |  Fair   |
   |--------------------+---------+---------+---------|
   |Month    |Category  |         |         |         |
   |---------+----------|         |         |         |
   |August   |Room      |        4|        2|        2|
   |         |----------+---------+---------+---------|
   |         |Staff     |        2|        3|        3|
   |         |----------+---------+---------+---------|
   |         |Food      |        2|        4|        2|
   |         |----------+---------+---------+---------|
   |         |Cost      |        1|        4|        3|
   |---------+----------+---------+---------+---------|
   |September|Room      |        4|        6|        5|
   |         |----------+---------+---------+---------|
   |         |Staff     |        2|       10|        3|
   |         |----------+---------+---------+---------|
   |         |Food      |        4|        6|        5|
   |         |----------+---------+---------+---------|
   |         |Cost      |        5|        6|        4|
   |---------+----------+---------+---------+---------|
   |All      |Room      |        8|        8|        7|
   |         |----------+---------+---------+---------|
   |         |Staff     |        4|       13|        6|
   |         |----------+---------+---------+---------|
   |         |Food      |        6|       10|        7|
   |         |----------+---------+---------+---------|
   |         |Cost      |        6|       10|        7|

The table in Output 2 shows the responses in the columns and the categories in the rows. Numbers in the cells represent response frequencies. Remember that responses can range from 1 (Great) to 4 (Poor). No column for "Poor" appears in the table because there are no responses of 4 in the data set. In other words, no response of "Poor" was given. Still, you might like to see a column for "Poor" where the frequency counts are all zero. We will address this situation in the next issue of Observations.

For more information on PROC TABULATE, refer to the SAS Guide to TABULATE Processing, Second Edition. For more information on PROC FORMAT, refer to the SAS Procedures Guide, Version 6, Third Edition. For more information on SAS system options and the DATA step, refer to SAS Language: Reference, Version 6, First Edition.

Answer provided by Jason Sharpe and Jeff Lopes. Jason is a Technical Support Analyst at SAS Institute, specializing in macro and base procedures. Jeff is a Technical Editor at SAS Institute.