|
HOW TO RESTRUCTURE YOUR DATA FOR AN ANSWER COMPARISON TABLE INPUT:I want to display questionaire response data using the TABULATE procedure. My data are responses (1-4) in four categories (ROOM, STAFF, FOOD, COST) and are collected monthly for a hotel. The responses 1, 2, 3, and 4 represent responses of Great, Good, Fair, and Poor, respectively. The responses for each category are stored as values of the categorical variables in a data set. The following is the first observation from our data set SURVEY: MONTH ROOM STAFF FOOD COST SEP 3 2 1 2 I want to produce a table where the rows contain the categories and the columns contain the responses as follows:
+-----------+-----------------------+
| | Response |
+-----------+-----+-----+-----+-----+
|Category | 1 | 2 | 3 | 4 |
+-----------+-----+-----+-----+-----+
|Room | | | | |
|Staff | | | | |
|Food | | | | |
|Cost | | | | |
+-----------+-----+-----+-----+-----+
However, with the responses stored in seperate variables, I am unable to generate this table with PROC TABULATE. Is there a way to get this table? OUTPUT:
You can use the DATA step to restructure your data so you
can produce the table you want with the categories
independent of the scaled responses.
Use the following DATA step to change the structure of your data set so that you have one variable to store the categories and another to store the responses. data survey2;
set survey;
response=cat_vars{category};
output;
end; This DATA step creates a unique observation for each categorical variable. Since there are four categorical variables in the input data set SURVEY, the output data set, SURVEY2, will have four times as many observations as SURVEY. The responses previously stored in the categorical variables will now be stored in a variable called RESPONSE. The names of the categorical variables will now be stored in a variable called CATEGORY as numbers ranging from 1 to 4. The numbers 1 to 4 correspond to the order in which the categorical variables appeared on the ARRAY statement. Output 1 shows the first ten observations in the restructured data set. Output 1: Partial Listing of SURVEY2 MONTH CATEGORY RESPONSE
SEP 1 3
SEP 2 2
SEP 3 1
SEP 4 2
AUG 1 1
AUG 2 2
AUG 3 3
AUG 4 1
AUG 1 1
AUG 2 3
Note that we list the category variables ROOM, STAFF, FOOD, and COST in the ARRAY statement. The ARRAY statement groups these variables so that you can easily process them using a DO loop. You can also use a double hyphen (--) to list a range of variables in the ARRAY statement. To do this, list the beginning variable of the range, then the double hyphen, and finally, the ending variable of the range. array cat_vars{*} room--cost;
If you use the double hyphen, remember that the order of
your variables is determined by their order in the original
data set.
The following procedures summarize the SURVEY2 data set created in Step 1 and produce a table with titles and labels. The SAS system options used here enable you to control the page and line size as well as the header information. PROC FORMAT enables you to label the values of CATEGORY, RESPONSE, and MONTH. The PROC TABULATE options used here control column widths and the default label for missing table cell values. Labels for the variables and a title for the table are also included. options pagesize=60 linesize=64 nodate nonumber nocenter; proc format; value $monf 'AUG'='August' 'SEP'='September'; value respf 1='Great' 2='Good' 3='Fair' 4='Poor'; value catf 1='Room' 2='Staff' 3='Food' 4='Cost'; run; proc tabulate data=survey2 format=9.;
class month category response; / rts=22 misstext='0'; title 'Hotel Survey Results For The Fall'; format month $monf. response respf. category catf.; label month='Month' response='Response'
category='Category'; Output 2 shows the final table with formats, labels, and options applied: Output 2: Final Table Hotel Survey Results For The Fall | | Response | | |-----------------------------| | | Great | Good | Fair | |--------------------+---------+---------+---------| |Month |Category | | | | |---------+----------| | | | |August |Room | 4| 2| 2| | |----------+---------+---------+---------| | |Staff | 2| 3| 3| | |----------+---------+---------+---------| | |Food | 2| 4| 2| | |----------+---------+---------+---------| | |Cost | 1| 4| 3| |---------+----------+---------+---------+---------| |September|Room | 4| 6| 5| | |----------+---------+---------+---------| | |Staff | 2| 10| 3| | |----------+---------+---------+---------| | |Food | 4| 6| 5| | |----------+---------+---------+---------| | |Cost | 5| 6| 4| |---------+----------+---------+---------+---------| |All |Room | 8| 8| 7| | |----------+---------+---------+---------| | |Staff | 4| 13| 6| | |----------+---------+---------+---------| | |Food | 6| 10| 7| | |----------+---------+---------+---------| | |Cost | 6| 10| 7| The table in Output 2 shows the responses in the columns and the categories in the rows. Numbers in the cells represent response frequencies. Remember that responses can range from 1 (Great) to 4 (Poor). No column for "Poor" appears in the table because there are no responses of 4 in the data set. In other words, no response of "Poor" was given. Still, you might like to see a column for "Poor" where the frequency counts are all zero. We will address this situation in the next issue of Observations. For more information on PROC TABULATE, refer to the SAS Guide to TABULATE Processing, Second Edition. For more information on PROC FORMAT, refer to the SAS Procedures Guide, Version 6, Third Edition. For more information on SAS system options and the DATA step, refer to SAS Language: Reference, Version 6, First Edition. Answer provided by Jason Sharpe and Jeff Lopes. Jason is a Technical Support Analyst at SAS Institute, specializing in macro and base procedures. Jeff is a Technical Editor at SAS Institute. |