
Techniques for Exploring Data

Ordering Categories of a Nominal Variable

This section describes how to specify the order of categories for a nominal variable. You cannot change the order of values for interval variables.

By default, numeric nominal variables are ordered numerically, whereas character nominal variables are arranged in linguistic order. (Even if a variable has a SAS format, SAS/IML Studio determines the default order of categories from the linguistic order of the unformatted values.) In linguistic order, values are sorted according to the language rules for the locale that is specified in the Windows operating system. In English, punctuation marks precede numerals, numerals precede letters, and a lowercase letter (for example, 'a') precedes the same letter in uppercase (for example, 'A'). For example, the following English characters are listed in linguistic order: '0', '9', 'a', 'A', 'b', 'B'. The character for a missing value (a blank character) precedes all nonmissing characters.
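The following Python sketch approximates the English linguistic rules described above so that they can be compared with plain character-code (ASCII) order. It is a simplified assumption for illustration only: real Windows linguistic sorting is locale-dependent and considerably more elaborate, and the helper names `char_class` and `linguistic_key` are hypothetical.

```python
def char_class(c: str) -> int:
    """Rank a character: punctuation/symbols (0) < digits (1) < letters (2)."""
    if c.isdigit():
        return 1
    if c.isalpha():
        return 2
    return 0


def linguistic_key(value: str):
    """Approximate an English linguistic sort key (simplified; not the Windows algorithm)."""
    if value.strip() == "":
        # A blank string stands for a missing character value; it sorts first.
        return (0, (), ())
    # Primary level: character class and case-folded character, so 'a' and 'A' tie here.
    primary = tuple((char_class(c), c.casefold()) for c in value)
    # Tie-breaker level: a lowercase letter precedes the same letter in uppercase.
    case_level = tuple(0 if c.islower() else 1 for c in value)
    return (1, primary, case_level)


chars = ["B", "9", "a", "b", "A", "0"]
print(sorted(chars, key=linguistic_key))  # ['0', '9', 'a', 'A', 'b', 'B']
print(sorted(chars))                      # ASCII order: ['0', '9', 'A', 'B', 'a', 'b']
```

In ASCII order every uppercase letter precedes every lowercase letter, which is why the two results differ.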

When the data table is active, you can select Edit > Variables > Ordering to change the order of categories for a nominal variable. You can order nominal variables according to the linguistic order of values, the frequency count of values, or the data order of values. For each ordering, you can specify whether to base the order on formatted or unformatted values, which gives six possible ways to order a nominal variable. Four of these orderings are the same as those provided by the ORDER= option of the FREQ procedure. An ordering determines the order of categories in a plot (for example, a bar chart) and also the order of sorted observations when you sort a variable in a data table.
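To make the six combinations concrete, here is a minimal Python sketch of how the three orderings might be computed, each on either unformatted or formatted values. The function `order_categories`, its method names, and the `levels` format are hypothetical illustrations, not part of SAS/IML Studio. The linguistic branch uses Python's default comparison (numeric order for numbers, code-point order for strings) as a simplification, and ties in the frequency ordering are broken by data order purely as an assumption.

```python
from collections import Counter


def order_categories(values, method, fmt=None, sort_key=None):
    """Return the distinct categories of `values` in the requested order.

    method   -- "linguistic", "frequency", or "data" (hypothetical names)
    fmt      -- optional formatting function; when given, order formatted values
    sort_key -- optional key for the linguistic ordering; Python's default
                comparison (numeric or code-point order) is a simplification
    """
    cats = [fmt(v) for v in values] if fmt is not None else list(values)
    if method == "data":
        # Order of first appearance, reading the data from top to bottom.
        return list(dict.fromkeys(cats))
    if method == "linguistic":
        return sorted(set(cats), key=sort_key)
    if method == "frequency":
        counts = Counter(cats)
        # Most frequent first; sorted() is stable, so ties keep data order.
        return sorted(dict.fromkeys(cats), key=lambda c: -counts[c])
    raise ValueError(f"unknown method: {method!r}")


levels = {1: "Low", 2: "Medium", 3: "High"}  # hypothetical format for a numeric variable
data = [2, 3, 1, 3, 2, 3]

# Three methods x {unformatted, formatted} = six orderings.
for method in ("linguistic", "frequency", "data"):
    for fmt in (None, levels.get):
        label = "formatted" if fmt else "unformatted"
        print(f"{method:10} {label:11} {order_categories(data, method, fmt)}")
```

With this hypothetical format, the unformatted linguistic order is 1, 2, 3 (Low, Medium, High), but the formatted linguistic order is High, Low, Medium. The example shows why the choice between formatted and unformatted values matters.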

As an example, consider the data presented in Table 11.1.

Table 11.1 Sample Data

Observation   Y
1             C
2             B
3             C
4             a
5             a
6             a

The Y variable has three categories: a, B, and C. The linguistic order of these data is {aBC}. The data order is {CBa} because, reading the data from top to bottom, you encounter C first, then B, then a. The order by frequency count is {aCB} because there are three observations with the value a, two with the value C, and one with the value B.
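As a check on these three orderings, the following Python sketch recomputes them for the Y values in Table 11.1 and then sorts the observations by the frequency ordering. The linguistic key used here (compare case-insensitively, then put lowercase first) is a simplifying assumption that holds only for letters, and the stable tie handling is an assumption of the sketch rather than documented SAS/IML Studio behavior.

```python
from collections import Counter

# Y values from Table 11.1, listed in observation order 1..6.
y = ["C", "B", "C", "a", "a", "a"]

# Linguistic order (letters only): compare case-insensitively, then lowercase first.
linguistic = sorted(set(y), key=lambda s: (s.casefold(), s.swapcase()))
# Data order: order of first appearance from top to bottom.
data_order = list(dict.fromkeys(y))
# Frequency order: most frequent category first.
counts = Counter(y)
frequency = sorted(data_order, key=lambda c: -counts[c])

print(linguistic)  # ['a', 'B', 'C']
print(data_order)  # ['C', 'B', 'a']
print(frequency)   # ['a', 'C', 'B']

# Sort observation numbers by the rank of their Y value under the frequency ordering;
# ties keep their original relative order because sorted() is stable.
rank = {c: i for i, c in enumerate(frequency)}
obs = sorted(range(1, len(y) + 1), key=lambda i: rank[y[i - 1]])
print(obs)         # [4, 5, 6, 1, 3, 2]
```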

If you specify an ordering based on formatted values when the variable does not have a SAS format, then SAS/IML Studio applies either a BEST12. format (for numeric variables) or a $w. format (for character variables).

When a variable has missing values, the missing values are always ordered first.
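A common way to enforce this rule when sorting is to give missing values their own leading group in the sort key. The Python sketch below assumes that None, NaN, and blank strings represent missing values; the helper names `is_missing` and `missing_first_key` are hypothetical. Without such a key, Python raises a TypeError when it compares None with a float.

```python
import math


def is_missing(v) -> bool:
    """Treat None, NaN, and blank strings as missing (an assumption for this sketch)."""
    if v is None:
        return True
    if isinstance(v, float) and math.isnan(v):
        return True
    return isinstance(v, str) and v.strip() == ""


def missing_first_key(v):
    # Missing values get group 0 and so precede every nonmissing value (group 1).
    return (0, 0) if is_missing(v) else (1, v)


values = [3.5, float("nan"), 1.0, None, 2.0]
print(sorted(values, key=missing_first_key))  # [nan, None, 1.0, 2.0, 3.5]
```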
