# Techniques for Exploring Data

## Ordering Categories of a Nominal Variable

This section describes how to specify the order of categories for a nominal variable. You cannot change the order of values for interval variables.

By default, numeric nominal variables are ordered numerically, whereas character nominal variables are arranged in ASCII order. In ASCII order, numerals precede uppercase letters, which precede lowercase letters. Even if a variable has a SAS format, SAS/IML Studio determines the default order of categories by using the ASCII order of the unformatted values.
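As a rough illustration of ASCII order (not SAS/IML Studio code), the following Python sketch sorts a few hypothetical category labels. It relies on the fact that Python compares strings by code point, which coincides with ASCII for the characters used here.

```python
# Hypothetical category labels, chosen only to show the three character classes.
categories = ["a", "B", "7", "C", "b"]

# Python's default string sort compares code points, which match ASCII
# for these characters: numerals, then uppercase, then lowercase.
ascii_order = sorted(categories)

print(ascii_order)  # ['7', 'B', 'C', 'a', 'b']
```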

When the data table is active, you can select Edit ► Variables ► Ordering to change the order of categories for a nominal variable. You can order nominal variables according to the ASCII order of values, the frequency count of values, or the data order of values. For each ordering, you can specify whether to base the order on formatted or unformatted values. Therefore, there are six possible ways to order a nominal variable (three criteria, each applied to either formatted or unformatted values). Four of these orderings are the same as those provided by the ORDER= option of the FREQ procedure. An ordering determines the order of categories in a plot (for example, a bar chart) and also the order of sorted observations when you sort a variable in a data table.

As an example, consider the data presented in Table 11.1.

Table 11.1: Sample Data

| Observation | Y |
|-------------|---|
| 1           | C |
| 2           | B |
| 3           | C |
| 4           | a |
| 5           | a |
| 6           | a |

The `Y` variable has three categories: `a`, `B`, and `C`. The ASCII order of these data is {`B`, `C`, `a`} because uppercase letters precede lowercase letters. The data order is {`C`, `B`, `a`} because as you traverse the data from top to bottom, `C` is the first value you encounter, followed by `B`, followed by `a`. The order by frequency count is {`a`, `C`, `B`}, because there are three observations with the value `a`, two with the value `C`, and one with the value `B`.
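The three orderings of `Y` can be reproduced with a short Python sketch. This is an illustration only, not how SAS/IML Studio computes them; the variable names are hypothetical, and the handling of ties in frequency order (stable, in data order) is an assumption that this example does not exercise.

```python
from collections import Counter

# The Y column of the sample data, in observation order.
y = ["C", "B", "C", "a", "a", "a"]

# ASCII order: sort the distinct values by code point.
ascii_order = sorted(set(y))

# Data order: distinct values in order of first appearance.
data_order = list(dict.fromkeys(y))

# Frequency order: most frequent category first.
# sorted() is stable, so ties (none here) would stay in data order (assumption).
counts = Counter(y)
freq_order = sorted(counts, key=lambda value: -counts[value])

print(ascii_order)  # ['B', 'C', 'a']
print(data_order)   # ['C', 'B', 'a']
print(freq_order)   # ['a', 'C', 'B']
```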

If you specify an ordering based on formatted values when the variable does not have a SAS format, then SAS/IML Studio applies either a BEST12. format (for numeric variables) or a \$w. format (for character variables).

When a variable has missing values, the missing values are always ordered first.
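The missing-first rule can be mimicked in a small Python sketch. It assumes that `None` stands in for a missing value and uses hypothetical data; it illustrates the rule for ASCII order only and does not describe the product's internals.

```python
# Hypothetical character data; None stands in for a missing value (assumption).
values = ["C", None, "B", "a", None]

# Distinct categories in order of first appearance.
categories = list(dict.fromkeys(values))

# The first key component is False for missing values, so they sort first;
# the remaining categories follow in ASCII (code point) order.
ordered = sorted(categories, key=lambda v: (v is not None, v or ""))

print(ordered)  # [None, 'B', 'C', 'a']
```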