The Data Table

Finding Observations

You can select observations in the data table by using the Find dialog box. (For a way to graphically and interactively select observations that satisfy multiple constraints, see Chapter 11, "Techniques for Exploring Data.") You can open the Find dialog box (shown in Figure 4.11) by selecting Edit ▶ Find from the main menu.

Figure 4.11: The Find Dialog Box

The following list describes each item in the Find dialog box.

Variable
chooses the variable whose values are examined. The list includes each variable in the data set.
Operation
selects the logical operation used to compare each observation with the contents of the Value field.
Value
specifies the value used to select observations.
Apply variable's informat to value
applies the variable's informat to the contents of the Value field. If the variable does not have an informat, then this item is inactive.
Apply format to each value during search
applies the variable's format to the variable and then compares the formatted data to the contents of the Value field. If the variable does not have a format, then this item is inactive.
Match case
specifies that each observation is compared to the contents of the Value field in a case-sensitive manner. If the variable is numeric, then this item is inactive.
Use tolerance of
specifies that a tolerance, \epsilon, is used in comparing each observation to the contents of the Value field. Table 4.1 specifies how \epsilon is used. If the chosen variable is a character variable, then this item is inactive.
Clear existing selection
specifies that all observations are searched, but only the observations that match the search criterion are selected.
Search within existing selection
specifies that only the observations that are selected are searched. You can use this option to perform logical AND operations.
Add to existing selection
specifies that all observations are searched, but observations that were selected prior to the search remain selected. You can use this option to perform logical OR operations.
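The three selection options can be read as set operations on row indices. The following Python sketch is only an illustration of that reading, not part of the product: the function name `apply_find` and the mode names `"clear"`, `"within"`, and `"add"` are hypothetical labels chosen for this example.

```python
def apply_find(all_rows, selected, matches, mode):
    """Return the new selection as a set of row indices.

    all_rows: iterable of row indices in the data table
    selected: set of row indices selected before the search
    matches:  predicate row_index -> bool (the search criterion)
    mode:     "clear", "within", or "add" (hypothetical names for the
              three selection options in the Find dialog box)
    """
    if mode == "clear":
        # Search everything; keep only the matches.
        return {i for i in all_rows if matches(i)}
    if mode == "within":
        # Search only the selected rows: previous selection AND criterion.
        return {i for i in selected if matches(i)}
    if mode == "add":
        # Search everything and keep the old selection: previous OR criterion.
        return set(selected) | {i for i in all_rows if matches(i)}
    raise ValueError(f"unknown mode: {mode}")


# Illustrative data: four latitudes, with rows 0 and 1 already selected.
latitudes = [25.0, 29.5, 31.0, 40.2]
rows = range(len(latitudes))
previous = {0, 1}

def in_band(i):
    return 28 <= latitudes[i] <= 32

cleared = apply_find(rows, previous, in_band, "clear")
within = apply_find(rows, previous, in_band, "within")
added = apply_find(rows, previous, in_band, "add")
```

With this data, `cleared` holds rows 1 and 2, `within` holds only row 1, and `added` holds rows 0, 1, and 2.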

For numeric variables, let v be the value of the Value field and let \epsilon be the value of the Use tolerance of field. (If you are not using a tolerance, then \epsilon=0.) Table 4.1 specifies whether an observation with value x for the chosen variable matches the query.

Table 4.1: Find Operations for Numeric Variables
Operation               Values Found                          Missing Selected?
Equals                  x \in [v-\epsilon, v+\epsilon]        No
Less than               x \lt v+\epsilon                      Yes
Greater than            x \gt v-\epsilon                      No
Not equals              x \notin [v-\epsilon, v+\epsilon]     Yes
Less than or equals     x \leq v+\epsilon                     Yes
Greater than or equals  x \geq v-\epsilon                     No
Is missing              x is missing                          Yes

To remember whether missing values match the query, recall that SAS missing values are represented as large negative numbers. Table 4.1 is consistent with the WHERE clause in the SAS DATA step.
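The rules in Table 4.1 can be sketched as a small Python function. This is an illustration of the table, not the product's implementation: the name `numeric_match`, the lowercase operation strings, and the use of `None` for a missing value are assumptions of this sketch. Missing values are mapped to negative infinity, which mirrors the rule of thumb above.

```python
import math

MISSING = None  # stand-in for a SAS numeric missing value (assumption)


def numeric_match(x, op, v=0.0, eps=0.0):
    """Return True if value x matches the query (op, v, eps) per Table 4.1."""
    if op == "is missing":
        return x is MISSING
    # Treat a missing value as smaller than every number.
    xv = -math.inf if x is MISSING else x
    lo, hi = v - eps, v + eps
    if op == "equals":
        return lo <= xv <= hi
    if op == "not equals":
        return not (lo <= xv <= hi)
    if op == "less than":
        return xv < hi
    if op == "less than or equals":
        return xv <= hi
    if op == "greater than":
        return xv > lo
    if op == "greater than or equals":
        return xv >= lo
    raise ValueError(f"unknown operation: {op}")


# Illustrative values only: Equals with v = 30 and a tolerance of 2
# selects the values that lie in the interval [28, 32].
values = [25.0, 28.0, 31.5, MISSING, 32.5]
hits = [x for x in values if numeric_match(x, "equals", 30, 2)]
```

Here `hits` contains 28.0 and 31.5. The missing value is excluded by Equals but would be selected by Less than, Not equals, and Less than or equals, as in the table.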

For character variables, comparisons are performed according to the ASCII order of characters. In particular, all uppercase letters [A - Z] precede lowercase characters [a - z]. Let v be the value of the Value field and let v \prec x indicate that v precedes x in ASCII order. Table 4.2 specifies whether an observation with value x for the chosen variable matches the query.

Table 4.2: Find Operations for Character Variables
Operation               Values Found            Missing Selected?
Equals                  x = v                   No
Less than               x \prec v               Yes
Greater than            v \prec x               No
Not equals              x \neq v                Yes
Less than or equals     x \preceq v             Yes
Greater than or equals  v \preceq x             No
Is missing              x is missing            Yes
Contains                x contains v            No
Does not contain        x does not contain v    Yes
Begins with             x begins with v         No

To help remember whether character missing values match the query, think of the character missing value as a zero-length string that contains no characters. Table 4.2 is consistent with the WHERE clause in the SAS DATA step.
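The character rules can be sketched the same way. Python compares strings by code point, which coincides with ASCII order for ASCII text, so the built-in comparison operators serve as the ordering here. The function name `char_match` and the operation strings are hypothetical, the sketch is case-sensitive only, and it assumes a non-empty search value v with the missing value represented as an empty string.

```python
def char_match(x, op, v=""):
    """Return True if string x matches the query (op, v) per Table 4.2.

    A missing character value is modeled as the empty string "".
    Assumes v is non-empty for every operation except "is missing".
    """
    ops = {
        "equals": lambda: x == v,
        "not equals": lambda: x != v,
        "less than": lambda: x < v,
        "less than or equals": lambda: x <= v,
        "greater than": lambda: x > v,
        "greater than or equals": lambda: x >= v,
        "is missing": lambda: x == "",
        "contains": lambda: v in x,
        "does not contain": lambda: v not in x,
        "begins with": lambda: x.startswith(v),
    }
    if op not in ops:
        raise ValueError(f"unknown operation: {op}")
    return ops[op]()


# Uppercase letters precede lowercase letters in ASCII order.
upper_first = char_match("Zeta", "less than", "alpha")

# The empty string precedes every non-empty string and contains nothing.
missing_less = char_match("", "less than", "A")
missing_contains = char_match("", "contains", "A")
```

In this sketch `upper_first` and `missing_less` are true and `missing_contains` is false, matching the corresponding rows of Table 4.2.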

As a first example, Figure 4.11 shows how to find observations in the Hurricanes data set whose latitude variable is contained in the interval [28,32]. By Table 4.1, an Equals search with a value of 30 and a tolerance of 2 matches exactly this interval, so a single search finds every observation with a latitude between 28 and 32.

A second example is shown in Figure 4.12. This search finds observations for which the date variable strictly precedes 07AUG1988. The date variable has a DATE9. informat, so you can apply that informat to make it more convenient to enter the contents of the Value field. (Without the informat, you would need to search for the value 10446, the SAS date value corresponding to 07AUG1988.) The date variable is a numeric variable, even though the formatted values appear as text.

Figure 4.12: Searching for Dates
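A SAS date value is the number of days since 01JAN1960, which is why a date search is really a numeric search. The following Python sketch (Python, not SAS; the helper name `sas_date_value` is hypothetical) shows how the numeric values for dates in early August 1988 can be checked by hand.

```python
from datetime import date

SAS_EPOCH = date(1960, 1, 1)  # SAS date value 0


def sas_date_value(d):
    """Return the number of days from 01JAN1960 to the date d."""
    return (d - SAS_EPOCH).days


aug06 = sas_date_value(date(1988, 8, 6))  # 10445
aug07 = sas_date_value(date(1988, 8, 7))  # 10446
```

A date strictly precedes 07AUG1988 exactly when its numeric value is less than 10446, or equivalently less than or equal to 10445.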

A related example is shown in Figure 4.13. This search finds all observations for which the date variable contains the text "AUG". Note that to perform this search you must check Apply format to each value during search. This forces the Find dialog box to apply the DATE9. format to the date variable, which means comparing strings (character data) instead of numbers (numeric data). You can then select Contains from the Operation list. Each formatted string is searched for the value "AUG".

Figure 4.13: Matching Text in a Formatted Variable
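The effect of applying a format during the search can be sketched in Python: each numeric date value is first rendered as text, and the Contains operation is then a substring test on that text. The helper `date9` below is a hypothetical stand-in that imitates the layout of the DATE9. format (two-digit day, three-letter uppercase month, four-digit year); it is not a SAS function.

```python
from datetime import date, timedelta

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def date9(sas_value):
    """Render a SAS date value (days since 01JAN1960) as text such as 07AUG1988."""
    d = date(1960, 1, 1) + timedelta(days=sas_value)
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year:04d}"


# Illustrative date values: 01JUL1988, 01AUG1988, 07AUG1988.
values = [10409, 10440, 10446]

# "Contains AUG" applied to the formatted text, not to the numbers.
august = [n for n in values if "AUG" in date9(n)]
```

Here `august` holds 10440 and 10446. The same test on the unformatted numbers could never match, because the numbers contain no text.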

