Finding Observations

You can select observations in the data table by using the Find dialog box. (For a way to graphically and interactively select observations that satisfy multiple constraints, see Chapter 11: Techniques for Exploring Data.) You can open the Find dialog box (shown in Figure 4.11) by selecting Edit > Find from the main menu.

Figure 4.11: The Find Dialog Box



The Find dialog box contains the following UI controls:

Variable

chooses the variable whose values are examined. The list includes each variable in the data set.

Operation

selects the logical operation used to compare each observation with the contents of the Value field.

Value

specifies the value used to select observations.

Apply variable’s informat to value

applies the variable’s informat to the contents of the Value field. If the variable does not have an informat, then this item is inactive.

Apply format to each value during search

applies the variable’s format to the variable and then compares the formatted data to the contents of the Value field. If the variable does not have a format, then this item is inactive.

Match case

specifies that each observation be compared to the contents of the Value field in a case-sensitive manner. If the variable is numeric, then this item is inactive.

Use tolerance of

specifies that a tolerance, $\epsilon $, be used in comparing each observation to the contents of the Value field. Table 4.1 specifies how $\epsilon $ is used. If the chosen variable is a character variable, then this item is inactive.

Clear existing selection

specifies that all observations be searched and that only the observations that match the search criterion be selected. Any previous selection is discarded.

Search within existing selection

specifies that only the observations that are selected be searched. You can use this option to perform logical AND operations.

Add to existing selection

specifies that all observations be searched, but observations that were selected prior to the search remain selected. You can use this option to perform logical OR operations.
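The three selection options behave like set operations on the selected observations. The following Python sketch is only a model of that behavior, not the product's implementation; the function `find`, the mode names, and the sample latitudes are hypothetical.

```python
# Hypothetical model: observations are identified by row index,
# and a selection is a set of row indices.

def find(all_rows, selected, matches, mode):
    """Return the new selection after searching with predicate `matches`."""
    if mode == "clear":
        # Search every row; keep only the rows that match.
        return {i for i in all_rows if matches(i)}
    if mode == "within":
        # Search only the selected rows: previous criterion AND this one.
        return {i for i in selected if matches(i)}
    if mode == "add":
        # Search every row and keep the earlier selection: previous OR this one.
        return set(selected) | {i for i in all_rows if matches(i)}
    raise ValueError(f"unknown mode: {mode!r}")

latitude = [25.0, 29.5, 31.0, 40.2]          # made-up sample data
rows = range(len(latitude))

north_of_28 = find(rows, set(), lambda i: latitude[i] >= 28, "clear")
in_band = find(rows, north_of_28, lambda i: latitude[i] <= 32, "within")
with_south = find(rows, in_band, lambda i: latitude[i] < 26, "add")

print(north_of_28, in_band, with_south)
```

Two successive searches with Search within existing selection therefore select the rows that satisfy both criteria, whereas Add to existing selection selects the rows that satisfy either one.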

For numeric variables, let $v$ be the value of the Value field and let $\epsilon $ be the value of the Use tolerance of field. (If you are not using a tolerance, then $\epsilon =0$.) Table 4.1 specifies whether an observation with value $x$ for the chosen variable matches the query.

Table 4.1: Find Operations for Numeric Variables

| Operation | Values Found | Missing Selected? |
|---|---|---|
| Equals | $x \in [v-\epsilon , v+\epsilon ]$ | No |
| Less than | $x < v+\epsilon $ | Yes |
| Greater than | $x > v-\epsilon $ | No |
| Not equals | $x \notin [v-\epsilon , v+\epsilon ]$ | Yes |
| Less than or equals | $x \leq v+\epsilon $ | Yes |
| Greater than or equals | $x \geq v-\epsilon $ | No |
| Is missing | $x$ is missing | Yes |


To remember whether missing values match the query, recall that SAS treats a numeric missing value as smaller than every nonmissing number in comparisons. Table 4.1 is consistent with the WHERE clause in the SAS DATA step.
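The rules in Table 4.1 can be modeled in a few lines of Python. This is a sketch of the table only, under the assumption that a missing value is represented by `None`; the function name `matches_numeric` and the operation strings are hypothetical.

```python
MISSING = None  # hypothetical stand-in for a SAS numeric missing value

def matches_numeric(x, op, v=None, eps=0.0):
    """Return True if value x matches the query, following Table 4.1."""
    if op == "is missing":
        return x is MISSING
    if x is MISSING:
        # A missing value behaves as if it were smaller than every number,
        # so it matches only the "less than" style and "not equals" queries.
        return op in {"less than", "less than or equals", "not equals"}
    lo, hi = v - eps, v + eps
    rules = {
        "equals": lo <= x <= hi,
        "not equals": not (lo <= x <= hi),
        "less than": x < hi,
        "less than or equals": x <= hi,
        "greater than": x > lo,
        "greater than or equals": x >= lo,
    }
    return rules[op]

# Equals with value 30 and tolerance 2 matches the interval [28, 32].
print(matches_numeric(28.0, "equals", 30, 2))
print(matches_numeric(32.5, "equals", 30, 2))
print(matches_numeric(MISSING, "less than", 30))
```

With `eps=0.0` the function reduces to ordinary comparisons, which corresponds to clearing the Use tolerance of option.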

For character variables, comparisons are performed according to the ASCII order of characters. In particular, all uppercase letters [A–Z] precede lowercase characters [a–z]. Let $v$ be the value of the Value field and let $v \prec x$ indicate that $v$ precedes $x$ in ASCII order. Table 4.2 specifies whether an observation with value $x$ for the chosen variable matches the query.

Table 4.2: Find Operations for Character Variables

| Operation | Values Found | Missing Selected? |
|---|---|---|
| Equals | $x = v$ | No |
| Less than | $x \prec v$ | Yes |
| Greater than | $v \prec x$ | No |
| Not equals | $x \neq v$ | Yes |
| Less than or equals | $x \preceq v$ | Yes |
| Greater than or equals | $v \preceq x$ | No |
| Is missing | $x$ is missing | Yes |
| Contains | $x$ contains $v$ | No |
| Does not contain | $x$ does not contain $v$ | Yes |
| Begins with | $x$ begins with $v$ | No |


To help remember whether character missing values match the query, think of the character missing value as being a zero-length string that contains no characters. Table 4.2 is consistent with the WHERE clause in the SAS DATA step.
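The zero-length-string mnemonic can be checked with a small Python model of Table 4.2. The sketch assumes case-sensitive comparison, a nonempty search value, and ASCII-only text (for which Python's string ordering coincides with ASCII order); the function name `matches_char` and the operation strings are hypothetical.

```python
def matches_char(x, op, v=""):
    """Return True if string x matches the query, following Table 4.2.

    A missing value is modeled as the empty string "".
    """
    rules = {
        "equals": lambda: x == v,
        "not equals": lambda: x != v,
        "less than": lambda: x < v,
        "less than or equals": lambda: x <= v,
        "greater than": lambda: v < x,
        "greater than or equals": lambda: v <= x,
        "is missing": lambda: x == "",
        "contains": lambda: v in x,
        "does not contain": lambda: v not in x,
        "begins with": lambda: x.startswith(v),
    }
    return rules[op]()

# Uppercase letters precede lowercase letters in ASCII order.
print(matches_char("Zeta", "less than", "alpha"))
# The empty string reproduces the "Missing Selected?" column.
print(matches_char("", "less than", "A"), matches_char("", "contains", "A"))
```

Because the empty string precedes every nonempty string and contains no nonempty substring, this model gives the same answers as the Missing Selected? column for every operation.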

As a first example, Figure 4.11 shows how to find observations in the Hurricanes data set whose latitude variable is contained in the interval $[28,32]$. By Table 4.1, an Equals search with a value of 30 and a tolerance of 2 matches exactly the interval $[30-2, 30+2] = [28,32]$, so a single search finds the observations with latitudes between 28 and 32.

A second example is shown in Figure 4.12. This search finds observations for which the date variable strictly precedes 07AUG1988. The date variable has a DATE9. informat, so you can use that informat to make it more convenient to input the contents of the Value field. (Without the informat, you would need to enter the underlying SAS date value: 10446 corresponds to 07AUG1988, so the dates that strictly precede it are those with values less than or equal to 10445, which corresponds to 06AUG1988.) Recall that the date variable is a numeric variable, even though the formatted values appear as text.
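A SAS date value is the number of days since 01JAN1960, so the numeric values behind these dates can be verified independently. The following Python sketch does the arithmetic with the standard library; the helper name `sas_date_value` is hypothetical.

```python
from datetime import date

SAS_EPOCH = date(1960, 1, 1)  # SAS date value 0

def sas_date_value(d):
    """Number of days from 01JAN1960 to the date d."""
    return (d - SAS_EPOCH).days

print(sas_date_value(date(1988, 8, 7)))  # 10446
print(sas_date_value(date(1988, 8, 6)))  # 10445
```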

Figure 4.12: Searching for Dates



A related example is shown in Figure 4.13. This search finds all observations for which the date variable contains the text AUG. To perform this search, you must select Apply format to each value during search. This forces the Find dialog box to apply the DATE9. format to the date variable, so that the search compares strings (character data) instead of numbers (numeric data). You can then select Contains from the Operation list. Each formatted string is searched for the value AUG.
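The effect of searching formatted values can be imitated outside SAS. The sketch below converts numeric date values to strings in the DATE9. layout (two-digit day, three-letter uppercase month, four-digit year) and then applies a Contains test. The helper `date9` and the sample values are hypothetical; this is an approximation of the format for illustration, not SAS code.

```python
from datetime import date, timedelta

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def date9(value):
    """Format a SAS date value (days since 01JAN1960) like DATE9., e.g. 07AUG1988."""
    d = date(1960, 1, 1) + timedelta(days=value)
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year:04d}"

values = [10400, 10445, 10446, 10480]        # made-up numeric date values
formatted = [date9(v) for v in values]
selected = [v for v in values if "AUG" in date9(v)]

print(formatted)
print(selected)
```

A numeric comparison could not express this query in one step, because August dates from different years do not form a single interval of date values.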

Figure 4.13: Matching Text in a Formatted Variable
