COMPARE Procedure

ID Statement

Lists the variables that PROC COMPARE uses to match observations.
See: A Comparison with an ID Variable; Comparing Observations with an ID Variable

Syntax

ID <DESCENDING> variable-1 <<DESCENDING> variable-2 …> <NOTSORTED>;

Required Argument

variable
specifies the variable that the procedure uses to match observations. You can specify more than one variable, but both data sets must be sorted by the variable or variables that you specify, unless you also specify the NOTSORTED option. These variables are ID variables. ID variables also identify observations on the printed reports and in the output data set.
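As a minimal sketch of the basic pattern, the following step sorts two small data sets by one ID variable and then compares them. The data set names (work.base_scores, work.comp_scores) and variable names (student_id, score) are hypothetical and chosen only for illustration.

```sas
/* Hypothetical base data set, deliberately entered out of order */
data work.base_scores;
   input student_id score;
   datalines;
103 88
101 92
102 75
;
run;

/* Hypothetical comparison data set with one changed score */
data work.comp_scores;
   input student_id score;
   datalines;
102 75
101 90
103 88
;
run;

/* Sort both data sets by the ID variable before comparing */
proc sort data=work.base_scores;
   by student_id;
run;

proc sort data=work.comp_scores;
   by student_id;
run;

/* Match observations on student_id instead of by position */
proc compare base=work.base_scores compare=work.comp_scores;
   id student_id;
run;
```

With these values, the only difference in score is for student_id 101 (92 in the base data set, 90 in the comparison data set), and the report should identify that observation by its ID value.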

Optional Arguments

DESCENDING
specifies that the data set is sorted in descending order by the variable that immediately follows the word DESCENDING in the ID statement.
If you use the DESCENDING option, then you must sort the data sets, because SAS does not use an index to process an ID statement with the DESCENDING option. In addition, the use of DESCENDING for ID variables must correspond to the use of the DESCENDING option in the BY statement in the PROC SORT step that was used to sort the data sets.
NOTSORTED
specifies that observations are not necessarily sorted by the ID variables in alphabetic or numeric order. The data might be grouped in another way (for example, in chronological order).
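The correspondence between DESCENDING in the ID statement and DESCENDING in the PROC SORT BY statement can be sketched as follows. The data set names (work.base_sales, work.comp_sales) and variables (region, quarter, amount) are hypothetical. Note that DESCENDING applies only to quarter, the variable that immediately follows it.

```sas
/* Hypothetical base data set */
data work.base_sales;
   input region $ quarter amount;
   datalines;
East 1 100
East 2 120
West 1 90
West 2 95
;
run;

/* Hypothetical comparison data set with one changed amount */
data work.comp_sales;
   input region $ quarter amount;
   datalines;
East 1 100
East 2 125
West 1 90
West 2 95
;
run;

/* Ascending by region, descending by quarter within region */
proc sort data=work.base_sales;
   by region descending quarter;
run;

proc sort data=work.comp_sales;
   by region descending quarter;
run;

/* The ID statement mirrors the BY statement used for sorting */
proc compare base=work.base_sales compare=work.comp_sales;
   id region descending quarter;
run;
```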

Details

Requirements for ID Variables

  • ID variables must be in the BASE= data set; otherwise, PROC COMPARE stops processing.
  • If an ID variable is not in the COMPARE= data set, then PROC COMPARE writes a warning message to the SAS log and does not use that variable to match observations in the comparison data set (but does write it to the OUT= data set).
  • ID variables must be of the same type in both data sets.
  • You should sort both data sets by the common ID variables (within the BY variables, if any) unless you specify the NOTSORTED option.
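One way to check the presence and type requirements before running the comparison is to query the DICTIONARY.COLUMNS table. This is only a sketch of a pre-check, not part of PROC COMPARE itself. The data set and variable names are hypothetical, and the comparison data set deliberately defines student_id as character to show a type mismatch.

```sas
/* Hypothetical base data set: student_id is numeric */
data work.base_scores;
   input student_id score;
   datalines;
101 92
102 75
;
run;

/* Hypothetical comparison data set: student_id is character on purpose */
data work.comp_scores;
   input student_id $ score;
   datalines;
101 90
102 75
;
run;

/* List the ID variable's type in each data set; a missing row means */
/* the variable is absent from that data set                         */
proc sql;
   select memname, name, type, length
      from dictionary.columns
      where libname = 'WORK'
        and memname in ('BASE_SCORES', 'COMP_SCORES')
        and upcase(name) = 'STUDENT_ID';
quit;
```

If the type column differs between the two rows, as it should here (num versus char), the variable does not meet the same-type requirement for ID variables.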

Comparing Unsorted Data

If you do not want to sort the data sets by the ID variables, then you can use the NOTSORTED option. When you specify the NOTSORTED option, or if the ID statement is omitted, PROC COMPARE matches the observations one-to-one. That is, PROC COMPARE matches the first observation in the base data set with the first observation in the comparison data set, the second with the second, and so on. If you use NOTSORTED, and the ID values of corresponding observations are not the same, then PROC COMPARE prints an error message and stops processing.
If the data sets are not sorted by the common ID variables and if you do not specify the NOTSORTED option, then PROC COMPARE writes a warning message to the SAS log and continues to process the data sets as if you had specified NOTSORTED.
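A minimal sketch of a NOTSORTED comparison follows. The data sets (work.base_log, work.comp_log) and variables (event, reading) are hypothetical. The observations are in chronological order rather than alphabetic order, and both data sets list the same ID values in the same sequence, which the one-to-one matching described above requires.

```sas
/* Hypothetical base data set, in chronological (not alphabetic) order */
data work.base_log;
   input event $ reading;
   datalines;
start 10
warmup 14
run 21
stop 3
;
run;

/* Hypothetical comparison data set: same events, same order */
data work.comp_log;
   input event $ reading;
   datalines;
start 10
warmup 15
run 21
stop 3
;
run;

/* No PROC SORT step: observations are matched by position, */
/* and event labels each matched pair in the report         */
proc compare base=work.base_log compare=work.comp_log;
   id event notsorted;
run;
```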

Avoiding Duplicate ID Values

The observations in each data set should be uniquely labeled by the values of the ID variables. If PROC COMPARE finds two successive observations with the same ID values in a data set, then it does the following:
  • prints the warning "Duplicate Observations" for the first occurrence in that data set
  • prints the total number of duplicate observations found in the data set in the observation summary report
  • uses the duplicate observations in the base data set and the comparison data set to compare the observations on a one-to-one basis
When the data sets are not sorted, PROC COMPARE detects only those duplicate observations that occur in succession.
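Because only successive duplicates are detected in unsorted data, it can help to check for duplicate ID values before running the comparison. The following is one possible pre-check using PROC SQL, not something that PROC COMPARE does for you. The data set and variable names are hypothetical.

```sas
/* Hypothetical data set in which student_id 101 appears twice, */
/* but not in succession                                        */
data work.base_scores;
   input student_id score;
   datalines;
101 92
102 75
101 60
103 88
;
run;

/* Report every ID value that labels more than one observation */
proc sql;
   select student_id, count(*) as n_obs
      from work.base_scores
      group by student_id
      having count(*) > 1;
quit;
```

If the query returns any rows, the ID variables do not uniquely label the observations, and you might want to resolve the duplicates before comparing.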