
The following figure shows two data
sets. The data inside the shaded boxes shows the part of the data
sets that the procedure compares. Assume that variables with the same
names have the same type.

When you use PROC COMPARE
to compare data set TWO with data set ONE, the procedure compares
the first observation in data set ONE with the first observation in
data set TWO, the second observation in data set ONE with the second
observation in data set TWO, and so on. In each pair of observations,
the procedure compares the values of the variables IDNUM, NAME,
GENDER, and GPA.
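
The positional pairing described above can be sketched in Python (this is not SAS code). The figure's data is not reproduced here, so the rows below are hypothetical placeholders; only the variable names come from the text, and `compare_by_position` is an illustrative helper name.

```python
# Hypothetical rows: only the variable names (IDNUM, NAME, GENDER, GPA)
# come from the text. All values are made up for illustration.
one = [
    {"IDNUM": 1000, "NAME": "A", "GENDER": "F", "GPA": 3.5},
    {"IDNUM": 1001, "NAME": "B", "GENDER": "M", "GPA": 2.9},
]
two = [
    {"IDNUM": 1000, "NAME": "A", "GENDER": "F", "GPA": 3.6},
    {"IDNUM": 1001, "NAME": "B", "GENDER": "M", "GPA": 2.9},
]


def compare_by_position(base, comp):
    """Pair observation i of base with observation i of comp and
    collect the unequal values of variables present in both."""
    diffs = []
    for obs, (b, c) in enumerate(zip(base, comp), start=1):
        for var in b:
            if var in c and b[var] != c[var]:
                diffs.append((obs, var, b[var], c[var]))
    return diffs


differences = compare_by_position(one, two)
print(differences)
```

Each tuple in `differences` holds the observation number, the variable name, the base value, and the comparison value.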

In a simple comparison, PROC COMPARE
uses the observation number to determine which observations to compare.
When you use an ID variable, PROC COMPARE uses the values of the ID
variable to determine which observations to compare. ID variables
should have unique values and must have the same type.

For the two data sets
shown in the following figure, assume that IDNUM is an ID variable
and that IDNUM has the same type in both data sets. The procedure
compares the observations that have the same value for IDNUM. The
data inside the shaded boxes shows the part of the data sets that
the procedure compares.

The data sets contain
three matching variables: NAME, GENDER, and GPA. They also contain
five matching observations: the observations with values of
`2998`, `9866`, `2118`, `3847`, and `2342` for IDNUM.
Data set TWO contains two observations (IDNUM=`7565` and IDNUM=`1755`)
for which data set ONE contains no matching observations. Similarly,
no variable in data set ONE matches the variable YEAR in data set
TWO.

See Comparing Observations with an ID Variable for an example that
uses an ID variable.
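
The matching by ID value described above can be sketched in Python (this is not SAS code). The IDNUM values and variable names are taken from the text; the row order, the assumption that data set ONE holds no observations beyond the five matching ones, and the helper name `match` are illustrative assumptions.

```python
# IDNUM values and variable names come from the text. The order of the
# values, and ONE having no unmatched observations, are assumptions.
one_ids = [2998, 9866, 2118, 3847, 2342]
two_ids = [2998, 9866, 2118, 3847, 2342, 7565, 1755]
one_vars = ["IDNUM", "NAME", "GENDER", "GPA"]
two_vars = ["IDNUM", "NAME", "GENDER", "GPA", "YEAR"]


def match(base_keys, comp_keys):
    """Return (keys in both, keys only in base, keys only in comparison)."""
    base_set, comp_set = set(base_keys), set(comp_keys)
    both = [k for k in base_keys if k in comp_set]
    only_base = [k for k in base_keys if k not in comp_set]
    only_comp = [k for k in comp_keys if k not in base_set]
    return both, only_base, only_comp


matched_obs, only_one_obs, only_two_obs = match(one_ids, two_ids)
matched_vars, only_one_vars, only_two_vars = match(one_vars, two_vars)

# The ID variable pairs the observations; it is not itself compared.
matched_vars.remove("IDNUM")

print(matched_obs, only_two_obs)
print(matched_vars, only_two_vars)
```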

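The COMPARE procedure judges
numeric values unequal if the magnitude of their difference, as measured
according to the METHOD= option, is greater than the value of the
CRITERION= option. PROC COMPARE provides four methods for applying
CRITERION=:

- The EXACT method tests for exact equality.
- The ABSOLUTE method compares the absolute difference to the value
  specified by CRITERION=.
- The RELATIVE method compares the absolute relative difference to the
  value specified by CRITERION=.
- The PERCENT method compares the absolute percent difference to the
  value specified by CRITERION=.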

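For a numeric variable
compared, let x be its value
in the base data set and let y
be its value in the comparison data set. If both x and y are
nonmissing, then the values are judged unequal according to the value
of METHOD= and the value of CRITERION= (γ) as follows:

- If METHOD=EXACT, then the values are unequal if y ≠ x.
- If METHOD=ABSOLUTE, then the values are unequal if |y − x| > γ.
- If METHOD=RELATIVE, then the values are unequal if
  |y − x| / ((|x| + |y|)/2 + δ) > γ. The values are equal if x = y = 0.
- If METHOD=PERCENT, then the values are unequal if
  100 |y − x| / |x| > γ for x ≠ 0, or if y ≠ 0 for x = 0.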
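
The four rules can be sketched in Python (this is not SAS code). The formulas in the sketch are the ones usually given for PROC COMPARE and should be read as assumptions here; the function name `judged_unequal` and its parameter names are hypothetical.

```python
def judged_unequal(x, y, method="EXACT", criterion=0.0, delta=0.0):
    """Return True if nonmissing numbers x (base) and y (comparison)
    are judged unequal under the assumed rule for the given method."""
    diff = abs(y - x)
    if method == "EXACT":
        return y != x
    if method == "ABSOLUTE":
        return diff > criterion
    if method == "RELATIVE":
        denom = (abs(x) + abs(y)) / 2 + delta
        if denom == 0:
            # x == y == 0 with delta == 0: treated as equal.
            return False
        return diff / denom > criterion
    if method == "PERCENT":
        if x == 0:
            return y != 0
        return 100 * diff / abs(x) > criterion
    raise ValueError(f"unknown method: {method}")


# The same pair of values under each method.
x, y = 100.0, 100.5
results = {
    "EXACT": judged_unequal(x, y, "EXACT"),
    "ABSOLUTE": judged_unequal(x, y, "ABSOLUTE", criterion=1.0),
    "RELATIVE": judged_unequal(x, y, "RELATIVE", criterion=0.001),
    "PERCENT": judged_unequal(x, y, "PERCENT", criterion=1.0),
}
print(results)
```

With these inputs the absolute difference is 0.5, the relative difference is about 0.005, and the percent difference is 0.5, so only the EXACT and RELATIVE calls report the values as unequal.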

If x or y is
missing, then the comparison depends on the NOMISSING option. If the
NOMISSING option is in effect, then a missing value will always be
judged equal to anything. Otherwise, a missing value is judged equal
only to a missing value of the same type (that is, .=., .^=.A, .A=.A,
.A^=.B, and so on).
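
The missing-value rule can be sketched in Python (this is not SAS code). The encoding is an assumption made for illustration: missing values are represented as the strings `"."`, `".A"`, `".B"`, and so on, and nonmissing numbers are compared exactly rather than through METHOD= and CRITERION=. The helper names are hypothetical.

```python
def is_missing(value):
    # Hypothetical encoding: missing values are strings such as ".", ".A".
    return isinstance(value, str)


def equal_with_missing(x, y, nomissing=False):
    """Apply the missing-value rule; nonmissing pairs use exact equality."""
    if is_missing(x) or is_missing(y):
        if nomissing:
            # With NOMISSING, a missing value is equal to anything.
            return True
        # Otherwise only a missing value of the same type is equal.
        return is_missing(x) and is_missing(y) and x == y
    return x == y


print(equal_with_missing(".", "."))
print(equal_with_missing(".", ".A"))
print(equal_with_missing(".", 3.0, nomissing=True))
```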

If the value that is
specified for CRITERION= is negative, then the actual criterion that
is used, γ, is equal to the absolute value of the specified
criterion multiplied by a very small number, ε (epsilon), that
depends on the numerical precision of the computer. This number ε
is defined as the smallest positive floating-point value such that,
using machine arithmetic, 1−ε<1<1+ε. Round-off
or truncation error in floating-point computations is typically a
few orders of magnitude larger than ε. CRITERION=−1000
often provides a reasonable test of the equality of computed results
at the machine level of precision.
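
The effect of a negative criterion can be sketched in Python (this is not SAS code). The sketch assumes IEEE 754 double precision and uses Python's `sys.float_info.epsilon` (2^-52) as a stand-in for ε; the value that SAS uses on a given host may differ. The helper name `effective_criterion` is hypothetical.

```python
import sys

# Stand-in for epsilon, assuming IEEE 754 doubles (2**-52).
EPS = sys.float_info.epsilon


def effective_criterion(criterion):
    """Return gamma: |criterion| * EPS if criterion is negative,
    otherwise the criterion unchanged."""
    return abs(criterion) * EPS if criterion < 0 else criterion


gamma = effective_criterion(-1000)

# A computed result that differs from the expected value only by
# round-off error.
computed = 0.1 + 0.2
expected = 0.3

print(computed == expected)                # exact comparison
print(abs(computed - expected) <= gamma)   # absolute difference against gamma
```

The exact comparison fails because of round-off, while the absolute difference is well within 1000 times ε.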

The value δ added
to the denominator in the RELATIVE method is specified in parentheses
after the method name: METHOD=RELATIVE(δ). If not specified
in METHOD=, then δ defaults to 0. The value of δ can
be used to control the behavior of the error measure when both x and y are
very close to 0. If δ is not given and x and y are
very close to 0, then any error produces a large relative error (in
the limit, 2).

Specifying a value for
δ avoids this extreme sensitivity of the RELATIVE method for
small values. If you specify METHOD=RELATIVE(δ) CRITERION=γ
when both x and y are
much smaller than δ in absolute value, then the comparison is
as if you had specified METHOD=ABSOLUTE CRITERION=δγ.
However, when either x or y is
much larger than δ in absolute value, the comparison is like
METHOD=RELATIVE CRITERION=γ. For moderate values of x and y,
METHOD=RELATIVE(δ) CRITERION=γ is, in effect, a compromise
between METHOD=ABSOLUTE CRITERION=δγ and METHOD=RELATIVE
CRITERION=γ.
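
The role of δ can be illustrated with a short Python sketch (this is not SAS code). It assumes that the RELATIVE measure is |y − x| / ((|x| + |y|)/2 + δ), which is consistent with the limit of 2 and with the ABSOLUTE equivalence described above; the function name `relative_error` is hypothetical.

```python
import math


def relative_error(x, y, delta=0.0):
    """Assumed RELATIVE measure: |y - x| / ((|x| + |y|) / 2 + delta)."""
    denom = (abs(x) + abs(y)) / 2 + delta
    return 0.0 if denom == 0 else abs(y - x) / denom


# Both values very close to 0 and no delta: the measure hits its limit of 2.
tiny_no_delta = relative_error(0.0, 1e-12)

# Same values with delta = 1: the measure is about |y - x| / delta,
# so the test behaves like METHOD=ABSOLUTE with criterion delta * gamma.
tiny_with_delta = relative_error(0.0, 1e-12, delta=1.0)

# Values much larger than delta: delta barely changes the measure.
large_no_delta = relative_error(1e6, 1e6 + 1)
large_with_delta = relative_error(1e6, 1e6 + 1, delta=1.0)

print(tiny_no_delta)
print(math.isclose(tiny_with_delta, 1e-12, rel_tol=1e-9))
print(math.isclose(large_no_delta, large_with_delta, rel_tol=1e-5))
```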

For character variables,
if one value is longer than the other, then the shorter value is padded
with blanks for the comparison. Nonblank character values are judged
equal only if they agree at each character. If the NOMISSING option
is in effect, then blank character values are judged equal to anything.
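
The character rule can be sketched in Python (this is not SAS code). The function name `chars_equal` is hypothetical, and treating a zero-length string as a blank value is an assumption of the sketch.

```python
def chars_equal(a, b, nomissing=False):
    """Compare two character values after blank-padding the shorter one."""
    if nomissing and (a.strip(" ") == "" or b.strip(" ") == ""):
        # With NOMISSING, a blank value is equal to anything.
        return True
    width = max(len(a), len(b))
    # ljust pads on the right with blanks, so trailing blanks are ignored
    # but leading blanks still count.
    return a.ljust(width) == b.ljust(width)


print(chars_equal("abc", "abc  "))
print(chars_equal("abc", "abd"))
print(chars_equal("", "abc", nomissing=True))
```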

In the reports
of value comparisons and in the OUT= data set, PROC COMPARE displays
difference and percent difference values for the numbers compared.
These quantities are defined using the value from the base data set
as the reference value. For a numeric variable compared, let x be
its value in the base data set and let y be
its value in the comparison data set. If x and y are
both nonmissing, then the difference and percent difference are defined
as follows: