
PROC COMPARE stores a return code
in the automatic macro variable SYSINFO. The value of the return
code provides information about the result of the comparison. By checking
the value of SYSINFO after PROC COMPARE has run and before any other
step begins, SAS macros can use the results of a PROC COMPARE step
to determine what action to take or what parts of a SAS program to
execute.

The following table
is a key for interpreting the SYSINFO return code from PROC COMPARE.
For each of the conditions listed, the associated value is added to
the return code if the condition is true. Thus, the SYSINFO return
code is the sum of the codes listed in the following table for the
applicable conditions:

These codes are ordered
and scaled to enable a simple check of the degree to which the data
sets differ. For example, if you want to check that two data sets
contain the same variables, observations, and values, but you do not
care about differences in labels, formats, and so on, then use the
following statements:

    proc compare base=SAS-data-set compare=SAS-data-set;
    run;

    %if &sysinfo >= 64 %then
       %do;
          handle error;
       %end;

You can examine individual
bits in the SYSINFO value by using DATA step bit-testing features
to check for specific conditions. For example, to check for the presence
of observations in the base data set that are not in the comparison
data set, use the following statements:

    proc compare base=SAS-data-set compare=SAS-data-set;
    run;

    %let rc=&sysinfo;
    data _null_;
       if &rc='1......'b then
          put 'Observations in Base but not in Comparison Data Set';
    run;
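
The same bit can also be tested at the macro level, without a DATA step. The following is a minimal sketch, not the documented method: it assumes that the BAND function is called through %SYSFUNC, and the data sets WORK.BASE_DS and WORK.COMP_DS and the macro name CHECK_BASEOBS are hypothetical names introduced only for illustration. The value 64 is the numeric value of the bit that the mask `'1......'b` tests.

```sas
/* Hypothetical data sets, created only so that the sketch is self-contained. */
data work.base_ds;
   input id x;
   datalines;
1 10
2 20
3 30
;
run;

data work.comp_ds;
   input id x;
   datalines;
1 10
2 20
;
run;

proc compare base=work.base_ds compare=work.comp_ds noprint;
run;

/* Save SYSINFO right away, before any other step begins. */
%let rc = &sysinfo;

/* Hypothetical helper macro: tests the bit whose value is 64. */
%macro check_baseobs(rc);
   %if %sysfunc(band(&rc, 64)) %then
      %put NOTE: Observations in Base but not in Comparison Data Set.;
   %else
      %put NOTE: No extra observations were flagged in the base data set.;
%mend check_baseobs;

%check_baseobs(&rc)
```

Because the base data set in this sketch has one more observation than the comparison data set and no ID statement is used, the third base observation has no match, so the first message is the one expected in the log.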

The
following sections show and describe the default output from comparing the two
data sets shown in Overview: COMPARE Procedure. Because PROC
COMPARE produces lengthy output, the output is presented in seven
pieces.

    options nodate pageno=1 linesize=80 pagesize=60;
    proc compare base=proclib.one compare=proclib.two;
    run;

This report lists the attributes
of the data sets that are being compared. These attributes include
the following:

To view the Data Set
Summary, see Partial Output Showing the Data Set Summary and Variables Summary.

This report compares the variables
in the two data sets. The first part of the report lists the following:

This report provides information
about observations in the base and comparison data sets. First,
the report identifies the first and last observation in each
data set, the first and last matching observations, and the first
and last different observations. Then, the report lists the following:

This report consists of
a table for each pair of matching variables judged unequal at one
or more observations. When comparing character values, PROC COMPARE
displays only the first 20 characters. When you use the TRANSPOSE
option, it displays only the first 12 characters. Each table shows
the following:

If you use the STATS, ALLSTATS,
or PRINTALL option, then the Value Comparison Results for Variables
section contains summary statistics for the numeric variables that
are being compared. The STATS option generates these statistics for
only the numeric variables whose values are judged unequal. The ALLSTATS
and PRINTALL options generate these statistics for all numeric variables,
even if all values are judged equal.

Note: In all cases PROC COMPARE
calculates the summary statistics based on all matching observations
that do not contain missing values, not just on those containing unequal
values.
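
As a minimal sketch of requesting these statistics, the step below adds the ALLSTATS option to a comparison. The data sets WORK.ONE and WORK.TWO and their values are hypothetical and exist only to make the example self-contained.

```sas
/* Hypothetical data sets with one numeric variable that differs. */
data work.one;
   input id score;
   datalines;
1 85
2 90
3 78
;
run;

data work.two;
   input id score;
   datalines;
1 85
2 92
3 75
;
run;

/* ALLSTATS prints summary statistics for every numeric variable,  */
/* including ID, whose values are equal in both data sets.         */
proc compare base=work.one compare=work.two allstats;
run;
```

Replacing ALLSTATS with STATS should limit the statistics to SCORE, the only variable in this sketch whose values are judged unequal.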

The output
shows the following summary statistics for base data set values, comparison
data set values, differences, and percent differences:

NDIF: the number of matching
observations judged unequal, and the percent of the matching observations
that were judged unequal.

DIFMEANS: the difference between
the mean of the base values and the mean of the comparison values.
This line contains three numbers. The first is the difference expressed
as a percentage of the base values mean. The second is the difference expressed
as a percentage of the comparison values mean. The third is the difference
in the two means (the comparison mean minus the base mean).
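
The arithmetic behind these three numbers can be sketched in a short DATA step. This is an illustration under stated assumptions, not PROC COMPARE's own code: it assumes that the two percentages are the difference in means divided by the base mean and by the comparison mean, respectively, and the two mean values are hypothetical.

```sas
data _null_;
   base_mean = 80;                       /* hypothetical mean of the base values       */
   comp_mean = 84;                       /* hypothetical mean of the comparison values */
   diff      = comp_mean - base_mean;    /* comparison mean minus base mean            */
   pct_base  = 100 * diff / base_mean;   /* difference as a percent of the base mean   */
   pct_comp  = 100 * diff / comp_mean;   /* difference as a percent of the comp. mean  */
   put pct_base= pct_comp= diff=;        /* write the three numbers to the log         */
run;
```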

The TRANSPOSE option prints the comparison
results by observation instead of by variable. The comparison results
precede the observation summary report. By default, the source of
the values for each row of the table is indicated by the following
label:

    _OBS_1=number-1 _OBS_2=number-2

where number-1 is the number of the observation in the base data set for which the value of the variable is shown, and number-2 is the number of the observation in the comparison data set.

The
following output shows the differences in PROCLIB.ONE and PROCLIB.TWO
by observation instead of by variable.

    options nodate pageno=1 linesize=80 pagesize=60;
    proc compare base=proclib.one compare=proclib.two transpose;
       title 'Comparing Two Data Sets: Default Report';
    run;

The COMPARE procedure assigns a name to each table that
it creates. You can use these names to reference the table when using
the Output Delivery System (ODS) to select tables and create output
data sets. For more information, see SAS Output Delivery System: User's Guide.
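
One way to discover the table names and then capture a table is sketched below. The data sets WORK.ONE and WORK.TWO and the output data set WORK.CMP_SUMMARY are hypothetical. The table name CompareSummary is an assumption here; confirm it against the names that ODS TRACE writes to the log before relying on it.

```sas
/* Hypothetical data sets, created only so that the sketch is self-contained. */
data work.one;
   input id score;
   datalines;
1 85
2 90
;
run;

data work.two;
   input id score;
   datalines;
1 85
2 92
;
run;

/* Write the name of each output object to the log. */
ods trace on;
proc compare base=work.one compare=work.two;
run;
ods trace off;

/* Assumed table name: substitute one that appears in the trace. */
ods output CompareSummary=work.cmp_summary;
proc compare base=work.one compare=work.two;
run;
```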

By default, the OUT= data set contains an observation
for each pair of matching observations. The OUT= data set contains
the following variables from the data sets you are comparing:

In addition, the data
set contains two variables created by PROC COMPARE to identify the
source of the values for the matching variables: _TYPE_ and _OBS_.

_TYPE_ is a character variable
of length 8. Its value indicates the source of the values for the
matching (or VAR) variables in that observation. (For ID and BY variables,
which are not compared, the values are the values from the original
data sets.) _TYPE_ has the label `Type of Observation`.
The four possible values of this variable are as follows:

BASE: the values in this
observation are from an observation in the base data set. PROC COMPARE
writes this type of observation to the OUT= data set when you specify
the OUTBASE option.

COMPARE: the values in this
observation are from an observation in the comparison data set. PROC
COMPARE writes this type of observation to the OUT= data set when
you specify the OUTCOMP option.

DIF: the values in this
observation are the differences between the values in the base and
comparison data sets. For character variables, PROC COMPARE uses
a period (.) to represent equal characters and an X to represent unequal
characters. PROC COMPARE writes this type of observation to the OUT=
data set by default. However, if you request any other type of observation
with the OUTBASE, OUTCOMP, or OUTPERCENT option, then you must specify
the OUTDIF option to generate observations of this type in the OUT=
data set.

PERCENT: the values in this
observation are the percent differences between the values in the base and
comparison data sets. For character variables, the values in observations
of type PERCENT are the same as the values in observations of type DIF.
PROC COMPARE writes this type of observation to the OUT= data set when
you specify the OUTPERCENT option.

_OBS_ is a numeric variable
that contains a number further identifying the source of the OUT=
observations.

For observations with
_TYPE_ equal to BASE, _OBS_ is the number of the observation in the
base data set from which the values of the VAR variables were copied.
Similarly, for observations with _TYPE_ equal to COMPARE, _OBS_ is
the number of the observation in the comparison data set from which
the values of the VAR variables were copied.

The COMPARE procedure
takes variable names and attributes for the OUT= data set from the
base data set except for the lengths of ID and VAR variables, for
which it uses the longer length regardless of which data set that
length is from. This behavior has two important repercussions:

For an example of the OUT= option,
see Comparing Values of Observations Using an Output Data Set (OUT=).
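
A minimal sketch of an OUT= request follows. The data sets WORK.ONE, WORK.TWO, and WORK.RESULT are hypothetical. The step asks for the base values, the comparison values, and the differences, so OUTDIF is specified explicitly alongside OUTBASE and OUTCOMP.

```sas
/* Hypothetical data sets, created only so that the sketch is self-contained. */
data work.one;
   input id score;
   datalines;
1 85
2 90
;
run;

data work.two;
   input id score;
   datalines;
1 85
2 92
;
run;

/* Write BASE, COMPARE, and DIF observations for each matched pair. */
proc compare base=work.one compare=work.two
             out=work.result outbase outcomp outdif noprint;
run;

/* Inspect _TYPE_ and _OBS_ next to the compared variables. */
proc print data=work.result noobs;
run;
```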

When you use the OUTSTATS= option, PROC COMPARE calculates the same
summary statistics as the ALLSTATS option for each pair of numeric
variables compared (see Table of Summary Statistics). The OUTSTATS= data set contains
an observation for each summary statistic for each pair of variables.
The data set also contains the BY variables used in the comparison
and several variables created by PROC COMPARE:

_VAR_ is a character variable
that contains the name of the variable from the base data set for
which the statistic in the observation was calculated.

_WITH_ is a character variable
that contains the name of the variable from the comparison data set
for which the statistic in the observation was calculated. The _WITH_
variable is not included in the OUTSTATS= data set unless you use
the WITH statement.

_TYPE_ is a character variable
that contains the name of the statistic contained in the observation.
Values of the _TYPE_ variable are `N`, `MEAN`, `STD`, `MIN`, `MAX`,
`STDERR`, `T`, `PROBT`, `NDIF`, `DIFMEANS`, `R`, and `RSQ`.

_BASE_ is a numeric variable
that contains the value of the statistic calculated from the values
of the variable named by _VAR_ in the observations in the base data
set with matching observations in the comparison data set.

_COMP_ is a numeric variable
that contains the value of the statistic calculated from the values
of the variable named by the _VAR_ variable (or by the _WITH_ variable
if you use the WITH statement) in the observations in the comparison
data set with matching observations in the base data set.

Note: For both types of output
data sets, PROC COMPARE assigns one of the following data set labels:

    Comparison of base-SAS-data-set with comparison-SAS-data-set
    Comparison of variables in base-SAS-data-set

See
Creating an Output Data Set of Statistics (OUTSTATS=) for an example of an OUTSTATS= data set.
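
As a minimal sketch of the OUTSTATS= option, the step below writes the statistics to a data set and prints a few of them. The data sets WORK.ONE, WORK.TWO, and WORK.DIFFSTAT are hypothetical, and the WHERE statement is only one illustrative way to subset the _TYPE_ values listed above.

```sas
/* Hypothetical data sets, created only so that the sketch is self-contained. */
data work.one;
   input id score;
   datalines;
1 85
2 90
3 78
;
run;

data work.two;
   input id score;
   datalines;
1 85
2 92
3 75
;
run;

/* One observation per statistic for each pair of numeric variables. */
proc compare base=work.one compare=work.two
             outstats=work.diffstat noprint;
run;

/* Keep only a few of the statistics for a compact listing. */
proc print data=work.diffstat noobs;
   where _type_ in ('N', 'MEAN', 'NDIF', 'DIFMEANS');
run;
```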