When PROC RANK ranks values, tied values are present if two or more
values of an analysis variable within a BY group are equal. Because the
values are indistinguishable and there is usually no further obvious
information on which to base different ranks, PROC RANK does not
assign different ranks to them. Tied values could be assigned
different ranks arbitrarily, but in statistical applications such as
nonparametric tests that use ranks, it is conventional to assign the
same rank to tied values.
These statistical tests
commonly assume that the data is from a continuous distribution, in
which the probability of a tie is theoretically zero. In practice,
whether because of inaccuracies in measurement, the finite accuracy
of representation within a digital computer, or other reasons, tied
values often occur. It is also conventional in these statistical
tests to assign the average rank to a group of tied values. Assignment
of the average rank is preferred because it preserves the sum of the
ranks and, therefore, does not distort the estimate of the cumulative
distribution function.
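The claim that average ranks preserve the rank sum can be checked with a short Python sketch. It assumes a single BY group with no missing values, and the function name mean_ranks is hypothetical; it is not part of SAS.

```python
# Sketch: average ("mean") ranks for tied values.
# Assumptions: plain list of numbers, one BY group, no missing values.
# mean_ranks is a hypothetical helper, not a SAS routine.

def mean_ranks(values):
    """Return the average 1-based ordinal position of each value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the last sorted position that ties with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Ordinals i+1 .. j+1 belong to the tied group; give each their average.
        avg = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

data = [10, 20, 20, 30, 30, 30]
r = mean_ranks(data)
n = len(data)
print(r)                        # [1.0, 2.5, 2.5, 5.0, 5.0, 5.0]
print(sum(r), n * (n + 1) / 2)  # 21.0 21.0
```

Taking the minimum of each tied group instead (1, 2, 2, 4, 4, 4) would give a rank sum of 17 rather than 21.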
For applications within
and outside of statistics, the RANK procedure provides the TIES= option
to control the treatment of tied values. The default value for this
option depends on the specified ranking or scoring method, which you
can specify with the options of the PROC RANK statement. For ranking
and scoring methods, when TIES=LOW, TIES=HIGH, or TIES=MEAN, tied
values are initially treated as if they are distinguishable. These
methods all begin by sorting the values of the analysis variable within
a BY group, and then assigning to each nonmissing value an ordinal
number that indicates its position in the sequence.
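The shared first step (sort within each BY group, then number the nonmissing values as if all were distinct) might look like the following Python sketch. The row layout of (BY key, value) tuples and the use of None for a missing value are assumptions made for illustration; ordinals_by_group is a hypothetical name.

```python
# Sketch of the first step shared by TIES=LOW, TIES=HIGH, and TIES=MEAN:
# within each BY group, sort the nonmissing values and number them 1, 2, 3, ...
# as if every value were distinct.
# Assumptions: rows are (by_key, value) tuples; None stands for a missing value.
from collections import defaultdict

def ordinals_by_group(rows):
    """Return a list parallel to rows holding each value's ordinal (or None)."""
    groups = defaultdict(list)
    for idx, (key, value) in enumerate(rows):
        if value is not None:  # missing values receive no ordinal
            groups[key].append(idx)
    result = [None] * len(rows)
    for indices in groups.values():
        # Sort row indices by value; the order among tied values is irrelevant
        # here because ties are resolved in a later step.
        indices.sort(key=lambda i: rows[i][1])
        for position, i in enumerate(indices, start=1):
            result[i] = position
    return result

rows = [("A", 5), ("A", 3), ("A", None), ("A", 3), ("B", 7), ("B", 1)]
print(ordinals_by_group(rows))  # [3, 1, None, 2, 2, 1]
```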
Subsequently, for non-scoring
methods, PROC RANK resolves tied values by selecting the minimum with
TIES=LOW, selecting the maximum with TIES=HIGH, or calculating the
average of the ordinals in a group of tied values with TIES=MEAN.
PROC RANK then obtains the rank from this value through one or more
further transformations such as scaling, translation, and truncation.
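The three resolution rules for non-scoring methods can be sketched in Python as follows. The sketch covers one BY group with no missing values, stops before any further scaling or truncation, and uses a hypothetical function name.

```python
# Sketch: resolve ties among ordinals (which treat every value as distinct)
# with the minimum (TIES=LOW), maximum (TIES=HIGH), or average (TIES=MEAN)
# of the tied group's ordinals.
# Assumptions: one BY group, no missing values; resolve_ties is hypothetical.

def resolve_ties(values, ties="MEAN"):
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Ordinals 1..n in sorted order, as if no ties were present.
    ordinal = {i: pos for pos, i in enumerate(order, start=1)}
    # Collect the ordinals that belong to each distinct value.
    groups = {}
    for i in order:
        groups.setdefault(values[i], []).append(ordinal[i])
    pick = {
        "LOW": min,
        "HIGH": max,
        "MEAN": lambda ords: sum(ords) / len(ords),
    }[ties]
    resolved = {v: pick(ords) for v, ords in groups.items()}
    return [resolved[v] for v in values]

data = [4, 8, 8, 8, 2]
for ties in ("LOW", "HIGH", "MEAN"):
    print(ties, resolve_ties(data, ties))
# LOW [2, 3, 3, 3, 1]
# HIGH [2, 5, 5, 5, 1]
# MEAN [2.0, 4.0, 4.0, 4.0, 1.0]
```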
Scoring methods include
normal and Savage scoring, which are requested by the NORMAL= and
SAVAGE options. Non-scoring methods include ordinal ranking, the default,
and those methods that are requested by the FRACTION, NPLUS1, GROUPS=,
and PERCENT options. For the scoring methods NORMAL= and SAVAGE, PROC
RANK obtains the probability quantile scores with the appropriate
formulas as if no tied values were present within the data. PROC RANK
then resolves tied values by selecting the minimum, selecting the
maximum, or calculating the average of all scores within a tied group.
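For scoring methods, the order of operations matters: scores are computed from untied ordinals and then combined, which is not the same as scoring a combined ordinal. The Python sketch below shows this with Blom's normal-score formula, Phi^-1((r - 3/8) / (n + 1/4)), chosen as an example score. Only averaging (the TIES=MEAN-style resolution) is shown, and the helper names are hypothetical.

```python
# Sketch: for a scoring method, compute a score for every ordinal as if no
# ties were present, then average the scores within each tied group.
# Assumptions: Blom's formula as the example score; one BY group, no missing
# values; blom_score and normal_scores_mean_ties are hypothetical helpers.
from statistics import NormalDist

def blom_score(r, n):
    """Blom normal score for ordinal r out of n values."""
    return NormalDist().inv_cdf((r - 0.375) / (n + 0.25))

def normal_scores_mean_ties(values):
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    # Score each ordinal as though all values were distinct.
    score = {i: blom_score(pos, n) for pos, i in enumerate(order, start=1)}
    # Average the scores (not the ordinals) within each tied group.
    groups = {}
    for i in order:
        groups.setdefault(values[i], []).append(score[i])
    avg = {v: sum(s) / len(s) for v, s in groups.items()}
    return [avg[v] for v in values]

data = [1, 1, 2]
scores = normal_scores_mean_ties(data)
print([round(s, 4) for s in scores])
# Scoring the average ordinal (1.5) instead gives a different value,
# because the inverse normal CDF is not linear.
print(round(blom_score(1.5, len(data)), 4))
```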
For all ranking and
scoring methods, when TIES=DENSE, tied values are treated as indistinguishable,
and each value within a tied group is assigned the same ordinal.
As with the other TIES= resolution methods, all ranking and scoring
methods begin by sorting the values of the analysis variable and
then assigning ordinals. However, each group of tied values is treated
as a single value: its ordinal is exactly 1 greater than the ordinal
assigned to the value just before the group (if there is one) and
exactly 1 less than the ordinal assigned to the value just after the
group (if there is one). Therefore,
the smallest ordinal within a BY group is 1, and the largest ordinal
is the number of unique, nonmissing values in the BY group.
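Dense ordinal assignment reduces to numbering the distinct nonmissing values in sorted order. A minimal Python sketch, assuming one BY group and None as the missing value (dense_ordinals is a hypothetical name):

```python
# Sketch of TIES=DENSE ordinal assignment: each distinct nonmissing value
# receives one ordinal, and consecutive distinct values differ by exactly 1.
# Assumptions: one BY group; None marks a missing value.

def dense_ordinals(values):
    distinct = sorted({v for v in values if v is not None})
    position = {v: k for k, v in enumerate(distinct, start=1)}
    return [None if v is None else position[v] for v in values]

data = [30, 10, 20, None, 20, 30, 30]
print(dense_ordinals(data))  # [3, 1, 2, None, 2, 3, 3]
```

The largest ordinal, 3, equals the number of unique nonmissing values, not the six nonmissing observations.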
After the ordinals are
assigned, PROC RANK calculates ranks and scores, using the number of
unique nonmissing values instead of the number of nonmissing values
for scaling. Because dense ranking tends to distort the estimate of the
cumulative distribution function, it is generally not acceptable for use
in nonparametric statistical tests.
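The distortion can be seen by dividing dense ordinals by the number of unique values, in the style of fractional ranks, and comparing the result with the empirical CDF (the proportion of observations at or below each value). This Python sketch uses such fractional scaling purely as an illustration and assumes one BY group with no missing values; the function names are hypothetical.

```python
# Sketch: fractional scaling under TIES=DENSE (ordinal / number of unique
# values) compared with the empirical CDF (count at or below / number of values).
# Assumptions: one BY group, no missing values; helper names are hypothetical.
from fractions import Fraction

def dense_fraction(values):
    distinct = sorted(set(values))
    m = len(distinct)  # number of unique nonmissing values
    pos = {v: k for k, v in enumerate(distinct, start=1)}
    return [Fraction(pos[v], m) for v in values]

def empirical_cdf(values):
    n = len(values)  # number of nonmissing values
    return [Fraction(sum(1 for w in values if w <= v), n) for v in values]

data = [1, 2, 2, 2, 2, 3]
print([str(f) for f in dense_fraction(data)])  # ['1/3', '2/3', '2/3', '2/3', '2/3', '1']
print([str(f) for f in empirical_cdf(data)])   # ['1/6', '5/6', '5/6', '5/6', '5/6', '1']
```

The value 1 accounts for one observation in six, yet the dense scaling places it at 1/3, because the four tied 2s are counted as a single value.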
Note
that PROC RANK bases its computations on the internal numeric values
of the analysis variables. The procedure does not format or round
these values before analysis. When values differ in their internal
representation, even slightly, PROC RANK does not treat them as tied
values. If this is a concern for your data, then round the analysis
variables by an appropriate amount before invoking PROC RANK. For
information about the ROUND function, see
"ROUND Function" in SAS Functions and CALL Routines: Reference.
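The effect of comparing internal values rather than formatted ones can be reproduced with ordinary binary floating point. In this Python sketch, round_to is a hypothetical helper that rounds to the nearest multiple of a rounding unit, in the spirit of the SAS ROUND function's rounding-unit argument; the unit of 1e-9 is an arbitrary assumption and should be chosen to suit the data.

```python
# Sketch: values that look identical when formatted can differ internally,
# so they are not ties until they are rounded.
# Assumption: round_to is a hypothetical stand-in for rounding to a unit;
# the default unit 1e-9 is arbitrary.

a = 0.1 + 0.2
b = 0.3
print(a == b)              # False: the binary representations differ slightly
print(f"{a:.2f} {b:.2f}")  # 0.30 0.30, identical once formatted

def round_to(x, unit=1e-9):
    """Round x to the nearest multiple of unit."""
    return round(x / unit) * unit

print(round_to(a) == round_to(b))  # True: after rounding, the values are tied
```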