RANK Procedure

PROC RANK Statement

Computes the ranks for one or more numeric variables.
Restriction: Only one ranking method can be specified in a single PROC RANK step.
Examples: Ranking Values of Multiple Variables

Ranking Values within BY Groups

Partitioning Observations into Groups Based on Ranks

Syntax

PROC RANK <option(s)>;

Summary of Optional Arguments

Compute fractional ranks
NPLUS1
computes fractional ranks by dividing each rank by the denominator n+1.
Create an output data set
OUT=
names the output data set.
Preserve values
PRESERVERAWBYVALUES
preserves raw values of all BY variables.
Reverse the order of the rankings
DESCENDING
reverses the direction of the ranks.
Specify how to rank tied values
TIES=
specifies how to compute normal scores or ranks for tied data values.
Specify the input data set
DATA=
specifies the input SAS data set.
Specify the ranking method
FRACTION
computes fractional ranks by dividing each rank by the number of observations having nonmissing values of the ranking variable.
GROUPS=
assigns group values ranging from 0 to number-of-groups minus 1.
NORMAL=
computes normal scores from the ranks.
PERCENT
divides each rank by the number of observations that have nonmissing values of the variable and multiplies the result by 100 to get a percentage.
SAVAGE
computes Savage (or exponential) scores from the ranks.

Optional Arguments

DATA=SAS-data-set
specifies the input SAS data set.
Restrictions: You cannot use PROC RANK with an engine that supports concurrent access if another user is updating the data set at the same time.

In-database processing occurs only if the data set specification refers to a table that resides on a supported DBMS.

DESCENDING
reverses the direction of the ranks. With DESCENDING, the largest value receives a rank of 1, the next largest value receives a rank of 2, and so on. Otherwise, values are ranked from smallest to largest.
FRACTION
computes fractional ranks by dividing each rank by the number of observations having nonmissing values of the ranking variable.
Alias: F
Interaction: TIES=HIGH is the default with the FRACTION option. With TIES=HIGH, fractional ranks are considered values of a right-continuous, empirical cumulative distribution function.
See: NPLUS1 option
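The link between FRACTION with TIES=HIGH and the empirical cumulative distribution function can be illustrated outside SAS. The following Python sketch is not PROC RANK itself; the helper name `fraction_ranks_high` and the use of `None` for a missing value are assumptions made for illustration.

```python
def fraction_ranks_high(values):
    """Fractional ranks where tied values take the highest rank (illustrative helper)."""
    present = [v for v in values if v is not None]  # None stands in for a missing value
    n = len(present)                                # observations with nonmissing values
    result = []
    for v in values:
        if v is None:
            result.append(None)                     # missing values receive no rank
        else:
            # Largest rank among ties = count of nonmissing values <= v
            high_rank = sum(1 for w in present if w <= v)
            result.append(high_rank / n)
    return result


scores = [10, 20, 20, None, 40]
fractions = fraction_ranks_high(scores)
print(fractions)
```

Each result is the proportion of nonmissing values less than or equal to the observation, which is the value of the empirical cumulative distribution function at that point.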
GROUPS=number-of-groups
assigns group values ranging from 0 to number-of-groups minus 1. Common specifications are GROUPS=100 for percentiles, GROUPS=10 for deciles, and GROUPS=4 for quartiles. For example, GROUPS=4 partitions the original values into four groups, with the smallest values receiving, by default, a quartile value of 0 and the largest values receiving a quartile value of 3.
The formula for calculating group values is as follows:
$\mathrm{FLOOR}(\mathrm{rank} \times k / (n + 1))$
In this formula, FLOOR is the FLOOR function, rank is the value's order rank, k is the value of GROUPS=, and n is the number of observations having nonmissing values of the ranking variable for TIES=LOW, TIES=MEAN, and TIES=HIGH. For TIES=DENSE, n is the number of observations that have unique nonmissing values.
If the number of observations is evenly divisible by the number of groups, each group has the same number of observations, provided there are no tied values at the boundaries of the groups. Grouping observations by a variable that has many tied values can result in unbalanced groups because PROC RANK always assigns observations with the same value to the same group.
Tip: Use DESCENDING to reverse the order of the group values.
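The grouping arithmetic can be sketched in Python, assuming the group value is FLOOR(rank × k / (n + 1)) with rank, k, and n as defined above. The function name `group_values` and the sample ranks are hypothetical, and the ranks are supplied directly rather than computed by PROC RANK.

```python
import math


def group_values(ranks, k, n):
    """Group value for each rank, assuming FLOOR(rank * k / (n + 1))."""
    return [math.floor(r * k / (n + 1)) for r in ranks]


# Eight untied observations: ranks 1..8 split evenly into quartile groups 0..3.
untied_ranks = [1, 2, 3, 4, 5, 6, 7, 8]
balanced = group_values(untied_ranks, k=4, n=8)

# Five tied observations share the mean rank 4 (mean of ranks 2..6),
# so they all land in the same group and the groups become unbalanced.
tied_ranks = [1, 4, 4, 4, 4, 4, 7, 8]
unbalanced = group_values(tied_ranks, k=4, n=8)

print(balanced)
print(unbalanced)
```

In the tied case, group 2 is empty because every tied observation receives the same rank and therefore the same group value.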
NORMAL=BLOM | TUKEY | VW
computes normal scores from the ranks. The resulting variables appear normally distributed. n is the number of observations that have nonmissing values of the ranking variable for TIES=LOW, TIES=MEAN, and TIES=HIGH. For TIES=DENSE, n is the number of observations that have unique nonmissing values. The formulas are as follows:
BLOM
$y_i = \Phi^{-1}\left(\frac{r_i - 3/8}{n + 1/4}\right)$
TUKEY
$y_i = \Phi^{-1}\left(\frac{r_i - 1/3}{n + 1/3}\right)$
VW
$y_i = \Phi^{-1}\left(\frac{r_i}{n + 1}\right)$
In these formulas, $\Phi^{-1}$ is the inverse cumulative normal (PROBIT) function, $r_i$ is the rank of the ith observation, and n is the number of nonmissing observations for the ranking variable.
VW stands for van der Waerden. With NORMAL=VW, you can use the scores for a nonparametric location test. All three normal scores are approximations to the exact expected order statistics for the normal distribution (also called normal scores). The BLOM version appears to fit slightly better than the others (Blom 1958; Tukey 1962).
Restriction: The NORMAL= option prevents in-database processing.
Interaction: If you specify the TIES= option, then PROC RANK computes the normal score from the ranks based on non-tied values and applies the TIES= specification to the resulting score.
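The three normal-score formulas differ only in the constants subtracted from the rank and added to n. The Python sketch below illustrates that arithmetic for untied ranks, using the standard library's `statistics.NormalDist().inv_cdf` as a stand-in for the PROBIT function. The names `OFFSETS` and `normal_score` are hypothetical, and the sketch does not reproduce how PROC RANK handles ties.

```python
from statistics import NormalDist

# (a, b) in the score  inverse_normal((rank - a) / (n + b))
OFFSETS = {
    "BLOM": (3 / 8, 1 / 4),
    "TUKEY": (1 / 3, 1 / 3),
    "VW": (0.0, 1.0),
}


def normal_score(rank, n, method="BLOM"):
    """Normal score for one untied rank among n nonmissing observations."""
    a, b = OFFSETS[method]
    return NormalDist().inv_cdf((rank - a) / (n + b))


n = 5
for method in OFFSETS:
    scores = [normal_score(r, n, method) for r in range(1, n + 1)]
    print(method, [round(s, 4) for s in scores])
```

For every method, the scores are symmetric about zero, so the middle rank of an odd-sized sample maps to a score of approximately 0.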
NPLUS1
computes fractional ranks by dividing each rank by the denominator n+1, where n is the number of observations that have nonmissing values of the ranking variable for TIES=LOW, TIES=MEAN, and TIES=HIGH. For TIES=DENSE, n is the number of observations that have unique nonmissing values.
Alias: FN1, N1
Interaction: TIES=HIGH is the default with the NPLUS1 option.
See: FRACTION option
OUT=SAS-data-set
names the output data set. If SAS-data-set does not exist, PROC RANK creates it. If you omit OUT=, the data set is named using the DATAn naming convention.
Interaction: When in-database processing is performed, OUT= also refers to a supported DBMS table, and both DATA= and OUT= reference the same library, all processing can occur on the DBMS, with results written directly to the output table. In this case, no results are returned to SAS.
PRESERVERAWBYVALUES
preserves raw values of all BY variables when those variables are propagated to the output data set. If the PRESERVERAWBYVALUES option is not specified, and one BY variable is specified, then a representative value for each BY group is written to the output data set. If multiple BY variables are specified, then a representative set of values for each BY group is written to the output data set.
PERCENT
divides each rank by the number of observations that have nonmissing values of the variable and multiplies the result by 100 to get a percentage. n is the number of observations that have nonmissing values of the ranking variable for TIES=LOW, TIES=MEAN, and TIES=HIGH. For TIES=DENSE, n is the number of observations that have unique nonmissing values.
Alias: P
Interaction: TIES=HIGH is the default with the PERCENT option.
Tip: You can use PERCENT to calculate cumulative percentages, but use GROUPS=100 to compute percentiles.
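The difference between cumulative percentages and percentile groups can be shown with a small Python sketch. It assumes untied ranks supplied directly, the PERCENT arithmetic described above (rank / n × 100), and the GROUPS= formula FLOOR(rank × k / (n + 1)) with k = 100. The function names are hypothetical.

```python
import math


def percent_values(ranks, n):
    """Cumulative percentage for each rank: rank / n * 100."""
    return [r / n * 100 for r in ranks]


def percentile_groups(ranks, n):
    """Percentile group for each rank, assuming FLOOR(rank * 100 / (n + 1))."""
    return [math.floor(r * 100 / (n + 1)) for r in ranks]


ranks = [1, 2, 3, 4]  # four untied, nonmissing observations
percents = percent_values(ranks, n=4)
percentiles = percentile_groups(ranks, n=4)

print(percents)
print(percentiles)
```

The largest value reaches 100 under PERCENT, whereas the percentile group values stay within 0 to 99.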
SAVAGE
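computes Savage (or exponential) scores from the ranks by the following formula (Lehmann 1998):
$y_i = \left[\sum_{j=n-r_i+1}^{n} \frac{1}{j}\right] - 1$
In this formula, $r_i$ is the rank of the ith observation, and n is the number of observations that have nonmissing values of the ranking variable.
Interaction: If you specify the TIES= option, then PROC RANK computes the Savage score from the ranks based on non-tied values and applies the TIES= specification to the resulting score.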
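As a rough illustration, the Python sketch below computes Savage scores for untied integer ranks, assuming the usual definition in which the score for rank r among n observations is the sum of 1/j for j from n − r + 1 to n, minus 1. The function name `savage_score` is hypothetical, and tie handling is not modeled.

```python
def savage_score(rank, n):
    """Savage (exponential) score for an untied integer rank among n observations."""
    # Sum 1/j over j = n - rank + 1, ..., n, then subtract 1 to center the scores.
    return sum(1 / j for j in range(n - rank + 1, n + 1)) - 1


n = 3
scores = [savage_score(r, n) for r in range(1, n + 1)]
print([round(s, 4) for s in scores])
```

Under this definition the scores increase with the rank and sum to zero across the n observations.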
TIES=HIGH | LOW | MEAN | DENSE
specifies how to compute normal scores or ranks for tied data values.
HIGH
assigns the largest of the corresponding ranks (or largest of the normal scores when NORMAL= is specified).
LOW
assigns the smallest of the corresponding ranks (or smallest of the normal scores when NORMAL= is specified).
MEAN
assigns the mean of the corresponding ranks (or mean of the normal scores when NORMAL= is specified).
DENSE
computes scores and ranks by treating tied values as a single order statistic. For the default method, ranks are consecutive integers that begin with the number one and end with the number of unique, nonmissing values of the variable that is being ranked. Tied values are assigned the same rank.
Note: CONDENSE is an alias for DENSE.
Default: MEAN, unless the FRACTION, NPLUS1, or PERCENT option is in effect, in which case the default is HIGH.
Interaction: If you specify the NORMAL= option, then the TIES= specification applies to the normal score, not to the rank that is used to compute the normal score.
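The four TIES= settings can be compared on ordinary ranks with a short Python sketch. It is a simplified illustration for nonmissing values ranked from smallest to largest, not the PROC RANK implementation. The function name `rank_with_ties` is hypothetical.

```python
def rank_with_ties(values, ties="MEAN"):
    """Rank nonmissing values in ascending order with the chosen tie handling."""
    ordered = sorted(values)
    distinct = sorted(set(values))
    ranks = []
    for v in values:
        low = ordered.index(v) + 1          # smallest rank in the tied block
        high = low + ordered.count(v) - 1   # largest rank in the tied block
        if ties == "LOW":
            ranks.append(low)
        elif ties == "HIGH":
            ranks.append(high)
        elif ties == "MEAN":
            ranks.append((low + high) / 2)
        elif ties == "DENSE":
            ranks.append(distinct.index(v) + 1)  # consecutive ranks over unique values
        else:
            raise ValueError(f"unknown ties method: {ties}")
    return ranks


data = [30, 10, 20, 20, 40]
for method in ("LOW", "HIGH", "MEAN", "DENSE"):
    print(method, rank_with_ties(data, method))
```

With DENSE, the largest rank equals the number of unique values (4 here) rather than the number of observations (5).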
