Contents: Purpose / History / Requirements / Usage / Details / Limitations / Missing Values / See Also / References

**NOTE: For assessing agreement between two raters, use the AGREE option in PROC FREQ.**

*PURPOSE:*- Compute estimates and tests of agreement among multiple raters when responses (ratings) are on a nominal or ordinal scale. For a nominal or ordinal response, kappa statistics can be computed. For a numerically-coded, ordinal response, Kendall's coefficient of concordance can be computed.
*HISTORY:*-

| Version | Update Notes |
|---|---|
| 1.3 | Fixes errors caused by using the name COUNT or PERCENT for the ITEMS=, RATERS=, or RESPONSE= variable. |
| 1.2 | Corrected cases of perfect agreement. Require more than one subject/item and rater. Added check for new version, reset of _last_ data set and notes option at end. |
| 1.1 | Added error checking for parameters, data set, and variables. Added version parameter. |

*REQUIREMENTS:*- Base SAS and SAS/STAT Software are required.
*USAGE:*-
Follow the instructions in the Downloads tab of this sample to save the %MAGREE macro definition. Replace the text within quotes in the following statement with the location of the %MAGREE macro definition file on your system. In your SAS program or in the SAS editor window, specify this statement to define the %MAGREE macro and make it available for use:
%inc "<location of your file containing the MAGREE macro>";

Following this statement, you may call the %MAGREE macro. See the Results tab for an example.

The following parameters are required:

**items=***variable*- Specifies the variable containing subject (item) identifiers. This variable may be numeric or character. There must be more than one subject.
**raters=***variable*- Specifies the variable containing rater identifiers. This variable may be numeric or character. The same rater values must be used for all subjects even if subjects are rated by different raters. Note that Kendall's statistic is only valid when all subjects are rated by the same raters. There must be more than one rater.
**response=***variable*- Specifies the variable containing the ratings. This variable may be numeric or character. If it is character, it is assumed to be nominal and only kappa statistics are computed. If it is numeric, either or both statistics can be computed, but note that Kendall's statistic is only valid for ordinal responses.

The following parameters are optional:

**data=***SAS-data-set*- Identifies the input data set to be analyzed. If not specified, the last-created data set is used. It should contain one observation per rating (that is, an observation contains one rating on one subject by one rater) with variables containing subject and rater identifiers and a variable containing the rating (response). The same set of rater values must be used for each subject, even if subjects are rated by different raters (which is only valid for kappa statistics).
**stat=kappa | kendall | both**- Specifies which statistic to calculate. Use stat=kappa to compute kappa statistics. Use stat=kendall to compute Kendall's coefficient of concordance. Use stat=both to compute both statistics. By default, stat=both.
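
As an illustration of the required input layout, the following sketch reshapes a wide data set (one row per subject, one column per rater) into one observation per rating and then requests only kappa statistics. The data set names `wide` and `long`, the variable names, and the data values are hypothetical, and the sketch assumes the %MAGREE macro has already been defined with the %inc statement shown above.

```sas
/* Hypothetical wide data: one row per subject, one column per rater */
data wide;
   input subject $ rater1 rater2 rater3;
   datalines;
A 1 2 2
B 3 3 3
C 1 1 2
D 2 2 2
;

/* Reshape to one observation per rating (subject, rater, rating) */
data long;
   set wide;
   array r{3} rater1-rater3;
   do rater = 1 to 3;
      rating = r{rater};
      output;
   end;
   keep subject rater rating;
run;

/* Request kappa statistics only */
%magree(data=long, items=subject, raters=rater, response=rating, stat=kappa)
```

Omitting stat= in the call would request both statistics, since stat=both is the default.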

The version of the %MAGREE macro that you are using is displayed in the SAS log when you specify **version** (or any string) as the first argument. For example:

%MAGREE(version, ...other options...)

*DETAILS:*-
The methodology implemented by the %MAGREE macro is presented in Fleiss (2003), Fleiss et al. (1979), and Kendall (1955). See these references for details.
As stated in Fleiss et al. (1979) concerning the multiple-rater kappa statistic, the raters for a given subject "are not necessarily the same as those rating another."
Unlike the AGREE option in PROC FREQ, weighted kappa and confidence intervals are not provided by this macro.
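
For orientation, the multiple-rater kappa estimates and their large-sample standard errors under the hypothesis of chance agreement are usually written as below. This is a sketch of the standard forms associated with Fleiss et al. (1979), not a transcription of the macro's code. The notation is introduced here and is not used by the macro: $n$ subjects, $m$ ratings per subject, and $n_{ij}$ the number of raters who assign subject $i$ to category $j$.

$$
p_j = \frac{1}{nm}\sum_{i=1}^{n} n_{ij}, \qquad q_j = 1 - p_j
$$

$$
\hat\kappa_j = 1 - \frac{\sum_{i=1}^{n} n_{ij}\,(m - n_{ij})}{n\,m\,(m-1)\,p_j q_j},
\qquad
\hat\kappa = \frac{\sum_j p_j q_j\,\hat\kappa_j}{\sum_j p_j q_j}
$$

$$
\mathrm{SE}_0(\hat\kappa_j) = \sqrt{\frac{2}{n\,m\,(m-1)}},
\qquad
\mathrm{SE}_0(\hat\kappa) = \frac{\sqrt{2}}{\sum_j p_j q_j\,\sqrt{n\,m\,(m-1)}}
\sqrt{\Big(\sum_j p_j q_j\Big)^{2} - \sum_j p_j q_j\,(q_j - p_j)}
$$

Each test statistic is $Z = \hat\kappa / \mathrm{SE}_0$, referred to the standard normal distribution. With $n = 10$ subjects and $m = 5$ raters, for instance, the category-level standard error is $\sqrt{2/200} = 0.1$.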

Note that the RATERS= variable must use the same values to identify the raters in all subjects even if different raters are used to rate the subjects, which is only valid for the kappa statistic. Both statistics require that the number of raters be the same across all subjects.
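
One way to meet this requirement when different raters rate different subjects is to replace the rater identifiers with a position number within each subject. The sketch below assumes this relabeling is an acceptable reading of the requirement. The data set `ratings`, its variables, and the new variable `slot` are hypothetical, and only kappa statistics are requested because Kendall's statistic is not valid in this situation.

```sas
/* Hypothetical data: each subject is rated by three different physicians */
data ratings;
   input subject physician $ rating $;
   datalines;
1 Adams mild
1 Baker mild
1 Chen severe
2 Diaz severe
2 Evans severe
2 Ford mild
3 Adams mild
3 Evans mild
3 Ford mild
;

proc sort data=ratings;
   by subject;
run;

/* Number the ratings 1, 2, 3 within each subject */
data relabeled;
   set ratings;
   by subject;
   if first.subject then slot = 0;
   slot + 1;   /* sum statement: slot is retained across observations */
run;

/* The response is character, so only kappa statistics apply */
%magree(data=relabeled, items=subject, raters=slot, response=rating, stat=kappa)
```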

The Kendall statistic is a nonparametric statistic and tests agreement of the raters' rankings of the subjects/items. Ties are only a concern with Kendall's statistic and are handled by the macro. Tests provided for the kappa statistics and Kendall's statistic are asymptotic (large sample) tests.
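
The asymptotic test for Kendall's coefficient $W$ is commonly based on an F approximation of the following form. This is a sketch that assumes the macro uses this standard form, with $n$ subjects and $m$ raters as notation introduced here. How the macro computes $W$ itself, including its handling of ties, is not restated.

$$
F = \frac{(m-1)\,W}{1 - W}, \qquad
\nu_1 = n - 1 - \frac{2}{m}, \qquad
\nu_2 = (m-1)\,\nu_1
$$

For $n = 10$ subjects and $m = 5$ raters this gives $\nu_1 = 8.6$ and $\nu_2 = 34.4$, and a coefficient of $W = 0.49058$ gives $F \approx 3.85$.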

Landis and Koch (1977) attempted to indicate the degree of agreement that exists when kappa is found to be in various ranges:

| Kappa value | Degree of agreement |
|---|---|
| <=0 | Poor |
| 0 - 0.2 | Slight |
| 0.2 - 0.4 | Fair |
| 0.4 - 0.6 | Moderate |
| 0.6 - 0.8 | Substantial |
| 0.8 - 1 | Almost perfect |

Values of all statistics and their p-values are available in data sets after execution of the macro. The kappa estimates and related statistics are in the data set _KAPPAS. Kendall's coefficient and related statistics are in the data set _W.
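
A minimal sketch for inspecting these data sets follows. It assumes the macro has already been run in the current SAS session with stat=both, so that both data sets exist. The variable names inside the data sets are not documented here, which is why PROC CONTENTS is used to list them first.

```sas
/* List the variables the macro saved */
proc contents data=_kappas;
run;

/* Print the kappa estimates and tests */
proc print data=_kappas noobs;
   title 'Kappa estimates saved by the MAGREE macro';
run;

/* Print Kendall's coefficient and its test */
proc print data=_w noobs;
   title 'Kendall coefficient saved by the MAGREE macro';
run;
title;
```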

**Updates**

The %MAGREE macro attempts to check for a later version of itself. If it is unable to do this (such as if there is no active internet connection available), the macro will issue the following message:

MAGREE: Unable to check for newer version

The computations performed by the macro are not affected by the appearance of this message.

*LIMITATIONS:*- There must be more than one rater and more than one subject. The set of values in the RATERS= variable must be the same for all subjects, even if different raters are used for different subjects (which is only valid for the kappa statistic). See MISSING VALUES below. Kendall's statistic requires that the response variable be numeric and ordinal.
*MISSING VALUES:*-
Observations with missing values in the RESPONSE= variable are omitted before the analysis.
The following error message is issued if it is determined that each rater does not rate each subject exactly once:

ERROR: Each rater must rate each subject exactly once.

The macro detects this error under any of the following conditions:

- Different rater values are found for different subjects
- Missing values appear in the subject, rater, or response variables
- A rater does not rate some subject(s)
- A subject is not rated by some rater(s)
- Not all subjects have an equal number of ratings
- A rater rates a subject more than once
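
Before calling the macro, the conditions above can be screened with a query such as the following sketch. It is not part of the macro. The data set `ratings` and its variables are hypothetical and deliberately include a missing rater and a repeated rating.

```sas
/* Hypothetical data: subject 2 lacks rater 3,
   and rater 1 rates subject 3 twice */
data ratings;
   input subject rater rating;
   datalines;
1 1 2
1 2 2
1 3 1
2 1 3
2 2 3
3 1 1
3 1 2
3 2 1
3 3 1
;

proc sql;
   /* Subject-rater pairs that occur more than once */
   title 'Pairs rated more than once';
   select subject, rater, count(*) as n_ratings
      from ratings
      group by subject, rater
      having count(*) > 1;

   /* Ratings and distinct raters per subject; every row should agree */
   title 'Ratings per subject';
   select subject, count(*) as n_ratings,
          count(distinct rater) as n_raters
      from ratings
      group by subject;
quit;
title;
```

Any row in the first listing, or any subject whose counts differ from the others in the second, points to data that would trigger the error message.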

*SEE ALSO:*-
The FREQ procedure in Base SAS software can compute simple and weighted kappa statistics for comparing two raters. Use the AGREE option in the TABLES statement.
The intraclass correlation coefficient can be used to test rater agreement (or reliability) on continuous responses. See the %INTRACC macro.
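
For the two-rater case, a minimal sketch of the AGREE option follows. The data set `two_raters` and its values are hypothetical. Note that PROC FREQ expects one observation per subject with one variable per rater, and both raters must use the same set of categories so that the table is square.

```sas
/* Hypothetical data: one row per subject, one column per rater */
data two_raters;
   input subject rater1 rater2 @@;
   datalines;
1 1 1  2 1 2  3 2 2  4 3 3  5 2 2  6 1 1  7 3 2  8 3 3
;

proc freq data=two_raters;
   tables rater1*rater2 / agree;   /* simple and weighted kappa */
   test kappa;                     /* asymptotic test that kappa = 0 */
run;
```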

*REFERENCES:*-
Fleiss, J.L. (2003), *Statistical Methods for Rates and Proportions, Third Edition*, New York: John Wiley & Sons, Inc.

Fleiss, J.L., Nee, J.C.M., and Landis, J.R. (1979), "Large Sample Variance of Kappa in the Case of Different Sets of Raters," *Psychological Bulletin*, 86(5), 974-977.

Kendall, M.G. (1955), *Rank Correlation Methods, Second Edition*, London: Charles Griffin & Co. Ltd.

Landis, J.R. and Koch, G.G. (1977), "The Measurement of Observer Agreement for Categorical Data," *Biometrics*, 33, 159-174.

These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.


*EXAMPLE:*-
This example is from Fleiss (1981, Table 13.8). Ten subjects (S) are each rated into one of three categories (Y) by each of five raters (R).

    title "Analysis of data from Fleiss (1981)";

    /* One observation per rating: subject s, rater r, response y */
    data a;
       do s=1 to 10;
          do r=1 to 5;
             input y @@;
             output;
          end;
       end;
       cards;
    1 2 2 2 2 1 1 3 3 3 3 3 3 3 3 1 1 1 1 3 1 1 1 3 3
    1 2 2 2 2 1 1 1 1 1 2 2 2 2 3 1 3 3 3 3 1 1 1 3 3
    ;

    proc print noobs;
       title2 'Proper form of input data set';
    run;

    proc freq;
       tables s*y;
       title2 'Summary table as shown in Fleiss, page 230';
    run;

    /* Define the MAGREE macro */
    %inc "<location of your file containing the MAGREE macro>";

    %magree(data=a, items=s, raters=r, response=y)

*RESULTS:*-
Following is selected output from the EXAMPLE presented above. The first ten observations, corresponding to the first two subjects, are shown to illustrate the proper form of the input data set.

    Analysis of data from Fleiss (1981)
    Proper form of input data set

     S     R     Y
     1     1     1
     1     2     2
     1     3     2
     1     4     2
     1     5     2
     2     1     1
     2     2     1
     2     3     3
     2     4     3
     2     5     3

If the response variable is numeric, as it is in this example, and if it is also ordinal, then both kappa and Kendall's statistics are valid. If this response were only nominal, then Kendall's statistic should be ignored or not computed by specifying the stat=kappa parameter. Both the overall kappa statistic and Kendall's coefficient of concordance are highly significant indicating stronger agreement than can be expected by chance. The kappa statistics for the individual response categories indicate the degree of agreement on each category. In this example, agreement is strongest on category 2.

    Analysis of data from Fleiss (1981)
    The MAGREE macro

    Kappa statistics for nominal response

                            Standard
    Y            Kappa       Error          Z       Prob>Z
    1           0.29167     0.10000     2.91667     0.0018
    2           0.67105     0.10000     6.71053     <.0001
    3           0.34896     0.10000     3.48958     0.0002
    Overall     0.41789     0.07165     5.83220     <.0001

    Analysis of data from Fleiss (1981)
    The MAGREE macro

    Kendall's Coefficient of Concordance for ordinal response

     Coeff of                   Num     Denom
    Concordance        F         DF       DF      Prob>F
      0.49058       3.85214     8.6     34.4      0.0021

Right-click on the link below and select **Save** to save the %MAGREE macro definition to a file. It is recommended that you name the file `magree.sas`.


#### Operating System and Release Information

| Field | Value |
|---|---|
| Type | Sample |
| Topic | SAS Reference ==> Procedures ==> FREQ; Analytics ==> Nonparametric Analysis; Analytics ==> Longitudinal Analysis; Analytics ==> Descriptive Statistics; Analytics ==> Exact Methods; Analytics ==> Categorical Data Analysis |
| Date Modified | 2011-01-17 15:21:14 |
| Date Created | 2005-01-13 15:03:24 |

| Product Family | Product | Host | Starting SAS Release | Ending SAS Release |
|---|---|---|---|---|
| SAS System | SAS/STAT | All | n/a | n/a |