K. Gwet's Inter-Rater Reliability Blog
Inter-rater reliability: Cohen kappa, Gwet AC1/AC2, Krippendorff Alpha

Monday, March 31, 2014

Some R functions for calculating chance-corrected agreement coefficients

Several researchers have expressed interest in R functions that compute chance-corrected agreement coefficients, along with their standard errors, confidence intervals, and p-values, as described in my book Handbook of Inter-Rater Reliability (3rd ed.). I have finally found the time to write these R functions, which can be downloaded from the R-functions page of the AgreeStat website.

All these R functions handle missing values and cover several agreement coefficients: Gwet's AC1/AC2 (2008, 2012), the kappa coefficients of Cohen (1960), Fleiss (1971), and Conger (1980), the Brennan-Prediger coefficient (1981), Krippendorff's alpha (1970), and percent agreement.
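
To make the quantities behind these functions concrete, here is a minimal sketch of unweighted Gwet's AC1 for a subjects-by-raters matrix with missing ratings. It follows the multiple-rater formulas as I understand them: percent agreement is averaged over subjects rated by at least two raters, the classification probabilities pi_k are averaged over all subjects, and chance agreement is sum(pi_k * (1 - pi_k)) / (q - 1). The name ac1_raw_sketch is hypothetical; this is not the code distributed on the AgreeStat site, and it returns only the point estimate (no weights, standard errors, or p-values).

```r
# Minimal sketch of unweighted Gwet's AC1 for an n x r matrix of ratings
# (subjects in rows, raters in columns, NA = missing).
# Hypothetical helper, not the AgreeStat implementation; point estimate only.
ac1_raw_sketch <- function(ratings, categ = NULL) {
  ratings <- as.matrix(ratings)
  # Category set: the observed values unless supplied explicitly
  if (is.null(categ)) categ <- sort(unique(ratings[!is.na(ratings)]))
  q <- length(categ)
  # r_ik = number of raters who classified subject i into category k
  r_ik <- sapply(categ, function(k) rowSums(ratings == k, na.rm = TRUE))
  r_ik <- matrix(r_ik, nrow = nrow(ratings))  # keep n x q shape in edge cases
  r_i <- rowSums(r_ik)                        # number of ratings per subject
  # Percent agreement, over subjects rated by at least two raters
  keep <- r_i >= 2
  r_keep <- r_ik[keep, , drop = FALSE]
  agree_i <- rowSums(r_keep * (r_keep - 1)) / (r_i[keep] * (r_i[keep] - 1))
  pa <- mean(agree_i)
  # Classification probabilities pi_k, averaged over all subjects
  pi_k <- colMeans(r_ik / r_i)
  pe <- sum(pi_k * (1 - pi_k)) / (q - 1)      # AC1 chance agreement
  c(pa = pa, pe = pe, ac1 = (pa - pe) / (1 - pe))
}

# Four subjects, three raters, some missing ratings
ratings <- rbind(c(1, 1, 2),
                 c(1, 2, 2),
                 c(2, 2, NA),
                 c(1, NA, NA))
ac1_raw_sketch(ratings)   # pa = 5/9, pe = 0.5, ac1 = 1/9
```

In this sketch a subject rated by a single rater still contributes to pi_k but not to the percent agreement, while a subject with no ratings at all makes r_i = 0 and turns pi_k into NaN.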

Bibliography:

[1] Brennan, R. L., and Prediger, D. J. (1981). "Coefficient Kappa: some uses, misuses, and alternatives." Educational and Psychological Measurement, 41, 687-699.
[2] Cohen, J. (1960). "A coefficient of agreement for nominal scales." Educational and Psychological Measurement, 20, 37-46.
[3] Conger, A. J. (1980). "Integration and generalization of kappas for multiple raters." Psychological Bulletin, 88, 322-328.
[4] Fleiss, J. L. (1971). "Measuring nominal scale agreement among many raters." Psychological Bulletin, 76, 378-382.
[5] Gwet, K. L. (2008). "Computing inter-rater reliability and its variance in the presence of high agreement." British Journal of Mathematical and Statistical Psychology, 61, 29-48.
[6] Gwet, K. L. (2012). Handbook of Inter-Rater Reliability (3rd ed.). Advanced Analytics, LLC, Maryland, USA.
[7] Krippendorff, K. (1970). "Estimating the reliability, systematic error, and random error of interval data." Educational and Psychological Measurement, 30, 61-70.

28 comments:

Anonymous, September 18, 2014 at 9:16 AM
Dear Kilem,

First, I would like to thank you for preparing R functions that compute the AC1 coefficient. They really saved our project. However, is there also an R function for computing the AC2 coefficient? I found only the one for AC1 on your website.

Many thanks for your help, and best wishes,
Gregor

Anonymous, September 18, 2014 at 2:09 PM
Ah, I just discovered the weight matrix... thanks anyway :)

Anonymous, October 15, 2015 at 12:52 PM
Hello, Kilem!

My name is Gustavo Arruda. I'm from Brazil.

Congratulations on your contributions to the field of statistics.

I would like to know whether the modified kappa (Brennan & Prediger, 1981) can be used for a test-retest (intra-rater) analysis of an ordinal variable with three categories, using simple ordinal weights.
I ask because the modified kappa was originally developed for nominal variables, although AgreeStat can compute it with a larger number of (unordered) categories.
Thank you!

Kilem L. Gwet, October 16, 2015 at 8:18 AM
Hi Gustavo,
Although the Brennan-Prediger coefficient is most often used to compute inter-rater reliability, nothing prevents you from using it to compute intra-rater reliability. In this case, the "raters" represent the ratings of the same subjects on different occasions. Everything else remains the same. If your ratings are ordinal, then you should certainly use ordinal weights to account for the partial agreement that some disagreements represent.

Thanks

Kilem
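
As a minimal sketch of the reply above, the two occasions can be cross-tabulated as if they were two raters, and the weighted Brennan-Prediger coefficient computed as (pa - pe) / (1 - pe), with pa the weighted percent agreement and pe = sum(w) / q^2 (uniform chance agreement). Assumption: the sketch uses linear weights 1 - |k - l| / (q - 1) for simplicity; the handbook's ordinal weights are a different scheme and can be passed in through w. The helpers linear_weights and bp_weighted are hypothetical, not AgreeStat functions.

```r
# Hypothetical helper: linear weights 1 - |k - l| / (q - 1) for q ordered categories.
# (The handbook's "ordinal" weights differ; pass them via `w` if preferred.)
linear_weights <- function(q) {
  idx <- seq_len(q)
  1 - abs(outer(idx, idx, "-")) / (q - 1)
}

# Weighted Brennan-Prediger coefficient for test-retest data: occasion 1 and
# occasion 2 are treated as two "raters" of the same subjects.
bp_weighted <- function(occ1, occ2, categ, w = linear_weights(length(categ))) {
  q <- length(categ)
  # q x q joint proportions; unused categories are kept as zero rows/columns
  p <- table(factor(occ1, levels = categ), factor(occ2, levels = categ))
  p <- p / sum(p)
  pa <- sum(w * p)       # weighted percent agreement
  pe <- sum(w) / q^2     # Brennan-Prediger chance agreement (uniform)
  (pa - pe) / (1 - pe)
}

# One rater, five subjects, three ordered categories, two occasions
occ1 <- c(1, 2, 3, 2, 1)
occ2 <- c(1, 2, 2, 2, 1)
bp_weighted(occ1, occ2, categ = 1:3)   # 0.775
```

Here the single disagreement (3 vs. 2) is adjacent, so it earns half credit under linear weights (pa = 0.9), while pe = 5/9.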

Unknown, October 16, 2015 at 10:25 AM
Thank you for your attention!
And congratulations on your contributions!

Anonymous, July 18, 2016 at 11:54 AM
Hello Kilem, I have a question.
We have multiple judges rating examinees, but one judge rated only 7 times while others rated more than 100 times. I know similar numbers of ratings per judge would be expected, and I read in one of your articles that the results then cannot be interpreted in the same way. What would you recommend? We are performing the analysis with AC1.

Unknown, August 10, 2016 at 7:14 PM
Dr. Gwet,

I have your 3rd & 4th editions. Thank you very much! Also, thank you very much for creating the R code too!

I have a 37 (items) x 16 (raters) matrix of ratings (1 = essential, 2 = useful, 3 = not necessary) with some missing values. As I understand it, the appropriate measure of inter-rater reliability would be AC2. When I copy the R function from your website, run it, and then try to compute AC2, I get the following error:

> gwet.ac1.raw(interrater.reliability)
Error in gwet.ac1.raw(interrater.reliability) :
could not find function "identity.weights"

> gwet.ac1.raw(interrater.reliability,weights="unweighted")
Error in gwet.ac1.raw(interrater.reliability, weights = "unweighted") :
could not find function "identity.weights"

I am not sure what I am doing wrong. Could you give me some insight?

Thank you very much!

-Greg

Kilem L. Gwet, Ph.D., August 11, 2016 at 8:02 AM
Hi Greg,
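Download the file http://www.agreestat.com/software/r/new/weights.gen.r, then load it in R using source("C:\\Your_Directory\\weights.gen.r"). This file defines the weight functions (such as identity.weights) that the agreement-coefficient functions call, so once it is loaded you can use any function you want.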

Anonymous, February 2, 2017 at 5:32 AM
Dear Kilem,

I've used the functions provided in agree.coeff2.r (http://www.agreestat.com/software/r/new/agree.coeff2.r) to assess the agreement between two different questionnaires (with no gold standard), each of which classifies a person as positive or negative for a condition according to its own score. Each participant is assessed once with each questionnaire during the same visit.

Gwet's AC1 shows a lack of agreement between the questionnaires and yields a negative coefficient (which I assume reflects the negative bias you point out in reference [5]), but what really surprises me is a p-value well above 1.

This leaves me with a dilemma: I don't know whether there is an error in the function, or how a p-value can exceed 1 and how such a value should be interpreted. Could you shed some light on this issue?

Thanks

Xavier

Here are the numbers.

A+B+ = 68
A+B- = 2
A-B+ = 267
A-B- = 146

Gwet's AC1/AC2 Coefficient
==========================
Percent agreement: 0.4430642 Percent chance agreement: 0.4869604
AC1/AC2 coefficient: -0.08556103 Standard error: 0.04785226
95 % Confidence Interval: ( -0.1795858 , 0.008463783 )
P-value: 1.9256
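
The point estimate in this output can be cross-checked by hand. For two raters, AC1 uses the percent agreement (the diagonal of the 2 x 2 table) and a chance agreement built from the average of the two marginal distributions. The sketch below, a hypothetical helper rather than the agree.coeff2.r code, reproduces the percent agreement, chance agreement, and AC1 values printed above from the four counts.

```r
# Sketch: Gwet's AC1 for two raters (here, two questionnaires) from a q x q
# contingency table. ac1_table_sketch() is a hypothetical helper.
ac1_table_sketch <- function(tab) {
  p <- tab / sum(tab)                        # joint proportions
  q <- nrow(tab)
  pa <- sum(diag(p))                         # percent agreement
  pi_k <- (rowSums(p) + colSums(p)) / 2      # average marginal per category
  pe <- sum(pi_k * (1 - pi_k)) / (q - 1)     # AC1 chance agreement
  c(pa = pa, pe = pe, ac1 = (pa - pe) / (1 - pe))
}

# Counts above: rows = questionnaire A (+, -), columns = questionnaire B (+, -)
tab <- matrix(c(68,  2,
                267, 146), nrow = 2, byrow = TRUE,
              dimnames = list(A = c("+", "-"), B = c("+", "-")))
round(ac1_table_sketch(tab), 7)
```

Questionnaire B is positive far more often than A (335 vs. 70 of 483), so most disagreements fall in the A-B+ cell and the observed agreement (0.443) ends up below chance agreement (0.487).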

Kilem L. Gwet, Ph.D., February 8, 2017 at 3:45 AM
I have updated the R function agree.coeff2.r. It now produces the correct p-value even when the agreement coefficient is negative.

Thanks
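
A minimal sketch of why a signed test statistic breaks the p-value: a two-sided p-value must be computed from |coefficient / SE|; otherwise 2 * (1 - P(T <= t)) exceeds 1 whenever the coefficient is negative. Assumption: the sketch uses a t reference distribution with n - 1 degrees of freedom, which is consistent with the confidence interval printed in Xavier's output but is not taken from the agree.coeff2.r source. The helper name coef_inference_sketch is hypothetical.

```r
# Sketch: two-sided p-value and confidence interval for an agreement
# coefficient. Assumption: t distribution with n - 1 degrees of freedom.
coef_inference_sketch <- function(coeff, se, n, conflev = 0.95) {
  df <- n - 1
  t_stat <- coeff / se
  # Use |t|: with the signed statistic, p > 1 whenever coeff < 0
  p_value <- 2 * (1 - pt(abs(t_stat), df))
  half_width <- qt(1 - (1 - conflev) / 2, df) * se
  c(p.value = p_value, lcb = coeff - half_width, ucb = coeff + half_width)
}

# Values from the AC1 output above (n = 68 + 2 + 267 + 146 = 483 participants)
res <- coef_inference_sketch(coeff = -0.08556103, se = 0.04785226, n = 483)
res

# Keeping the sign reproduces the out-of-range value reported above (about 1.9256)
p_signed <- 2 * (1 - pt(-0.08556103 / 0.04785226, 483 - 1))
```

The corrected two-sided p-value is about 0.074, which agrees with the printed 95% interval containing 0.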

Unknown, April 28, 2017 at 8:18 AM
Hi Dr. Gwet,

I am using your gwet.ac1.raw function in agree.coeff3.raw.r. When I use a dataset which contains some missing values (NA), the function returns NaN for the AC1 coefficient.

Example:
> testAC
[,1] [,2] [,3]
[1,] NA NA NA
[2,] 1 NA NA
[3,] NA NA NA
[4,] 1 1 1
[5,] 1 1 1
[6,] 1 1 1
[7,] 1 1 1
[8,] 1 2 1
[9,] 1 2 1
[10,] 1 1 1
[11,] NA NA NA
> gwet.ac1.raw(testAC)
Gwet's AC1 Coefficient
======================
Percent agreement: 0.8095238 Percent chance agreement: NaN
AC1 coefficient: NaN Standard error: NaN
95 % Confidence Interval: ( NaN , NaN )
P-value: NaN

I hope you can help. My understanding is that the code should handle missing values; it appears to work fine when there are none.

Thanks,
Dharmesh

Kilem L. Gwet, Ph.D., April 28, 2017 at 8:56 AM
Each row must contain at least one non-missing value. Rows with only NA values must first be deleted from the dataset before executing the functions.
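
A likely reason for the NaN, assuming the function averages each subject's share of ratings r_ik / r_i over all rows: an all-NA row has r_i = 0, so 0/0 produces NaN in the chance agreement, while the percent agreement (computed only over subjects with two or more ratings) is unaffected. That matches the output above, where percent agreement is finite but chance agreement is NaN. The rows can be dropped in base R before calling the function:

```r
# Rebuild the testAC matrix from the comment above (NA = missing rating)
testAC <- matrix(c(NA, NA, NA,
                   1,  NA, NA,
                   NA, NA, NA,
                   1,  1,  1,
                   1,  1,  1,
                   1,  1,  1,
                   1,  1,  1,
                   1,  2,  1,
                   1,  2,  1,
                   1,  1,  1,
                   NA, NA, NA), ncol = 3, byrow = TRUE)

# Keep only rows with at least one non-missing rating
testAC.clean <- testAC[rowSums(!is.na(testAC)) > 0, , drop = FALSE]
nrow(testAC.clean)   # 8
# gwet.ac1.raw(testAC.clean)   # then run the coefficient on the cleaned data
```

Row 2 (a single rating) is kept: it still informs the classification probabilities even though it cannot contribute to percent agreement.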

Unknown, May 1, 2017 at 8:19 AM
Thanks, that solved the issue. However, the following dataset causes problems for both AC1 and Krippendorff's alpha:

Example:
> testNoNA
[,1] [,2] [,3]
[1,] 2 NA NA
[2,] 2 2 2
[3,] 2 2 2
[4,] 2 2 2
[5,] 2 2 2
[6,] 2 2 2
[7,] 2 2 2
[8,] 2 2 2
> gwet.ac1.raw(testNoNA)
Gwet's AC1 Coefficient
======================
Percent agreement: 1 Percent chance agreement: NaN
AC1 coefficient: NaN Standard error: NaN
95 % Confidence Interval: ( NaN , NaN )
P-value: NaN
> krippen.alpha.raw(testNoNA)
Error in (agree.mat * (agree.mat.w - 1)) %*% rep(1, q) :
non-conformable arguments

Thanks,
Dharmesh

Unknown, May 1, 2017 at 1:37 PM
This particular dataset generates an error mainly because it contains a single category (2). A typical data table shows two or more categories: you quantify the extent of agreement among raters precisely because two raters can select different categories.

I understand, of course, that even when two or more categories are available, raters may well assign all subjects to the same category (e.g., 2 in your table). In this case AC1 equals 1 and its variance is 0. Krippendorff's alpha will likely be 0 (which is clearly inaccurate; this coefficient is known not to work well in such a scenario), and its variance will also be 0.
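
A minimal sketch of where the NaN comes from, assuming the usual AC1 chance agreement pe = sum(pi_k * (1 - pi_k)) / (q - 1): with only one category, q - 1 = 0 and pe is 0/0. Declaring the categories the raters could have used (here 1 and 2) gives pi = (0, 1), so pe = 0 and AC1 = (pa - pe) / (1 - pe) = 1, as stated in the reply. The helper ac1_pe_sketch is hypothetical, not one of the AgreeStat functions.

```r
# Sketch: AC1 chance agreement for a subjects x raters matrix (NA = missing).
# ac1_pe_sketch() is hypothetical; `categ` lists the categories raters could use.
ac1_pe_sketch <- function(ratings, categ) {
  r_ik <- sapply(categ, function(k) rowSums(ratings == k, na.rm = TRUE))
  r_ik <- matrix(r_ik, nrow = nrow(ratings))   # keep n x q shape when q = 1
  pi_k <- colMeans(r_ik / rowSums(r_ik))       # classification probabilities
  sum(pi_k * (1 - pi_k)) / (length(categ) - 1)
}

# testNoNA from the comment above
testNoNA <- rbind(c(2, NA, NA), matrix(2, nrow = 7, ncol = 3))

ac1_pe_sketch(testNoNA, categ = 2)         # NaN: 0 / 0 because q - 1 = 0
pe <- ac1_pe_sketch(testNoNA, categ = c(1, 2))   # 0
pa <- 1                                    # percent agreement printed above
ac1 <- (pa - pe) / (1 - pe)                # 1
```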

Isabella R. Ghement, Ph.D., March 15, 2018 at 7:19 PM
Hi Kilem,

I recently bought your book and find it very useful.

It is my understanding that we can compute intra-rater reliability from ordinal ratings produced by a single rater on two occasions for a number of subjects. We can achieve this by treating the ordinal ratings on the two occasions as coming from two independent raters and essentially computing measures of inter-rater agreement which reflect the ordinal nature of the ratings (e.g., AC1, generalized kappa).

What if we have n raters, each of whom rates the same subjects on two occasions? Is it fair to assume that the n x 2 sets of ordinal ratings come from n x 2 independent raters and then apply measures of inter-rater agreement to them? I haven't seen anything in the literature that explicitly covers the case of more than two raters with ordinal ratings, so I wanted to make sure I'm on the right track.

Mahmoud Slim, PhD, April 5, 2018 at 10:50 AM
Dear Dr. Kilem,

Many thanks for your amazing contributions.

We are working on designing severity classification criteria. In the absence of a gold standard, we used latent class analysis (LCA) to classify our cases with the best-fitting model, and then assessed the agreement of two classification systems (with ordinal classes matching those generated by the LCA) with the latent classes. Is AC1 the appropriate agreement coefficient in this context? And how can we assess whether the difference between the agreement coefficients (classification system 1 vs. LCA, and classification system 2 vs. LCA) is statistically significant?

Thank you,

Mahmoud Slim

Unknown, April 5, 2018 at 11:16 AM
Dear Dr. Kilem,

Many thanks for your amazing contributions.

We are working on designing severity classification criteria based on an already validated and reliable scale. In the absence of a gold standard, we used latent class analysis (LCA) to classify our cases with the best-fitting model. We then used two different combinations of criteria on the same scale to classify case severity (with ordinal classes matching those generated by the LCA), and assessed the agreement of each set of criteria with the LCA classification.
Is AC1 the appropriate agreement coefficient in this context? And is there a way to assess whether the difference between the agreement coefficients (classification system 1 vs. LCA, and classification system 2 vs. LCA) is statistically significant?

Thank you,

Mahmoud Slim

Anonymous, May 7, 2020 at 12:32 PM
Hi, I was using the irrCAC package from CRAN (v1.0), and I noticed that if there is perfect agreement, gwet.ac1.table returns an AC1 estimate of NaN while gwet.ac1.raw returns an AC1 estimate of 1. I believe the latter is correct. It looks like gwet.ac1.raw defines pe in these cases as (1 - 1e-15), whereas gwet.ac1.table does not have that check.
> library(irrCAC)
> ratdat<- matrix(rep(2,40),nrow=20,ncol=2)
> gwet.ac1.table(table(ratdat))$coeff.val
[1] NaN
> gwet.ac1.raw(ratdat)$est$coeff.val
[1] 1
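
One way to guard against the degenerate case described above, sketched under the assumption of a square two-rater contingency table: with a single category, q - 1 = 0 in the AC1 chance-agreement denominator, so pe is 0/0; returning 1 in that case matches the value gwet.ac1.raw reportedly returns for perfect agreement. Note that table(ratdat) on the whole matrix counts values rather than cross-classifying the two raters, so the sketch builds the two-way table from the two columns. ac1_table_guarded is a hypothetical helper, not the irrCAC code.

```r
# Sketch: AC1 from a square q x q two-rater contingency table, with a guard
# for the single-category case. ac1_table_guarded() is a hypothetical helper.
ac1_table_guarded <- function(tab) {
  p <- as.matrix(tab) / sum(tab)   # joint proportions
  q <- nrow(p)
  if (q < 2) return(1)             # one category only: perfect agreement
  pa <- sum(diag(p))               # percent agreement
  pi_k <- (rowSums(p) + colSums(p)) / 2
  pe <- sum(pi_k * (1 - pi_k)) / (q - 1)
  (pa - pe) / (1 - pe)
}

ratdat <- matrix(rep(2, 40), nrow = 20, ncol = 2)
ac1_table_guarded(table(ratdat[, 1], ratdat[, 2]))   # 1
```

For q >= 2 no guard is needed: pe = sum(pi_k * (1 - pi_k)) / (q - 1) is at most 1/q, so 1 - pe never reaches 0.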

Kilem L. Gwet, Ph.D., May 21, 2020 at 4:11 AM
Hi,
You are right, and thank you for bringing this issue to my attention. I will fix it in the package as soon as possible.

Anonymous, May 28, 2020 at 11:30 AM
Dr. Gwet,

I've been going through your 2016 paper on testing the difference between correlated agreement coefficients, as well as the associated R code "paired t-test for agreement coefficients.r". I have three novice raters (group 1) and three expert raters (group 2) who rate the same set of subjects, and I want to know whether the methods discussed in the paper and the R function ttest.ac2 apply to testing whether the two groups have the same AC1. To quote the paper: "The proposed methods are general and versatile, and can be used to analyze correlated coefficients between overlapping groups of raters, or between two rounds of ratings produced by the same group of raters on two occasions." This isn't quite my situation, since there are no overlapping raters.

Thanks,
Matt

Kilem L. Gwet, Ph.D., June 10, 2020 at 4:07 PM
Hi Matt,
The methods discussed are quite general and can be applied to two groups of raters, whether or not they overlap. Traditional methods were applicable only to non-overlapping groups; the problem was with groups that overlap, so I extended the traditional methods to that case. Non-overlapping groups can still be analyzed with these methods.

Thanks
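
For a quick cross-check on the difference between two groups' AC1 values computed on the same subjects, a paired nonparametric bootstrap is one option. To be clear, this is a different technique from the method in the 2016 paper and ttest.ac2: it resamples subjects, keeping both groups' ratings of each subject together, which preserves the pairing. The helpers ac1_complete and boot_diff_ac1 are hypothetical; they assume complete data (no NA) and unweighted AC1.

```r
# Hypothetical helper: unweighted AC1 for complete data (subjects x raters, no NA)
ac1_complete <- function(ratings, categ) {
  r_ik <- sapply(categ, function(k) rowSums(ratings == k))
  r_ik <- matrix(r_ik, nrow = nrow(ratings))
  r <- ncol(ratings)                                   # raters per subject
  pa <- mean(rowSums(r_ik * (r_ik - 1)) / (r * (r - 1)))
  pi_k <- colMeans(r_ik) / r
  pe <- sum(pi_k * (1 - pi_k)) / (length(categ) - 1)
  (pa - pe) / (1 - pe)
}

# Paired bootstrap of AC1(group 1) - AC1(group 2): resample subjects (rows),
# using the same row indices for both groups. Percentile 95% interval.
boot_diff_ac1 <- function(g1, g2, categ, B = 2000) {
  n <- nrow(g1)
  est <- ac1_complete(g1, categ) - ac1_complete(g2, categ)
  boots <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)
    ac1_complete(g1[idx, , drop = FALSE], categ) -
      ac1_complete(g2[idx, , drop = FALSE], categ)
  })
  c(diff = est, quantile(boots, c(0.025, 0.975)))
}

# Illustration with simulated ratings (not real data): 40 subjects,
# 3 novices and 3 experts, categories 1 to 3
set.seed(2020)
novices <- matrix(sample(1:3, 40 * 3, replace = TRUE), ncol = 3)
experts <- matrix(sample(1:3, 40 * 3, replace = TRUE), ncol = 3)
boot_diff_ac1(novices, experts, categ = 1:3, B = 1000)
```

If the percentile interval excludes 0, that suggests the two groups' AC1 values differ at roughly the 5% level; with few subjects, a linearization-based test such as the one in the paper may be preferable.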

Unknown, November 23, 2020 at 6:35 AM
Dr Gwet,

I am analysing some reliability data using R. I can't work out the correct format for entering category labels (not all of the labels are used in some ratings).

I want to specify the (ordinal) labels 1,2,3

This does not work: gwet.ac1.raw(Data, weights = "ordinal", categ.labels = 1,2,3).

Could you advise on the correct format? I've tried various ways of formatting categ.labels.

Thank you

Julie
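
In R, categ.labels = 1,2,3 sets categ.labels to 1 and passes 2 and 3 as extra positional arguments; the labels must be combined into a single vector with c(). A likely corrected call, assuming the function accepts a vector of labels as the usage above suggests, is gwet.ac1.raw(Data, weights = "ordinal", categ.labels = c(1, 2, 3)). The snippet below shows, without depending on that function, why declaring unused labels matters: a factor with explicit levels keeps category 3 with a count of 0, so ordinal weights are built for three categories rather than two.

```r
# Labels must be one vector built with c(); `categ.labels = 1,2,3` would
# pass 2 and 3 as separate positional arguments instead.
categ.labels <- c(1, 2, 3)

# Ratings in which category 3 happens not to be used
ratings <- c(1, 2, 2, 1, 2, 1)

# Declaring the levels keeps the unused category, with a count of 0
counts <- table(factor(ratings, levels = categ.labels))
counts

# Likely call (assumption: argument names as in the comment above):
# gwet.ac1.raw(Data, weights = "ordinal", categ.labels = c(1, 2, 3))
```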