Monday, February 22, 2021

Testing the Difference Between 2 Agreement Coefficients for Statistical Significance

Researchers who use chance-corrected agreement coefficients such as Cohen's Kappa, Gwet's AC1 or AC2, Fleiss' Kappa, or one of the many other alternatives often need to compare two coefficients calculated from two different sets of ratings. A rigorous way to make such a comparison is to test the difference between the two coefficients for statistical significance. This issue is discussed extensively in my paper entitled Testing the Difference of Correlated Agreement Coefficients for Statistical Significance. AgreeTest, a cloud-based application, can help you perform the techniques discussed in this paper and more. Do not hesitate to check it out when you find time.
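
As a quick illustration of what is being compared, the toy Python snippet below computes Cohen's Kappa on two made-up sets of ratings with scikit-learn and reports the raw difference between the two coefficients. It is only a sketch; it is not how AgreeTest or the paper implements the calculation, and all rating vectors are hypothetical.

```python
# Toy illustration (not AgreeTest itself): compute Cohen's kappa on two
# hypothetical sets of ratings and look at the raw difference before testing it.
from sklearn.metrics import cohen_kappa_score

# Made-up ratings: in each study, two raters classify 10 subjects into "a"/"b"/"c"
study1_rater1 = ["a", "a", "b", "c", "a", "b", "b", "c", "a", "c"]
study1_rater2 = ["a", "b", "b", "c", "a", "b", "c", "c", "a", "c"]

study2_rater1 = ["a", "a", "a", "b", "b", "c", "c", "a", "b", "c"]
study2_rater2 = ["a", "c", "a", "b", "a", "c", "c", "a", "b", "b"]

kappa1 = cohen_kappa_score(study1_rater1, study1_rater2)
kappa2 = cohen_kappa_score(study2_rater1, study2_rater2)

print(f"kappa1 = {kappa1:.3f}, kappa2 = {kappa2:.3f}, "
      f"difference = {kappa1 - kappa2:.3f}")
```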

The two sets of ratings used to compute the agreement coefficients under comparison may be totally independent or may have several aspects in common. Here are two possible scenarios you may encounter in practice:

  • Both datasets of ratings were produced by two independent samples of subjects and two independent groups of raters. In this case, the two agreement coefficients associated with these datasets are said to be uncorrelated, and their difference can be tested for statistical significance with an Unpaired t-Test (also implemented in AgreeTest); a sketch of this case appears after the list.
  • Both datasets of ratings were produced by two overlapping samples of subjects, two overlapping groups of raters, or both. In this case, the two agreement coefficients associated with these datasets are said to be correlated, and their difference can be tested for statistical significance with a Paired t-Test (also implemented in AgreeTest).
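
For the uncorrelated (unpaired) case, the test statistic is simply the difference between the two coefficients divided by the standard error of that difference. The sketch below illustrates the idea with Cohen's Kappa, approximating each variance with a subject-level bootstrap rather than the closed-form variance estimators used in the paper and in AgreeTest; all ratings and helper names are made up.

```python
# Minimal sketch of the unpaired comparison of two uncorrelated coefficients.
# Variances are approximated with a subject-level bootstrap instead of the
# closed-form estimators discussed in the paper; all data below are made up.
import numpy as np
from scipy import stats
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

def bootstrap_variance(rater1, rater2, n_boot=2000):
    """Approximate the variance of Cohen's kappa by resampling subjects."""
    rater1, rater2 = np.asarray(rater1), np.asarray(rater2)
    n = len(rater1)
    kappas = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample subjects with replacement
        kappas.append(cohen_kappa_score(rater1[idx], rater2[idx]))
    # nanvar guards against rare degenerate resamples where kappa is undefined
    return np.nanvar(np.asarray(kappas), ddof=1)

# Hypothetical ratings from two studies with independent subjects and raters
study1 = (["a", "a", "b", "c", "a", "b", "b", "c", "a", "c"],
          ["a", "b", "b", "c", "a", "b", "c", "c", "a", "c"])
study2 = (["a", "a", "a", "b", "b", "c", "c", "a", "b", "c"],
          ["a", "c", "a", "b", "a", "c", "c", "a", "b", "b"])

k1, k2 = cohen_kappa_score(*study1), cohen_kappa_score(*study2)
v1, v2 = bootstrap_variance(*study1), bootstrap_variance(*study2)

# Unpaired statistic: difference over the standard error of the difference
z = (k1 - k2) / np.sqrt(v1 + v2)
p_value = 2 * stats.norm.sf(abs(z))        # two-sided p-value
print(f"difference = {k1 - k2:.3f}, z = {z:.2f}, p = {p_value:.3f}")
```

For the correlated (paired) case, the same bootstrap idea would resample the shared subjects once per replicate and recompute both coefficients on that same resample, so that the covariance between the two coefficients is reflected in the standard error of their difference.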
Several researchers have successfully used these statistical techniques in their own studies. Here is a small sample of their publications:
