Researchers who use chance-corrected agreement coefficients such as Cohen's Kappa, Gwet's AC1 or AC2, Fleiss' Kappa, and many other alternatives often need to compare two coefficients calculated from 2 different sets of ratings. A rigorous way to make such a comparison is to test the difference between these 2 coefficients for statistical significance. This issue is discussed extensively in my paper entitled Testing the Difference of Correlated Agreement Coefficients for Statistical Significance. AgreeTest, a cloud-based application, can help you perform the techniques discussed in this paper and more. Do not hesitate to check it out when you find time.
The 2 sets of ratings used to compute the agreement coefficients under comparison may be totally independent or may have several aspects in common. Here are 2 possible scenarios you may encounter in practice (a small numerical sketch of both tests follows the list):
- Both datasets of ratings were produced by 2 independent samples of subjects and 2 independent groups of raters. In this case, the 2 agreement coefficients associated with these datasets are said to be uncorrelated. Their difference can be tested for statistical significance with an Unpaired t-Test (also implemented in AgreeTest).
- Both datasets of ratings were produced either by 2 overlapping samples of subjects or 2 overlapping groups of raters, or both. In this case, the 2 agreement coefficients associated with these datasets are said to be correlated. Their difference can be tested for statistical significance with a Paired t-Test (also implemented in AgreeTest).
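To make the arithmetic behind these 2 tests concrete, here is a small Python sketch. It assumes you have already obtained the 2 coefficient estimates, their standard errors, and, for the paired case, an estimate of their covariance (for example from AgreeTest's output). The function names, the optional degrees of freedom, and the example numbers are illustrative assumptions, not the exact variance and covariance estimators derived in the paper.

```python
# Minimal sketch: comparing two agreement coefficients for statistical
# significance, given their estimates and standard errors.
# Assumption: the inputs come from your agreement software; the exact
# estimators of the variances and of the covariance are those of the paper.

from math import sqrt

from scipy.stats import norm, t


def two_sided_p(stat: float, df: float | None = None) -> float:
    """Two-sided p-value: t distribution if df is supplied, else normal."""
    dist = t(df) if df is not None else norm()
    return 2 * dist.sf(abs(stat))


def unpaired_test(coeff1, se1, coeff2, se2, df=None):
    """Test the difference of two UNCORRELATED coefficients.

    With independent subjects and independent raters, the variance of the
    difference is simply the sum of the two variances.
    """
    stat = (coeff1 - coeff2) / sqrt(se1**2 + se2**2)
    return stat, two_sided_p(stat, df)


def paired_test(coeff1, se1, coeff2, se2, cov, df=None):
    """Test the difference of two CORRELATED coefficients.

    Overlapping subjects or raters induce a covariance between the two
    estimates, which reduces the variance of the difference.
    """
    stat = (coeff1 - coeff2) / sqrt(se1**2 + se2**2 - 2 * cov)
    return stat, two_sided_p(stat, df)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    z, p = unpaired_test(0.81, 0.05, 0.67, 0.06)
    print(f"Unpaired: statistic = {z:.3f}, p-value = {p:.4f}")

    z, p = paired_test(0.81, 0.05, 0.67, 0.06, cov=0.001, df=59)
    print(f"Paired:   statistic = {z:.3f}, p-value = {p:.4f}")
```

In the paired case everything hinges on a good estimate of the covariance between the 2 coefficients; that covariance is precisely what the paper derives and what AgreeTest computes for you.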