Work on the 5th edition of the Handbook of Inter-Rater Reliability is in progress. Because the page count has grown considerably since the 4th edition, I decided to release the 5th edition in 2 volumes. Volume 1 will be devoted to the Chance-corrected Agreement Coefficients (CAC), while Volume 2 will focus on the Intraclass Correlation Coefficient (ICC).
You have the opportunity to review the early drafts of many chapters of this 5th edition and to submit your comments and/or questions to me. I would very much appreciate it if you reported any typo or error you find during your review. The Volume 1 chapters that are available can be downloaded **here**. The Volume 2 chapters can be downloaded **here**.

The only tool you will ever need to use

You will need to register in order to receive a password by email that allows you to log in to a trial version and test all the features of

```
library(irrCAC)
```

The functions included in this package can handle 3 types of input data:

- The contingency table,
- The distribution of raters by subject and by category,
- The raw data, which is essentially a plain dataset where each row represents a subject and each column, the ratings associated with one rater.
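As a rough illustration of how these 3 layouts relate to one another, the base-R sketch below starts from a small made-up raw dataset and derives the other two layouts from it. The ratings and the names `raw`, `cats`, `cont`, and `distrib` are hypothetical and are not part of irrCAC.

```r
# Hypothetical raw ratings: one row per subject, one column per rater.
raw <- data.frame(
  Rater1 = c("a", "b", "c", "a", "b"),
  Rater2 = c("a", "b", "b", "a", "c")
)
cats <- c("a", "b", "c")

# Contingency table (2 raters only): rows are Rater1's categories,
# columns are Rater2's categories, cells count subjects.
cont <- table(factor(raw$Rater1, levels = cats),
              factor(raw$Rater2, levels = cats))

# Distribution of raters by subject and by category:
# one row per subject, one column per category, cells count raters.
distrib <- t(apply(raw, 1, function(x) table(factor(x, levels = cats))))

cont
distrib
```

Each row of `distrib` sums to the number of raters who rated that subject, while the cells of `cont` sum to the number of subjects.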

To list the datasets that come with the package, run `data(package="irrCAC")`.

**cont3x3abstractors** is one of 2 datasets included in this package that contain rating data from 2 raters organized in the form of a contingency table. The following R script shows how to compute Cohen’s kappa, Scott’s Pi, Gwet’s AC_{1}, the Brennan-Prediger coefficient, Krippendorff’s alpha, and the percent agreement from this dataset.

```
cont3x3abstractors
#> Ectopic AIU NIU
#> Ectopic 13 0 0
#> AIU 0 20 7
#> NIU 0 4 56
kappa2.table(cont3x3abstractors)
#> coeff.name coeff.val coeff.se coeff.ci coeff.pval
#> 1 Cohen's Kappa 0.7964094 0.05891072 (0.68,0.913) 0e+00
scott2.table(cont3x3abstractors)
#> coeff.name coeff.val coeff.se coeff.ci coeff.pval
#> 1 Scott's Pi 0.7962397 0.05905473 (0.679,0.913) 0e+00
gwet.ac1.table(cont3x3abstractors)
#> coeff.name coeff.val coeff.se coeff.ci coeff.pval
#> 1 Gwet's AC1 0.8493305 0.04321747 (0.764,0.935) 0e+00
bp2.table(cont3x3abstractors)
#> coeff.name coeff.val coeff.se coeff.ci
#> 1 Brennan-Prediger 0.835 0.04693346 (0.742,0.928)
#> coeff.pval
#> 0e+00
krippen2.table(cont3x3abstractors)
#> coeff.name coeff.val coeff.se coeff.ci
#> 1 Krippendorff's Alpha 0.7972585 0.05905473 (0.68,0.914)
#> coeff.pval
#> 0e+00
pa2.table(cont3x3abstractors)
#> coeff.name coeff.val coeff.se coeff.ci
#> 1 Percent Agreement 0.89 0.03128898 (0.828,0.952)
#> coeff.pval
#> 0e+00
```
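To make the numbers above less of a black box, here is a minimal base-R sketch that recomputes 4 of the point estimates directly from the contingency table, using the usual textbook definitions of percent agreement and chance agreement. It is not the package's code, it does not compute standard errors, and the helper name `chance_corrected` is made up for this illustration.

```r
# The cont3x3abstractors table, typed in by hand from the listing above.
labs <- c("Ectopic", "AIU", "NIU")
tab <- matrix(c(13,  0,  0,
                 0, 20,  7,
                 0,  4, 56),
              nrow = 3, byrow = TRUE, dimnames = list(labs, labs))

p   <- tab / sum(tab)        # cell proportions
pa  <- sum(diag(p))          # percent agreement
pr  <- rowSums(p)            # rater 1 marginal proportions
pc  <- colSums(p)            # rater 2 marginal proportions
pik <- (pr + pc) / 2         # average classification probabilities
q   <- nrow(p)               # number of categories

# Hypothetical helper: generic chance-corrected form (pa - pe) / (1 - pe).
chance_corrected <- function(pa, pe) (pa - pe) / (1 - pe)

cohen_kappa <- chance_corrected(pa, sum(pr * pc))
scott_pi    <- chance_corrected(pa, sum(pik^2))
gwet_ac1    <- chance_corrected(pa, sum(pik * (1 - pik)) / (q - 1))
bren_pred   <- chance_corrected(pa, 1 / q)

round(c(pa = pa, kappa = cohen_kappa, pi = scott_pi,
        ac1 = gwet_ac1, bp = bren_pred), 7)
```

The 4 coefficients differ only in how the chance-agreement term is defined, which is why they share the same percent agreement of 0.89.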

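Suppose that you only want to obtain Gwet’s AC_{1} coefficient without its precision measures. You may proceed as follows: `ac1 <- gwet.ac1.table(cont3x3abstractors)$coeff.val`

Then use the variable ac1 to obtain AC_{1} = 0.8493305. The second contingency-table dataset included in this package can be analyzed with the same functions.

The dataset **distrib.6raters** organizes ratings as a distribution of 6 raters by subject and by category, and can be analyzed as follows: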

```
distrib.6raters
#> Depression Personality.Disorder Schizophrenia Neurosis Other
#> 1 0 0 0 6 0
#> 2 0 3 0 0 3
#> 3 0 1 4 0 1
#> 4 0 0 0 0 6
#> 5 0 3 0 3 0
#> 6 2 0 4 0 0
#> 7 0 0 4 0 2
#> 8 2 0 3 1 0
#> 9 2 0 0 4 0
#> 10 0 0 0 0 6
#> 11 1 0 0 5 0
#> 12 1 1 0 4 0
#> 13 0 3 3 0 0
#> 14 1 0 0 5 0
#> 15 0 2 0 3 1
gwet.ac1.dist(distrib.6raters)
#> coeff.name coeff stderr conf.int p.value pa pe
#> Gwet's AC1 0.44480 0.08419 (0.264,0.625) 0.000116 0.55111 0.19148
fleiss.kappa.dist(distrib.6raters)
#> coeff.name coeff stderr conf.int p.value pa pe
#>Fleiss Kappa 0.41393 0.08119 (0.24,0.588) 0.000162 0.55111 0.23407
krippen.alpha.dist(distrib.6raters)
#> coeff.name coeff stderr conf.int p.value pa pe
#>Krippendorff 0.42044 0.08243 (0.244,0.597) 0.00016 0.55610 0.23407
bp.coeff.dist(distrib.6raters)
#> coeff.name coeff stderr conf.int p.value pa pe
#>Brennan-Pred 0.43889 0.08312 (0.261,0.617) 0.00012 0.55111 0.2
```
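The `pa` and `pe` columns above can be reproduced by hand. The base-R sketch below retypes the distribution and applies the standard multi-rater definitions for a design in which every subject is rated by the same number of raters. It is an illustration only, not the package's implementation, and it leaves out Krippendorff's alpha, whose percent agreement includes a small-sample adjustment.

```r
# distrib.6raters, typed in by hand: 15 subjects (rows) by 5 categories (columns).
d <- matrix(c(0,0,0,6,0,  0,3,0,0,3,  0,1,4,0,1,  0,0,0,0,6,  0,3,0,3,0,
              2,0,4,0,0,  0,0,4,0,2,  2,0,3,1,0,  2,0,0,4,0,  0,0,0,0,6,
              1,0,0,5,0,  1,1,0,4,0,  0,3,3,0,0,  1,0,0,5,0,  0,2,0,3,1),
            ncol = 5, byrow = TRUE,
            dimnames = list(NULL, c("Depression", "Personality.Disorder",
                                    "Schizophrenia", "Neurosis", "Other")))

r   <- rowSums(d)                                   # raters per subject (6 each)
q   <- ncol(d)                                      # number of categories
pa  <- mean(rowSums(d * (d - 1)) / (r * (r - 1)))   # percent agreement
pik <- colMeans(d / r)                              # classification probabilities

pe_fleiss <- sum(pik^2)                    # chance agreement, Fleiss' kappa
pe_ac1    <- sum(pik * (1 - pik)) / (q - 1)  # chance agreement, Gwet's AC1
pe_bp     <- 1 / q                         # chance agreement, Brennan-Prediger

fleiss <- (pa - pe_fleiss) / (1 - pe_fleiss)
ac1    <- (pa - pe_ac1)    / (1 - pe_ac1)
bp     <- (pa - pe_bp)     / (1 - pe_bp)

round(c(pa = pa, fleiss = fleiss, ac1 = ac1, bp = bp), 5)
```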

Once again, you can request a single value from these functions. To get only Krippendorff’s alpha coefficient without its precision measures, you may proceed as follows: `alpha <- krippen.alpha.dist(distrib.6raters)$coeff`

The newly created alpha variable gives the coefficient 0.42044. Two additional datasets that represent ratings in the form of a distribution of raters by subject and by category are included in this package. One of them is **cac.dist4cat**, from which AC_{1} can be computed as follows:

`ac1 <- gwet.ac1.dist(cac.dist4cat[,2:4])$coeff`

Note that the input dataset supplied to the function is **cac.dist4cat[,2:4]**. That is, only columns 2, 3, and 4 are extracted from the original dataset and used as input data. We know from the value of the newly created variable that AC_{1} = 0.3518903.

## Computing agreement coefficients from raw ratings

One example dataset of raw ratings included in this package is **cac.raw4raters** and looks like this:

```
cac.raw4raters
#> Rater1 Rater2 Rater3 Rater4
#> 1 1 1 NA 1
#> 2 2 2 3 2
#> 3 3 3 3 3
#> 4 3 3 3 3
#> 5 2 2 2 2
#> 6 1 2 3 4
#> 7 4 4 4 4
#> 8 1 1 2 1
#> 9 2 2 2 2
#> 10 NA 5 5 5
#> 11 NA NA 1 1
#> 12 NA NA 3 NA
```

As you can see, a dataset of raw ratings is merely a listing of the ratings that the raters assigned to the subjects. Each row is associated with a single subject. Typically, the same subject would be rated by all or some of the raters. The dataset **cac.raw4raters** contains some missing ratings represented by the symbol NA, indicating that some raters did not rate all subjects. As a matter of fact, in this particular case, no rater rated all subjects.

Here is how you can compute the various agreement coefficients using raw ratings:

```
pa.coeff.raw(cac.raw4raters)
#> $est
#> coeff.name pa pe coeff.val coeff.se conf.int p.value
#>Pct Agreement 0.81818 0 0.8181818 0.12561 (0.542,1) 4.35e-05
#> w.name
#> unweighted
#>
#> $weights
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 0 0 0 0
#> [2,] 0 1 0 0 0
#> [3,] 0 0 1 0 0
#> [4,] 0 0 0 1 0
#> [5,] 0 0 0 0 1
#>
#> $categories
#> [1] 1 2 3 4 5
gwet.ac1.raw(cac.raw4raters)
#> $est
#> coeff.name pa pe coeff.val coeff.se conf.int
#> 1 AC1 0.8181818 0.1903212 0.77544 0.14295 (0.461,1)
#> p.value w.name
#> 1 0.000208721 unweighted
#>
#> $weights
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 0 0 0 0
#> [2,] 0 1 0 0 0
#> [3,] 0 0 1 0 0
#> [4,] 0 0 0 1 0
#> [5,] 0 0 0 0 1
#>
#> $categories
#> [1] 1 2 3 4 5
fleiss.kappa.raw(cac.raw4raters)
#> $est
#> coeff.name pa pe coeff.val coeff.se conf.int
#> 1 Fleiss' Kappa 0.8181818 0.2387153 0.76117 0.15302 (0.424,1)
#> p.value w.name
#> 1 0.000419173 unweighted
#>
#> $weights
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 0 0 0 0
#> [2,] 0 1 0 0 0
#> [3,] 0 0 1 0 0
#> [4,] 0 0 0 1 0
#> [5,] 0 0 0 0 1
#>
#> $categories
#> [1] 1 2 3 4 5
krippen.alpha.raw(cac.raw4raters)
#> $est
#> coeff.name pa pe coeff.val coeff.se conf.int
#> 1 Krippendorff's Alpha 0.805 0.24 0.74342 0.14557 (0.419,1)
#> p.value w.name
#> 1 0.0004594257 unweighted
#>
#> $weights
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 0 0 0 0
#> [2,] 0 1 0 0 0
#> [3,] 0 0 1 0 0
#> [4,] 0 0 0 1 0
#> [5,] 0 0 0 0 1
#>
#> $categories
#> [1] 1 2 3 4 5
conger.kappa.raw(cac.raw4raters)
#> $est
#> coeff.name pa pe coeff.val coeff.se conf.int
#> 1 Conger's Kappa 0.8181818 0.2334252 0.76282 0.14917 (0.435,1)
#> p.value w.name
#> 1 0.0003367066 unweighted
#>
#> $weights
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 0 0 0 0
#> [2,] 0 1 0 0 0
#> [3,] 0 0 1 0 0
#> [4,] 0 0 0 1 0
#> [5,] 0 0 0 0 1
#>
#> $categories
#> [1] 1 2 3 4 5
bp.coeff.raw(cac.raw4raters)
#> $est
#> coeff.name pa pe coeff.val coeff.se conf.int
#> 1 Brennan-Prediger 0.8181818 0.2 0.77273 0.14472 (0.454,1)
#> p.value w.name
#> 1 0.0002375609 unweighted
#>
#> $weights
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 0 0 0 0
#> [2,] 0 1 0 0 0
#> [3,] 0 0 1 0 0
#> [4,] 0 0 0 1 0
#> [5,] 0 0 0 0 1
#>
#> $categories
#> [1] 1 2 3 4 5
```
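The `pa` and `pe` values reported for AC1 can be traced back to the raw ratings, including the way missing ratings are handled. The base-R sketch below is one reading of those estimates, checked only against the printed output above and not taken from the package source. It assumes that percent agreement is averaged over subjects rated by 2 or more raters, while the classification probabilities use every subject with at least 1 rating.

```r
# cac.raw4raters, typed in by hand from the listing above.
raw <- data.frame(
  Rater1 = c( 1, 2, 3, 3, 2, 1, 4, 1, 2, NA, NA, NA),
  Rater2 = c( 1, 2, 3, 3, 2, 2, 4, 1, 2,  5, NA, NA),
  Rater3 = c(NA, 3, 3, 3, 2, 3, 4, 2, 2,  5,  1,  3),
  Rater4 = c( 1, 2, 3, 3, 2, 4, 4, 1, 2,  5,  1, NA)
)
cats <- 1:5

# Raters per subject and category; table() drops the NA ratings.
counts <- t(apply(raw, 1, function(x) table(factor(x, levels = cats))))
ri <- rowSums(counts)            # number of raters who rated each subject

# Percent agreement: only subjects with at least 2 ratings contribute.
ok <- ri >= 2
pa <- mean(rowSums(counts[ok, ] * (counts[ok, ] - 1)) / (ri[ok] * (ri[ok] - 1)))

# Classification probabilities: average share of raters per category.
pik <- colMeans(counts / ri)

pe_ac1 <- sum(pik * (1 - pik)) / (length(cats) - 1)
ac1    <- (pa - pe_ac1) / (1 - pe_ac1)

round(c(pa = pa, pe = pe_ac1, ac1 = ac1), 7)
```

Subject 12 was rated by a single rater, so it is left out of `pa` but still contributes to `pik`.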

Most users of this package will only be interested in the agreement coefficients and possibly in related statistics such as the standard error and the p-value. In that case, you should run these functions as follows (AC_{1} is used here as an example; feel free to experiment with the other coefficients):

```
ac1 <- gwet.ac1.raw(cac.raw4raters)$est
ac1
#> coeff.name pa pe coeff.val coeff.se conf.int
#> 1 AC1 0.8181818 0.1903212 0.77544 0.14295 (0.461,1)
#> p.value w.name
#>1 0.000208721 unweighted
```

You can even request only the AC_{1} coefficient estimate, 0.77544, by proceeding as follows:

```
ac1 <- gwet.ac1.raw(cac.raw4raters)$est
ac1$coeff.val
#> [1] 0.77544
```

# References:

- Gwet, K.L. (2014)
*Handbook of Inter-Rater Reliability*, 4th
Edition. Advanced Analytics, LLC.
- Klein, D. (2018) “Implementing a general framework for assessing
interrater agreement in Stata,” ,
**18**, 871-901.

## Monday, August 26, 2019

### R Package for Intraclass Correlation Coefficient as a Measure of Inter-Rater Reliability

```
library(irrICC)
```

# Installation

`devtools::install_github("kgwet/irrICC")`

# Abstract

**irrICC** is an R package that provides several functions for calculating various Intraclass Correlation Coefficients (ICC). This package follows closely the general framework of inter-rater and intra-rater reliability presented by Gwet (2014). Many of the intraclass correlation coefficients discussed by Shrout and Fleiss (1979) are also implemented in this package.

All input datasets to be used with this package must contain a mandatory “Target” column of all subjects that were rated, and 2 or more columns “Rater1”, “Rater2”, … showing the ratings assigned to the subjects. The Target variable must be the first column of the data frame, and every other column is assumed to contain ratings from a rater. Note that all ratings must be numeric values for the ICC to be calculated. For example, here is a dataset “iccdata1” that is included in this package:

```
iccdata1
#> Target J1 J2 J3 J4
#> 1 1 6.0 1.0 3.0 2
#> 2 1 6.5 NA 3.0 4
#> 3 1 4.0 3.0 5.5 4
#> 4 5 10.0 5.0 6.0 9
#> 5 5 9.5 4.0 NA 8
#> 6 4 6.0 2.0 4.0 NA
#> 7 4 NA 1.0 3.0 6
#> 8 4 8.0 2.5 NA 5
#> 9 2 9.0 2.0 5.0 8
#> 10 2 7.0 NA 2.0 6
#> 11 2 8.0 NA 2.0 7
#> 12 3 10.0 5.0 6.0 NA
```

The first column “Target” (the name Target can be replaced with any other name you like) contains subject identifiers, while J1, J2, J3, J4 are the 4 raters (referred to here as Judges) and the ratings they assigned to the subjects. You will notice that the Target column contains duplicates, indicating that some subjects were rated multiple times. Moreover, none of these judges rated all subjects, as seen from the presence of missing ratings identified with the symbol NA.

Two other datasets, iccdata2 and iccdata3, come with the package for you to experiment with. Even if your data frame contains several variables, note that only the Target and the Rater columns must be supplied as parameters to the functions. For example, the iccdata2 data frame contains a variable named Group, which indicates the specific group in which each Target is categorized. It must be excluded from the input dataset as follows: iccdata2[,2:6].

# Computing various ICC values

To determine what function you need, you must first have a statistical description of your experimental data. There are essentially 3 statistical models recommended in the literature for describing quantitative inter-rater reliability data. These are commonly referred to as Model 1, Model 2 and Model 3.

**Model 1**

Model 1 uses a single factor (hence the number 1) to explain the variation in the ratings. When the factor used is the subject, the model is referred to as Model 1A, and when it is the rater, the model is named Model 1B. You will want to use Model 1A if not all subjects are rated by the same roster of raters. That is, raters may change from subject to subject. Model 1B is more appropriate if different raters may rate different rosters of subjects. Note that while Model 1A only allows for the calculation of inter-rater reliability, Model 1B only allows for the calculation of intra-rater reliability.

Calculating the ICC under Model 1A is done as follows:

```
icc1a.fn(iccdata1)
#> sig2s sig2e icc1a n r max.rep min.rep Mtot ov.mean
#> 1 1.761312 5.225529 0.2520899 5 4 3 1 40 5.2
```

It follows that the inter-rater reliability is given by 0.252, the first 2 output statistics being the subject variance component 1.761
and error variance component 5.226 respectively. You may see a description of the other statistics from the function’s documentation.

The ICC under Model 1B is calculated as follows:

```
icc1b.fn(iccdata1)
#> sig2r sig2e icc1b n r max.rep min.rep Mtot ov.mean
#> 1 4.32087 3.365846 0.5621217 5 4 3 1 40 5.2
```

It follows that the intra-rater reliability is given by 0.562, the first 2 output statistics being the rater variance component 4.321 and
error variance component 3.366 respectively. A description of the other statistics can be found in the function’s documentation.
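The two Model 1 coefficients printed above are consistent with a simple ratio of variance components: the component of the single factor divided by the total variance. The short sketch below only redoes that arithmetic with the printed estimates; the variable names are chosen for this illustration and the estimation of the components themselves is left to the package.

```r
# Variance components copied from the icc1a.fn() output above.
sig2s    <- 1.761312   # subject variance
sig2e_1a <- 5.225529   # error variance under Model 1A
icc1a    <- sig2s / (sig2s + sig2e_1a)

# Variance components copied from the icc1b.fn() output above.
sig2r    <- 4.32087    # rater variance
sig2e_1b <- 3.365846   # error variance under Model 1B
icc1b    <- sig2r / (sig2r + sig2e_1b)

round(c(icc1a = icc1a, icc1b = icc1b), 4)
```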

**Model 2**

Model 2 includes a subject factor and a rater factor, both of which are considered random. That is, Model 2 is a pure random factorial ANOVA model. Model 2 can be specified with or without a subject-rater interaction. Model 2 with subject-rater interaction is made up of 3 factors: the rater, subject and interaction factors, and is implemented in the function *icc2.inter.fn*.

For information, the mathematical formulation of the full Model 2 is as follows:

$$y_{ijk} = \mu + s_i + r_j + (sr)_{ij} + e_{ijk},$$

where $y_{ijk}$ is the rating associated with subject $i$, rater $j$ and replicate (or measurement) $k$. Moreover, $\mu$ is the average rating, $s_i$ is subject $i$’s effect, $r_j$ is rater $j$’s effect, $(sr)_{ij}$ is the subject-rater interaction effect associated with subject $i$ and rater $j$, and $e_{ijk}$ is the error effect. The other statistical models are similar to this one. Some may be based on fewer factors, or the assumptions applicable to these factors may vary from model to model. Please read Gwet (2014) for a technical discussion of these models.
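One way to get a feel for the full Model 2 is to simulate ratings from it. The base-R sketch below draws each effect from a normal distribution and adds them up exactly as in the equation. The sample sizes, the standard deviations, and all variable names are arbitrary choices made for this illustration; nothing here comes from the package or from iccdata1.

```r
set.seed(2019)

n  <- 5    # subjects
r  <- 4    # raters
m  <- 2    # replicates per subject-rater pair
mu <- 5    # average rating

s_eff  <- rnorm(n, sd = 1.4)                           # subject effects s_i
r_eff  <- rnorm(r, sd = 2.0)                           # rater effects r_j
sr_eff <- matrix(rnorm(n * r, sd = 0.6), nrow = n)     # interaction effects (sr)_ij

# One row per (subject i, rater j, replicate k).
sim <- expand.grid(k = seq_len(m), j = seq_len(r), i = seq_len(n))

# y_ijk = mu + s_i + r_j + (sr)_ij + e_ijk
sim$y <- mu + s_eff[sim$i] + r_eff[sim$j] +
  sr_eff[cbind(sim$i, sim$j)] + rnorm(nrow(sim), sd = 1.1)

head(sim)
```

Dropping the `sr_eff` term gives Model 2 without interaction.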

Calculating the ICC from the iccdata1 dataset (included in this package) and under the assumption of Model 2 with interaction is done
as follows:

```
icc2.inter.fn(iccdata1)
#> sig2s sig2r sig2e sig2sr icc2r icc2a n r
#> 1 2.018593 4.281361 1.315476 0.4067361 0.251627 0.8360198 5 4
#> max.rep min.rep Mtot ov.mean
#> 1 3 1 40 5.2
```

This function produces 2 intraclass correlation coefficients, **icc2r** and **icc2a**. While **icc2r** represents the inter-rater reliability, estimated to be 0.252, **icc2a** represents the intra-rater reliability, estimated at 0.836. The first 4 output statistics are respectively the subject, rater, error, and subject-rater interaction variance components.

The ICC calculation with the iccdata1 dataset and under the assumption
of Model 2 without interaction is done as follows:

```
icc2.nointer.fn(iccdata1)
#> sig2s sig2r sig2e icc2r icc2a n r max.rep
#> 1 2.090769 4.34898 1.598313 0.2601086 0.801157 5 4 3
#> min.rep Mtot ov.mean
#> 1 1 40 5.2
```

The 2 intraclass correlation coefficients have now become *icc2r* = 0.26 and *icc2a* = 0.801. That is, the estimated inter-rater reliability went up slightly while the intra-rater reliability coefficient went down slightly.
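The Model 2 coefficients can be recovered from the printed variance components. In the sketch below, inter-rater reliability is taken as the subject variance over the total variance, and intra-rater reliability as everything except the error variance over the total. These ratios are an interpretation that reproduces the printed values, not a quotation of the package source; see Gwet (2014) for the formal definitions.

```r
# Model 2 with interaction: components from the icc2.inter.fn() output.
sig2s <- 2.018593; sig2r <- 4.281361; sig2e <- 1.315476; sig2sr <- 0.4067361
total_inter <- sig2s + sig2r + sig2sr + sig2e
icc2r_inter <- sig2s / total_inter                       # inter-rater reliability
icc2a_inter <- (sig2s + sig2r + sig2sr) / total_inter    # intra-rater reliability

# Model 2 without interaction: components from the icc2.nointer.fn() output.
sig2s_n <- 2.090769; sig2r_n <- 4.34898; sig2e_n <- 1.598313
total_nointer <- sig2s_n + sig2r_n + sig2e_n
icc2r_nointer <- sig2s_n / total_nointer
icc2a_nointer <- (sig2s_n + sig2r_n) / total_nointer

round(c(icc2r_inter = icc2r_inter, icc2a_inter = icc2a_inter,
        icc2r_nointer = icc2r_nointer, icc2a_nointer = icc2a_nointer), 4)
```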

**Model 3**

Under Model 3, the rater factor is treated as fixed rather than random. To calculate the ICC using the iccdata1 dataset and under the assumption of Model 3 with interaction, you should proceed as follows:

```
icc3.inter.fn(iccdata1)
#> sig2s sig2e sig2sr icc2r icc2a n r max.rep
#> 1 2.257426 1.315476 0.2238717 0.5749097 0.6535279 5 4 3
#> min.rep Mtot ov.mean
#> 1 1 40 5.2
```

Here, the 2 intraclass correlation coefficients are given by *icc2r* = 0.575 and *icc2a* = 0.654. Compared to Model 2 with interaction, the estimated inter-rater reliability went up substantially while the intra-rater reliability coefficient went down substantially.

Assuming Model 3 without interaction, the same coefficients are
calculated as follows:

```
icc3.nointer.fn(iccdata1)
#> sig2s sig2e icc2r icc2a n r max.rep min.rep Mtot
#> 1 2.241792 1.470638 0.6038611 0.6038611 5 4 3 1 40
#> ov.mean
#> 1 5.2
```

It follows that the 2 ICCs are given by *icc2r* = 0.604 and *icc2a* = 0.604. As usual, the omission of an interaction factor leads to a slight increase in inter-rater reliability and a slight decrease in intra-rater reliability. In this case, both become identical.
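The Model 3 values can also be checked against the printed variance components. The ratios used below were arrived at by matching the printed output, in particular the adjustment of the subject variance by the interaction variance divided by the number of raters minus 1; they are an assumption of this sketch and should be compared with the formal definitions in Gwet (2014) before being relied upon.

```r
r <- 4   # number of raters in iccdata1

# Model 3 with interaction: components from the icc3.inter.fn() output.
sig2s <- 2.257426; sig2e <- 1.315476; sig2sr <- 0.2238717
total_inter <- sig2s + sig2sr + sig2e
icc3r_inter <- (sig2s - sig2sr / (r - 1)) / total_inter   # inter-rater reliability
icc3a_inter <- (sig2s + sig2sr) / total_inter             # intra-rater reliability

# Model 3 without interaction: components from the icc3.nointer.fn() output.
sig2s_n <- 2.241792; sig2e_n <- 1.470638
icc3_nointer <- sig2s_n / (sig2s_n + sig2e_n)   # both coefficients coincide

round(c(icc3r_inter = icc3r_inter, icc3a_inter = icc3a_inter,
        icc3_nointer = icc3_nointer), 4)
```

Without an interaction term, the 2 ratios reduce to the same expression, which is why both coefficients are identical in that case.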

# References:

- Gwet, K.L. (2014)
*Handbook of Inter-Rater Reliability*, 4th
Edition. Advanced Analytics, LLC.
- Shrout, P. E., and Fleiss, J. L. (1979), "Intraclass Correlations: Uses in Assessing
Rater Reliability."
*Psychological Bulletin*, **86**(2), 420-428.

## Saturday, January 26, 2019

### Inter-Rater Reliability for Stata Users

Stata users now have a convenient way to compute a wide variety of agreement coefficients within a general framework. The module KAPPAETC can be installed from within Stata and computes various measures of inter-rater agreement and associated standard errors and confidence intervals.

A very interesting background article entitled "Implementing a general framework for assessing interrater agreement in Stata" by Daniel Klein is a must-read for Stata users who want to understand the calculations performed by KAPPAETC behind the scenes. KAPPAETC is a remarkably well-written Stata package, and it is what I strongly recommend to all Stata users for calculating the AC1, Kappa, and Krippendorff agreement coefficients along with their associated standard errors and confidence intervals.
