$I_r$. I firmly believe that the mathematical derivations that led to this coefficient were wrong, even though the underlying ideas are right. A proper translation of these ideas would inevitably have led to the Brennan-Prediger coefficient or to the percent agreement, depending on the assumption made.

The Perreault-Leigh agreement coefficient is formally defined as follows:

$$I_r = \begin{cases} \sqrt{S}, & \text{if } S \geq 0, \\ 0, & \text{otherwise,} \end{cases}$$

where $S$, defined by

$$S = \frac{p_a - 1/q}{1 - 1/q},$$

with $p_a$ denoting the percent agreement and $q$ the number of categories, is the agreement coefficient recommended by Bennett et al. (1954) and is a special case of the coefficient recommended by Brennan and Prediger (1981). The symbol $I_r$ used by Perreault and Leigh (1989) appears to stand for “Index of Reliability.”
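To make the definition concrete, here is a minimal Python sketch for two judges. It assumes the usual forms of the two coefficients: $S = (p_a - 1/q)/(1 - 1/q)$, where $p_a$ is the percent agreement and $q$ the number of categories, and $I_r = \sqrt{S}$ when $S \geq 0$ and $0$ otherwise. The function names and the ratings are made up for illustration and come from neither paper.

```python
import math


def percent_agreement(ratings_a, ratings_b):
    """Share of subjects on which the two judges chose the same category."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def bennett_s(p_a, q):
    """Bennett et al. S coefficient for q >= 2 categories."""
    if q < 2:
        raise ValueError("q must be at least 2")
    return (p_a - 1 / q) / (1 - 1 / q)


def perreault_leigh_ir(p_a, q):
    """Perreault-Leigh I_r: square root of S, truncated to 0 when S < 0."""
    s = bennett_s(p_a, q)
    return math.sqrt(s) if s >= 0 else 0.0


# Hypothetical ratings of 8 subjects into q = 3 categories.
judge_1 = ["a", "b", "c", "a", "b", "c", "a", "b"]
judge_2 = ["a", "b", "c", "a", "b", "a", "c", "b"]

p_a = percent_agreement(judge_1, judge_2)  # 6 matches out of 8
s = bennett_s(p_a, q=3)
i_r = perreault_leigh_ir(p_a, q=3)
print(f"p_a = {p_a:.3f}, S = {s:.3f}, I_r = {i_r:.3f}")
```

Whenever $0 < S < 1$, the square root makes $I_r$ larger than $S$, so the two numbers are not on the same scale even though both are built from the same $p_a$ and $q$.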

I carefully reviewed the Perreault and Leigh article. It presents an excellent review of the various agreement coefficients that were current at the time it was written. Perreault and Leigh define $I_r$ as the percent of subjects that a typical judge could code consistently given the nature of the observations. Note that $I_r$ is an attribute of the typical judge, and therefore does not represent any aspect of agreement among the judges. Perreault and Leigh (1989) consider the product $N \times I_r^2$ (with $N$ representing the number of subjects) to represent the number of reliable judgments on which judges agree. This cannot be true. To see this, note that $I_r^2$ is the probability that 2 judges both independently perform a reliable judgment. If both (reliable) judgments must lead to an agreement, then they have to refer to the exact same category. However, the probability $I_r^2$ does not say which category was chosen and cannot represent any agreement among judges. Even if you decide to assume that any 2 reliable judgments must necessarily result in an agreement, the judgments will then no longer be independent. The probability for two judges to agree will now become equal to the probability for the first judge to perform a reliable judgment times the conditional probability for the second judge to perform a reliable judgment given that the first judge did. This conditional probability cannot be evaluated unless additional assumptions are made.
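The distinction drawn above can be written out explicitly. As a sketch, let $R_1$ and $R_2$ denote the events that the first and second judge each perform a reliable judgment. This notation is an assumption introduced here for illustration and does not appear in Perreault and Leigh (1989). Under independence,

$$P(R_1 \cap R_2) = P(R_1)\,P(R_2) = I_r \times I_r = I_r^2,$$

which is a statement about reliability only and carries no information about the categories chosen. If instead any two reliable judgments are required to agree, independence is lost and the product rule gives

$$P(R_1 \cap R_2) = P(R_1)\,P(R_2 \mid R_1) = I_r \, P(R_2 \mid R_1),$$

where $P(R_2 \mid R_1)$ need not equal $I_r$ and cannot be pinned down without additional assumptions.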

What Perreault and Leigh (1989) have proposed is not an agreement coefficient. Their coefficient quantifies something other than the extent of agreement among raters. It should not be compared with the other coefficients available in the literature until someone can tell us what it actually measures.

**References:**

[1] Bennett, E. M., Alpert, R., & Goldstein, A. C. (1954). Communication through limited response questioning. *Public Opinion Quarterly*, 18, 303-308.

[2] Brennan, R. L., & Prediger, D. J. (1981). Coefficient kappa: Some uses, misuses, and alternatives. *Educational and Psychological Measurement*, 41, 687-699.

[3] Perreault, W. D., & Leigh, L. E. (1989). Reliability of nominal data based on qualitative judgments. *Journal of Marketing Research*, 26, 135-148.
