Discussion:
SPSS doesn't calculate Kappa when one variable is constant
Kurt
2007-05-16 16:21:48 UTC
Permalink
I am trying to assess the level of agreement between two raters who
rated items as either Yes or No. This calls for Kappa. But if one
rater rated all items the same, SPSS sees this as a constant and
doesn't calculate Kappa.

For example, SPSS will not calculate Kappa for the following data,
because Rater 2 rated everything a Yes.

Rater1 Rater2
Item1 Y Y
Item2 N Y
Item3 Y Y
Item4 Y Y
Item5 N Y

SPSS completes the crosstab (which shows that the raters agreed 60% of
the time), but as for Kappa, it returns this note:

"No measures of association are computed for the crosstabulation of
VARIABLE1 and VARIABLE2. At least one variable in each 2-way table
upon which measures of association are computed is a constant."

Is there any way to get around this? I can calculate Kappa by hand
with the above data; why doesn't SPSS?

Thanks.

Kurt
Bruce Weaver
2007-05-16 19:30:24 UTC
Permalink
Post by Kurt
I am trying to assess the level of agreement between two raters who
rated items as either Yes or No. This calls for Kappa. But if one
rater rated all items the same, SPSS sees this as a constant and
doesn't calculate Kappa.
For example, SPSS will not calculate Kappa for the following data,
because Rater 2 rated everything a Yes.
Rater1 Rater2
Item1 Y Y
Item2 N Y
Item3 Y Y
Item4 Y Y
Item5 N Y
SPSS completes the crosstab (which shows that the raters agreed 60% of
the time), but as for Kappa, it returns this note:
"No measures of association are computed for the crosstabulation of
VARIABLE1 and VARIABLE2. At least one variable in each 2-way table
upon which measures of association are computed is a constant."
Is there anywhere to get around this? I can calculate Kappa by hand
with the above data; why doesn't SPSS?
Thanks.
Kurt
Does this solve your problem?

* ---------------------------------- .
data list list / r1 r2 count (3f2.0) .
begin data.
1 1 3
1 2 0
2 1 2
2 2 0
end data.

var lab
r1 'Rater 1'
r2 'Rater 2'
.
val lab r1 r2 1 'Yes' 2 'No'
.
weight by count.
crosstabs r1 by r2 /stat = kappa .

* Kappa is not computed because r2 is constant .
* Repeat, but with a very small number in place of 0 .

recode count (0 = .0001) (else=copy).
crosstabs r1 by r2 /stat = kappa .

* Now kappa is computed .

* ---------------------------------- .
--
Bruce Weaver
***@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
klange
2007-05-16 23:58:06 UTC
Permalink
Post by Kurt
I am trying to assess the level of agreement between two raters who
rated items as either Yes or No. This calls for Kappa. But if one
rater rated all items the same, SPSS sees this as a constant and
doesn't calculate Kappa.
For example, SPSS will not calculate Kappa for the following data,
because Rater 2 rated everything a Yes.
Rater1 Rater2
Item1 Y Y
Item2 N Y
Item3 Y Y
Item4 Y Y
Item5 N Y
SPSS completes the crosstab (which shows that the raters agreed 60% of
the time), but as for Kappa, it returns this note:
"No measures of association are computed for the crosstabulation of
VARIABLE1 and VARIABLE2. At least one variable in each 2-way table
upon which measures of association are computed is a constant."
Is there anywhere to get around this? I can calculate Kappa by hand
with the above data; why doesn't SPSS?
Thanks.
Kurt
Hi Kurt,

Add one extra case to your file with the value of 'N' for Rater 2 (and
any value for Rater 1). Add a weighting variable that has a value of 1
for your real cases, and a very small value for this new dummy case
(e.g., 0.00000001). Weight the file by the weighting variable (Data >
Weight cases), and then run the Crosstabs/Kappa.

The new case is enough for the Kappa to be calculated, but the
weighting means that it won't impact your results.
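In syntax, it would look roughly like this (an untested sketch --
the variable names are just examples):

* ---------------------------------- .
data list list / rater1 (a1) rater2 (a1) wt (f11.8) .
begin data.
Y Y 1
N Y 1
Y Y 1
Y Y 1
N Y 1
N N .00000001
end data.
weight by wt .
crosstabs rater1 by rater2 /stat = kappa .
* ---------------------------------- .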

Cheers,
Kylie.
Kurt
2007-05-18 15:50:01 UTC
Permalink
Kylie:

I tried your method and SPSS correctly weighted out the dummy case.
The crosstab table showed 60% agreement (the raters agreed on 3 out of
5 valid ratings) which is correct. But it calculated Kappa as .000,
which is definitely not correct.

My test data was set up as follows:

###

Rater1 Rater2 Weight
Item1 Y Y 1
Item2 N Y 1
Item3 Y Y 1
Item4 Y Y 1
Item5 N Y 1
Dummy N N .000000001

###

Any ideas?

Kurt
Post by klange
Post by Kurt
I am trying to assess the level of agreement between two raters who
rated items as either Yes or No. This calls for Kappa. But if one
rater rated all items the same, SPSS sees this as a constant and
doesn't calculate Kappa.
For example, SPSS will not calculate Kappa for the following data,
because Rater 2 rated everything a Yes.
Rater1 Rater2
Item1 Y Y
Item2 N Y
Item3 Y Y
Item4 Y Y
Item5 N Y
SPSS completes the crosstab (which shows that the raters agreed 60% of
the time), but as for Kappa, it returns this note:
"No measures of association are computed for the crosstabulation of
VARIABLE1 and VARIABLE2. At least one variable in each 2-way table
upon which measures of association are computed is a constant."
Is there anywhere to get around this? I can calculate Kappa by hand
with the above data; why doesn't SPSS?
Thanks.
Kurt
Hi Kurt,
Add one extra case to your file with the value of 'N' for Rater 2 (and
any value for Rater 1). Add a weighting variable that has a value of 1
for your real cases, and a very small value for this new dummy case
(eg, 0.00000001). Weight the file by the weighting variable (Data >
Weight cases), and then run the Crosstabs/Kappa.
The new case is enough for the Kappa to be calculated, but the
weighting means that it won't impact your results.
Cheers,
Kylie.
Bruce Weaver
2007-05-18 15:58:06 UTC
Permalink
Post by Kurt
I tried your method and SPSS correctly weighted out the dummy case.
The crosstab table showed 60% agreement (the raters agreed on 3 out of
5 valid ratings) which is correct. But it calculated Kappa as .000,
which is definitely not correct.
###
Rater1 Rater2 Weight
Item1 Y Y 1
Item2 N Y 1
Item3 Y Y 1
Item4 Y Y 1
Item5 N Y 1
Dummy N N .000000001
###
Any ideas?
Kurt
I have a standalone Kappa program that gives these results for your table.

MEASUREMENT OF CLINICAL AGREEMENT FOR CATEGORICAL DATA:
THE KAPPA COEFFICIENTS

by

Louis Cyr and Kennon Francis

1992



COHEN'S (UNWEIGHTED) KAPPA
--------------------------

ESTIMATE STANDARD ERROR Z-STATISTIC
-------- -------------- -----------
KAPPA: 0.0000 0.0000 0.0000

STANDARD ERROR FOR CONSTRUCTING CONFIDENCE INTERVALS: 0.0000


JACKKNIFE ESTIMATE OF KAPPA
---------------------------

STANDARD ERROR
FOR
ESTIMATE CONFIDENCE INTERVALS
-------- --------------------
KAPPA: 0.0000 0.0000
--
Bruce Weaver
***@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
Kurt
2007-05-18 17:39:32 UTC
Permalink
I calculated Kappa by hand and the Kappa is indeed .000. After looking
at the formula, I understand why this occurs mathematically, but
conceptually it doesn't make sense. If the percent observed agreement
is 60%, then shouldn't Kappa be somewhere above 0 (which suggests no
agreement at all)?
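
For the record, here is the arithmetic on my table (3 Y-Y, 0 Y-N,
2 N-Y, 0 N-N), with Po = observed agreement and Pe = agreement
expected by chance from the marginals:

Po = (3 + 0)/5 = 0.60
Pe = (3/5)(5/5) + (2/5)(0/5) = 0.60 + 0.00 = 0.60
Kappa = (Po - Pe)/(1 - Pe) = (0.60 - 0.60)/(1 - 0.60) = .000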
Post by Bruce Weaver
Post by Kurt
I tried your method and SPSS correctly weighted out the dummy case.
The crosstab table showed 60% agreement (the raters agreed on 3 out of
5 valid ratings) which is correct. But it calculated Kappa as .000,
which is definitely not correct.
###
Rater1 Rater2 Weight
Item1 Y Y 1
Item2 N Y 1
Item3 Y Y 1
Item4 Y Y 1
Item5 N Y 1
Dummy N N .000000001
###
Any ideas?
Kurt
I have a standalone Kappa program that gives these results for your table.
THE KAPPA COEFFICIENTS
by
Louis Cyr and Kennon Francis
1992
COHEN'S (UNWEIGHTED) KAPPA
--------------------------
ESTIMATE STANDARD ERROR Z-STATISTIC
-------- -------------- -----------
KAPPA: 0.0000 0.0000 0.0000
STANDARD ERROR FOR CONSTRUCTING CONFIDENCE INTERVALS: 0.0000
JACKKNIFE ESTIMATE OF KAPPA
---------------------------
STANDARD ERROR
FOR
ESTIMATE CONFIDENCE INTERVALS
-------- --------------------
KAPPA: 0.0000 0.0000
--
Bruce Weaver
Brendan Halpin
2007-05-18 17:50:41 UTC
Permalink
Post by Kurt
I calculated Kappa by hand and the Kappa is indeed .000. After looking
at the formula, I understand why this occurs mathematically, but
conceptually it doesn't make sense. If the percent observed agreement
is 60%, than shouldn't Kappa be somewhere above 0 (which suggests no
agreement at all)?
It doesn't measure raw agreement, but rather agreement above that
expected under independence. If rater 2 says yes all the time, then
the expected agreement under independence is zero for the No category
and, for the Yes category, whatever proportion of Yes ratings rater 1
gives.

Another way of thinking about it is that if rater 2 says yes all the
time, s/he's not a rater (his/her opinion is not a variable but a
constant).

Brendan
--
Brendan Halpin, Department of Sociology, University of Limerick, Ireland
Tel: w +353-61-213147 f +353-61-202569 h +353-61-338562; Room F2-025 x 3147
mailto:***@ul.ie http://www.ul.ie/sociology/brendan.halpin.html
Bruce Weaver
2007-05-18 17:57:49 UTC
Permalink
Post by Kurt
I calculated Kappa by hand and the Kappa is indeed .000. After looking
at the formula, I understand why this occurs mathematically, but
conceptually it doesn't make sense. If the percent observed agreement
is 60%, than shouldn't Kappa be somewhere above 0 (which suggests no
agreement at all)?
Kappa = 0 suggests no agreement beyond what is expected by chance. Your
2x2 table looks like this:

R2
Y N
R1 Y 3 0
N 2 0

The expected cell counts (which are computed as row total * column total
divided by grand total) are also 3, 0, 2, and 0. So there are no
agreements beyond the 3 that are expected by chance.
--
Bruce Weaver
***@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
Richard Ulrich
2007-05-19 02:45:03 UTC
Permalink
Post by Kurt
I calculated Kappa by hand and the Kappa is indeed .000. After looking
at the formula, I understand why this occurs mathematically, but
conceptually it doesn't make sense. If the percent observed agreement
is 60%, than shouldn't Kappa be somewhere above 0 (which suggests no
agreement at all)?
[snip, rest]


I've seen a couple of other posts....

Here is another example of 'raw agreement'
being different from kappa.
Y N
Y 90 5
N 5 0
kappa= -.05 - NEGATIVE -
with 90% agreement.
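For anyone checking the arithmetic:
Po = (90 + 0)/100 = 0.90
Pe = (.95)(.95) + (.05)(.05) = 0.905
kappa = (0.90 - 0.905)/(1 - 0.905) = -0.005/0.095 = -.053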
--
Rich Ulrich, ***@pitt.edu
http://www.pitt.edu/~wpilib/index.html
Kurt
2007-05-21 14:50:21 UTC
Permalink
And this example baffles me.
Post by Richard Ulrich
Here is another example of 'raw agreement'
being different from kappa.
Y N
Y 90 5
N 5 0
kappa= -.05 - NEGATIVE -
with 90% agreement.
I guess interpreting Kappa is not as straightforward as I had thought.
Or, I'm more dense than I imagined.

If there is 90% raw agreement, and Kappa is negative, this essentially
tells us that there is no agreement beyond what is expected by chance.
In other words, the observed 90% agreement - those 90 out of 100
ratings which were rated Yes by both raters - is within the realm of
chance. Isn't that a little counterintuitive? If I developed an
instrument and my raters consistently rated 90% of the items
similarly, I would see that as some evidence of good reliability. But
Kappa would tell me otherwise. ?
Brian
2007-05-21 21:58:39 UTC
Permalink
Post by Kurt
And this example baffles me.
Post by Richard Ulrich
Here is another example of 'raw agreement'
being different from kappa.
Y N
Y 90 5
N 5 0
kappa= -.05 - NEGATIVE -
with 90% agreement.
I guess interpreting Kappa is not as straightforward as I had thought.
Or, I'm more dense than I imagined.
If there is 90% raw agreement, and Kappa is negative, this essentially
tells us that there is no agreement beyond what is expected by chance.
In other words, the observed 90% agreement - those 90 out of 100
ratings which were rated Yes by both raters - is within the realm of
chance. Isn't that a little counterintuitive? If I developed an
instrument and my raters consistently rated 90% of the items
similarly, I would see that as some evidence of good reliability. But
Kappa would tell me otherwise. ?
I responded to Kurt offline with some syntax. Actually, what you all
are observing is called the Cicchetti paradox. In essence, as the
marginal heterogeneity increases, the Pe becomes higher, which
diminishes the kappa. Try to calculate 90% agreement with marginal
homogeneity, e.g.,

Y N
Y 45 5
N 5 45

Kappa will be much larger. The paradox to which Cicchetti refers is
that under ordinary rules of joint probability, the likelihood of
greater skewness (greater marginal heterogeneity) is smaller and should
result in a lower Pe. However, the kappa family of chance-corrected
statistics acts in opposition to that logic. Gwet's AC1 statistic
actually responds mildly in the reverse direction: as the marginal
heterogeneity increases, there are concomitant increases in the AC1.
Also, the phenomenon which you are observing diminishes as the number
of raters and categories increases.
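
A quick check by hand for the 45/5/5/45 table above:
Po = (45 + 45)/100 = 0.90
Pe = (.50)(.50) + (.50)(.50) = 0.50
kappa = (0.90 - 0.50)/(1 - 0.50) = 0.80
versus roughly -.05 for the 90/5/5/0 table, at the same 90% raw
agreement.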

Brian
Ray Koopman
2007-05-21 23:24:04 UTC
Permalink
Post by Kurt
And this example baffles me.
Post by Richard Ulrich
Here is another example of 'raw agreement'
being different from kappa.
Y N
Y 90 5
N 5 0
kappa= -.05 - NEGATIVE -
with 90% agreement.
I guess interpreting Kappa is not as straightforward as I had thought.
Or, I'm more dense than I imagined.
If there is 90% raw agreement, and Kappa is negative, this essentially
tells us that there is no agreement beyond what is expected by chance.
In other words, the observed 90% agreement - those 90 out of 100
ratings which were rated Yes by both raters - is within the realm of
chance. Isn't that a little counterintuitive? If I developed an
instrument and my raters consistently rated 90% of the items
similarly, I would see that as some evidence of good reliability. But
Kappa would tell me otherwise. ?
This is exactly the situation for which Cohen invented kappa. When the
marginals are homogeneous (i.e., the row marginals are the same as the
column marginals), the expected rate of agreement given independence
(a.k.a. the chance agreement rate) grows as the marginals become more
skewed. In this particular example, 90% is actually less than the
chance agreement rate for two raters who have the same 95% bias toward
saying Yes but whose ratings are unrelated. The intent is to disabuse
people of the notion that a high agreement rate is ipso facto a
demonstration of reliability.
Richard Ulrich
2007-05-22 02:45:36 UTC
Permalink
Post by Ray Koopman
Post by Kurt
And this example baffles me.
Post by Richard Ulrich
Here is another example of 'raw agreement'
being different from kappa.
Y N
Y 90 5
N 5 0
kappa= -.05 - NEGATIVE -
with 90% agreement.
I guess interpreting Kappa is not as straightforward as I had thought.
Or, I'm more dense than I imagined.
If there is 90% raw agreement, and Kappa is negative, this essentially
tells us that there is no agreement beyond what is expected by chance.
In other words, the observed 90% agreement - those 90 out of 100
ratings which were rated Yes by both raters - is within the realm of
chance. Isn't that a little counterintuitive? If I developed an
instrument and my raters consistently rated 90% of the items
similarly, I would see that as some evidence of good reliability. But
Kappa would tell me otherwise. ?
This is exactly the situation for which Cohen invented kappa. When the
marginals are homogeneous (i.e., the row marginals are the same as the
column marginals), the expected rate of agreement given independence
(a.k.a. the chance agreement rate) grows as the marginals become more
skewed. In this particular example, 90% is actually less than the
chance agreement rate for two raters who have the same 95% bias toward
saying Yes but whose ratings are unrelated. The intent is to disabuse
people of the notion that a high agreement rate is ipso facto a
demonstration of reliability.
Thanks. Very good.

Another way to expand on the idea is to note that kappa
is thoroughly symmetrical. It does not matter which category
is YES and which is NO -- the kappa stays the same.

In my "90% agreement" example, look at the effect of switching
Yes and No. Each rater says that 5% have some trait. There
is 90% in the category where they agree for no-trait, but (now)
there is 0% in the category of Yes.
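
Checking by hand: after the switch the cells are 0, 5, 5, 90, so
Po is still 0.90 and Pe is still (.05)(.05) + (.95)(.95) = 0.905,
which gives the same kappa of about -.05.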
--
Rich Ulrich, ***@pitt.edu
http://www.pitt.edu/~wpilib/index.html
c***@gmail.com
2020-07-02 15:06:50 UTC
Permalink
Post by Kurt
I tried your method and SPSS correctly weighted out the dummy case.
The crosstab table showed 60% agreement (the raters agreed on 3 out of
5 valid ratings) which is correct. But it calculated Kappa as .000,
which is definitely not correct.
###
Rater1 Rater2 Weight
Item1 Y Y 1
Item2 N Y 1
Item3 Y Y 1
Item4 Y Y 1
Item5 N Y 1
Dummy N N .000000001
###
Any ideas?
Kurt
Post by klange
Post by Kurt
I am trying to assess the level of agreement between two raters who
rated items as either Yes or No. This calls for Kappa. But if one
rater rated all items the same, SPSS sees this as a constant and
doesn't calculate Kappa.
For example, SPSS will not calculate Kappa for the following data,
because Rater 2 rated everything a Yes.
Rater1 Rater2
Item1 Y Y
Item2 N Y
Item3 Y Y
Item4 Y Y
Item5 N Y
SPSS completes the crosstab (which shows that the raters agreed 60% of
the time), but as for Kappa, it returns this note:
"No measures of association are computed for the crosstabulation of
VARIABLE1 and VARIABLE2. At least one variable in each 2-way table
upon which measures of association are computed is a constant."
Is there anywhere to get around this? I can calculate Kappa by hand
with the above data; why doesn't SPSS?
Thanks.
Kurt
Hi Kurt,
Add one extra case to your file with the value of 'N' for Rater 2 (and
any value for Rater 1). Add a weighting variable that has a value of 1
for your real cases, and a very small value for this new dummy case
(eg, 0.00000001). Weight the file by the weighting variable (Data >
Weight cases), and then run the Crosstabs/Kappa.
The new case is enough for the Kappa to be calculated, but the
weighting means that it won't impact your results.
Cheers,
Kylie.
If there are two raters, R1 & R2, can you tell me how to add the third column for the weight in SPSS as you did? Can you share your SPSS screenshot?
It would be a big help to me. My email id: ***@gmail.com
Rich Ulrich
2020-07-02 16:11:16 UTC
Permalink
Post by c***@gmail.com
Post by Kurt
I tried your method and SPSS correctly weighted out the dummy case.
The crosstab table showed 60% agreement (the raters agreed on 3 out of
5 valid ratings) which is correct. But it calculated Kappa as .000,
which is definitely not correct.
< snip, details >
Post by c***@gmail.com
If there are two raters, R1 & R2, can you tell me how to add the third column for the weight in SPSS as you did? Can you share your SPSS screenshot?
The original thread from 2007 is available from Google,
https://groups.google.com/forum/#!topic/comp.soft-sys.stat.spss/ChdrpJTsvTk

and it gives plenty of reasons why you don't really want to
have a kappa reported when there is no variation.

Especially study my posts and the one from Ray Koopman.
--
Rich Ulrich
Rich Ulrich
2020-07-02 16:39:12 UTC
Permalink
On Thu, 02 Jul 2020 12:11:16 -0400, Rich Ulrich
Post by Rich Ulrich
Post by c***@gmail.com
Post by Kurt
I tried your method and SPSS correctly weighted out the dummy case.
The crosstab table showed 60% agreement (the raters agreed on 3 out of
5 valid ratings) which is correct. But it calculated Kappa as .000,
which is definitely not correct.
< snip, details >
Post by c***@gmail.com
If there are two raters, R1 & R2, can you tell me how to add the third column for the weight in SPSS as you did? Can you share your SPSS screenshot?
The original thread from 2007 is available from Google,
https://groups.google.com/forum/#!topic/comp.soft-sys.stat.spss/ChdrpJTsvTk
and it gives plenty of reasons why you don't really want to
have a kappa reported when there is no variation.
Especially study my posts and the one from Ray Koopman.
I will add to a point that I made in the original discussion.

The reader's problem arises because "agreement" is intuitively
sensible in usual circumstances, but it is nonsensical under
close examination when the marginal frequencies are extreme.

"Reliability" for a ratings of diagnosis logicallyy decomposes
into Sensitivity and Specificity -- picking out the Cases, and
picking out the Non-cases. Kappa is intended to combine
those measures, essentially. It looks at "performance above
chance." (For a 2x2 table, it is closely approximated by the
Pearson correlation.)

I gave a hypothetical table {90, 5; 5, 0} with a negative kappa.
90 5
5 0
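
As a rough check on the Pearson remark above (my arithmetic): for this
table, phi = (90*0 - 5*5)/sqrt(95*5*95*5) = -25/475 = -.053, essentially
the same as the kappa of -.05; here the two coincide because the two
off-diagonal cells are equal.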

One can arbitrarily label the rows and columns as starting with
Yes or with No. In one labeling there is 90% "agreement" as to
who is a case (cell A); in the other labeling there is 0% "agreement"
(cell D).

When each of two raters sees 95% as Case, chance would
have them agree SOME of the time; so the "agreement" of 0 is below
chance, and the kappa is negative.
--
Rich Ulrich
Richard Ulrich
2007-05-17 00:31:56 UTC
Permalink
On 16 May 2007 09:21:48 -0700, Kurt <***@cox.net> wrote:

[snip. Problem with one rater constant.]
Post by Kurt
SPSS completes the crosstab (which shows that the raters agreed 60% of
the time), but as for Kappa, it returns this note:
"No measures of association are computed for the crosstabulation of
VARIABLE1 and VARIABLE2. At least one variable in each 2-way table
upon which measures of association are computed is a constant."
Is there anywhere to get around this? I can calculate Kappa by hand
with the above data; why doesn't SPSS?
I'm curious about your formula, a little bit. But I'm not
really worried about that.

When I wrote a program to do kappa, the program traps
a zero row or column as an error. I still think that
that's the proper treatment. I know that you don't have
a *test* on the result -- which makes the kappa rather
dubious, doesn't it? -- since the variance involves a division
by zero.

I think that SPSS treats it right. IF you want a value, you
should have to interfere, say, in the way that Bruce shows.
--
Rich Ulrich, ***@pitt.edu
http://www.pitt.edu/~wpilib/index.html