Rater agreement is important in clinical research, and Cohen's Kappa is a widely used method for assessing inter-rater reliability; however, there are well-documented statistical problems with this measure. To assess its usefulness, we evaluated it against Gwet's AC1 and compared the results. McCoul ED, Smith TL, Mace JC, Anand VK, Senior BA, Hwang PH, Stankiewicz JA, Tabaee A: Interrater agreement on rhinoscopy in patients with a history of endoscopic sinus surgery. Int Forum Allergy Rhinol. 2012, 2: 453-459. 10.1002/alr.21058. Maxwell's statistically significant test statistic mentioned above shows that the raters disagree in at least one category. McNemar's generalized statistic indicates that the disagreements are not evenly distributed. To see how AgreeStat360 processes this dataset to generate the various agreement coefficients, please watch the video below.
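For the binary (two-category) case, the McNemar statistic referred to above can be computed directly from the discordant cells of the two raters' 2×2 cross-classification. A minimal sketch with invented counts, not the study's data:

```python
# McNemar's chi-square for two raters on a binary rating task.
# b and c are the off-diagonal (discordant) cells of the 2x2
# cross-classification: cases where the raters disagree in each direction.

def mcnemar_chi2(b, c):
    """McNemar's test statistic (without continuity correction)."""
    return (b - c) ** 2 / (b + c)

# Hypothetical discordant counts: rater A said "present" where rater B
# said "absent" 8 times, and the reverse 2 times.
stat = mcnemar_chi2(8, 2)
print(stat)  # (8 - 2)**2 / 10 = 3.6
```

A statistic this large relative to the chi-square critical value would suggest the disagreements are not evenly split between the two directions, which is what the generalized statistic in the text is probing across all categories.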
This video can also be viewed on youtube.com for more clarity if needed. Cohen's Kappa, Gwet's AC1, and the percentage agreement were calculated with version 2011.3 of AgreeStat (Advanced Analytics, Gaithersburg, MD, USA). Tables 3 and 4 show subject responses by rater, by response category, and by percentage agreement. Overall agreement ranged from 84% to 100%, with a mean ± SD of 96.58 ± 4.99. The most common disagreement among the 4 pairs of raters concerned the schizoid and passive-aggressive PDs (3 out of 4 pairs), while the second most common concerned the dependent and depressive PDs (2 out of 4 pairs). None of the PDs showed 100% agreement across all 4 pairs of raters. For two raters, this function gives Cohen's Kappa (weighted and unweighted), Scott's pi, and Gwet's AC1 as measures of agreement between the categorical ratings of the two raters (Fleiss, 1981; Fleiss, 1969; Altman, 1991; Scott, 1955). For three or more raters, this function offers extensions of Cohen's Kappa method, due to Fleiss and Cuzick (1979) for two possible responses per rater and to Fleiss, Nee and Landis (1979) in the general case of three or more responses per rater. Note that Gwet's agreement coefficient does not depend on the assumption of rater independence, so it can be used to quantify the extent of agreement in more contexts than kappa. Gwet KL: Computing inter-rater reliability and its variance in the presence of high agreement. Br J Math Stat Psychol. 2008, 61: 29-48.
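The four two-rater coefficients named above differ only in how they estimate chance agreement. A minimal sketch using the standard textbook formulas, assuming paired categorical ratings; this is not AgreeStat's implementation, and the example ratings are invented:

```python
from collections import Counter

def agreement_coefficients(r1, r2):
    """Percent agreement, Cohen's kappa, Scott's pi, and Gwet's AC1
    for two raters' paired categorical ratings."""
    n = len(r1)
    cats = sorted(set(r1) | set(r2))
    k = len(cats)
    po = sum(a == b for a, b in zip(r1, r2)) / n   # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # Cohen: chance agreement from each rater's own marginal proportions
    pe_kappa = sum((c1[c] / n) * (c2[c] / n) for c in cats)
    # Scott: chance agreement from the pooled marginal proportions
    pooled = {c: (c1[c] + c2[c]) / (2 * n) for c in cats}
    pe_pi = sum(p * p for p in pooled.values())
    # Gwet's AC1: chance term shrinks when the marginals are extreme
    pe_ac1 = sum(p * (1 - p) for p in pooled.values()) / (k - 1)

    def chance_adjust(pe):
        return (po - pe) / (1 - pe)

    return po, chance_adjust(pe_kappa), chance_adjust(pe_pi), chance_adjust(pe_ac1)

# Invented ratings of six subjects into three categories
r1 = ["a", "a", "b", "b", "c", "c"]
r2 = ["a", "a", "b", "c", "c", "c"]
po, kappa, pi, ac1 = agreement_coefficients(r1, r2)
```

With marginals this balanced, the four coefficients come out close together; they diverge when one category dominates, which is the situation discussed below.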
10.1348/000711006X126600. Gwet K: Kappa statistic is not satisfactory for assessing the extent of agreement between raters. For a 2 × 2 table, the probability of chance agreement used by Cohen's Kappa is p_e(κ) = (A1/N × B1/N) + (A2/N × B2/N), where A1, A2 and B1, B2 are the two raters' marginal totals for the two categories and N is the number of subjects. In assessing inter-rater reliability coefficients for personality disorders, Gwet's AC1 is superior to Cohen's Kappa. Our results favoured Gwet's method over Cohen's Kappa with respect to the prevalence, or marginal probability, problem. Based on the different formulas used to calculate chance-adjusted agreement, Gwet's AC1 proved to be a more stable inter-rater reliability coefficient than Cohen's Kappa. It was also found to be less affected by prevalence and marginal probability than Cohen's Kappa, and should therefore be considered for inter-rater reliability analysis. I decided to rerun the statistics for my research study, in which two raters analyzed the content of 23 online programs with the Community of Inquiry Syllabus Rubric, for my presentation at AERA. AgreeStat was used to obtain Cohen's κ and Gwet's AC1 to determine inter-rater reliability by category. Tables 1A-B show how the κ statistic was affected by high agreement in the instructional design (ID) category of cognitive presence (CP), while Gwet's AC1 was not.
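Plugging illustrative numbers into the chance-agreement formula above shows the prevalence problem directly: with highly skewed marginals, Cohen's p_e balloons and kappa collapses even at 90% observed agreement, while Gwet's AC1 chance term stays small. All counts here are hypothetical, not the study's data:

```python
# Prevalence paradox with made-up numbers: two raters agree on 90 of 100
# binary ratings, but almost all ratings fall in one category.

N = 100
A1, A2 = 95, 5        # rater A's marginal totals for the two categories
B1, B2 = 95, 5        # rater B's marginal totals
po = 0.90             # observed agreement (consistent with these marginals)

# Cohen's chance agreement: product of each rater's own marginals
pe_kappa = (A1 / N) * (B1 / N) + (A2 / N) * (B2 / N)
kappa = (po - pe_kappa) / (1 - pe_kappa)

# Gwet's AC1: chance agreement built from the pooled proportions pi_k,
# which stays small when the marginals are extreme (k = 2, so no division)
pi1 = (A1 + B1) / (2 * N)
pi2 = (A2 + B2) / (2 * N)
pe_ac1 = pi1 * (1 - pi1) + pi2 * (1 - pi2)
ac1 = (po - pe_ac1) / (1 - pe_ac1)

print(round(pe_kappa, 3), round(kappa, 3), round(ac1, 3))
```

Despite 90% observed agreement, kappa is slightly negative here because chance agreement is estimated at 0.905, while AC1 remains high; this is the instability the text attributes to prevalence and marginal probability.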