In the spirit of Deep Climate, Anonymous, as he wrote to Eli, has penned an analysis of how Richard Tol cooked his critique of Cook et al., and indeed, it is McIntyre class, although from the nature of the mistake it may very well have been hubris and not malice. Dickie's misbegotten analysis starts from Cook et al.'s description of their procedure, and just to show that Eli is an honest card shark, why not start with Tol's description, first of what he did, and then of what Cook et al. did. At the end of this it should become clear that, well, Eli will leave that to the bunnies. From Tol's blog:
According to Cook et al., each abstract was assessed by at least 2 and at most 3 raters. In fact, 33 abstracts were seen by only one rater, 167 by four raters, and 5 by five.

Eli will not get into, at this time, why 33 of the 12,000-odd abstracts were seen by only one rater, but bunnies might ask whether those 33 had something about them and whether they were included in the published ratings. Tol rants on:
If the initial ratings disagreed, as they did in 33% of cases, abstracts were revisited by the original raters. In 15.9% of cases, this led to agreement. In 17.1% of cases, a third rater broke the tie.
A reported error rate of 33%, with 2 ratings and 7 categories, implies that 18.5% of ratings were incorrect. 0.6% of abstracts received two identical but wrong ratings. 2.9% of ratings are still wrong after reconciliation. 3.2% of ratings are wrong after re-rating. In total, 6.7% of reported data are in error.

At this point Dickie goes totally off the rails. That there were disagreements in the ratings is clear; such disagreement is to be expected when one imposes an ordinal scale on a continuous quantity. For example, if bunnies measure a continuous value of 3.5 ± 0.1, imposing an ordinal value will in half the cases yield a 3 and in the other half a 4. Seen in this light, it is not surprising that in one third of the cases the raters disagreed.
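A back-of-the-envelope simulation makes the point. The sketch below is purely illustrative and nothing in it comes from Cook et al.: two hypothetical raters perceive the same abstract as a continuous value sitting at a category boundary, each with a little noise, and must report an integer on the 1-7 scale:

```python
import random

random.seed(0)

def rate(true_value, noise=0.1):
    # One rater: perceive a noisy version of the continuous value,
    # then force it onto the ordinal 1..7 scale.
    seen = true_value + random.uniform(-noise, noise)
    return min(7, max(1, round(seen)))

trials = 100_000
disagree = one_unit = 0
for _ in range(trials):
    a, b = rate(3.5), rate(3.5)   # two raters, one abstract at the 3/4 boundary
    if a != b:
        disagree += 1
        if abs(a - b) == 1:
            one_unit += 1

print(f"disagreement rate: {disagree / trials:.0%}")                   # ~50% at the boundary
print(f"of those, one unit apart: {one_unit / max(disagree, 1):.0%}")  # 100%
```

Right at a boundary the two raters split 50:50; across a whole literature of abstracts at varying distances from the boundaries the aggregate rate comes out lower, and every one of these "errors" is a one-unit difference.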
Where the separation between ratings is only one unit, these are not disagreements in any meaningful sense, and anyone who claims that they are is fooling himself or attempting to deceive the reader. Those interested in cross-tabulation can look at the cross tabs from Brian's survey of 666 (yes, Eli reads the Old Testament) abstracts selected at random from those published between 2002 and 2007 (more about the prequel here).
Tol shows a figure indicating how ratings changed in the reconciliation and re-rating processes; in 92% of the cases the rating changed by one unit.
Anonymouse describes Tol's mathturbation:
Tol (2014) argues reasonably that from the number of disagreements, the error rate in the initial abstract ratings was about 18.5%. He further argues that if the same error rate applies during the reconciliation process then 6.7% of ratings will still be in error after reconciliation, implying that 11.8% were corrected during reconciliation. He assumes that the remaining errors are equally distributed among categories, a claim which is problematic but which will be assumed for the remainder of this analysis.

Eli disagrees with this on two grounds. First, as mentioned previously, these differences, and certainly the differences of one unit, are in no way errors. Second, the assumption that the remaining differences are equally distributed among categories is more than problematic. It is WRONG, and the data showing that it is WRONG have been made public.
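For bunnies who want to check the arithmetic, here is a minimal reconstruction of Tol's numbers under Tol's own assumptions (two independent raters, a flat 18.5% per-rating error rate, wrong ratings spread evenly over the six other categories); reproducing them is not endorsing them:

```python
# Tol's implied arithmetic, reconstructed as a sketch.
p = 0.185        # per-rating error rate Tol infers
k = 6            # six wrong categories out of seven

# Two independent raters disagree unless both are right,
# or both are wrong in exactly the same way.
both_right = (1 - p) ** 2
both_wrong_same = p ** 2 / k
print(f"implied disagreement rate: {1 - both_right - both_wrong_same:.1%}")  # ~33%

# Tol's error budget:
identical_but_wrong = both_wrong_same        # ~0.6% of abstracts
wrong_after_reconciliation = 0.159 * p       # ~2.9%: errors among the 15.9% reconciled
wrong_after_rerating = 0.171 * p             # ~3.2%: errors among the 17.1% tie-broken
total = identical_but_wrong + wrong_after_reconciliation + wrong_after_rerating
print(f"total residual 'error' rate: {total:.1%}")               # ~6.7%
print(f"implied corrected in reconciliation: {p - total:.1%}")   # ~11.8%
```

Anonymouse continues: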
In Tol’s analysis the 6.7% of error ratings are redistributed to other categories in proportion to the corresponding proportion of shifts in the histogram. Shifts which would move the abstract rating outside the 1...7 range leave the abstract in the most extreme category.
This operation on the error ratings may be represented by the matrix S.
T = (1 − 0.067)I + 0.067S, using the 6.7% error rate.
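To make the construction concrete, here is a sketch of such an S and the resulting T. The shift proportions are hypothetical stand-ins (only the 92% one-unit figure echoes Tol's histogram), and the boundary handling follows the quoted rule that out-of-range shifts stay in the most extreme category:

```python
import numpy as np

n = 7
# Hypothetical shift distribution for a misrated abstract: 92% of
# changes are one unit (as in Tol's histogram), the rest two units.
shift_probs = {-2: 0.04, -1: 0.46, +1: 0.46, +2: 0.04}

# S[i, j] = probability that an error in true category j lands in category i.
S = np.zeros((n, n))
for j in range(n):
    for shift, prob in shift_probs.items():
        i = min(n - 1, max(0, j + shift))  # out-of-range shifts stick at the ends
        S[i, j] += prob

eps = 0.067                                # the 6.7% residual error rate
T = (1 - eps) * np.eye(n) + eps * S        # observed = T @ true
print(T.round(3))
print("columns sum to 1:", bool(np.allclose(T.sum(axis=0), 1)))
```

Note the extra weight the pile-up rule puts in the corner entries S[0,0] and S[6,6]; that is the inflation at the extremes Eli complains about below.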
UPDATE: Everyone, inc. Eli, got gremlins. See the comments by Neal King
As Tol points out on his blogspot, AS does get the wrong matrix S. However, his calculations have been redone with the correct matrix S, and they yield essentially the same results:
- If Tol's approach is expected to be applicable to the only solid evidence there is (the records of the reconciliation), it requires that the initial distribution, prior to reconciliation, have negative numbers in one category; and
- the results of Tol's projection bear no similarity to the statistics they seek to model: how does a 2:98 split turn into a 55:45 split? Via the power of Tol's assumptions! And this is for the category that comprises 2/3 of the papers that were studied.

and Mark R
Mark R's S matrix (added by Eli for convenience):
Which agrees with Tol's. This is somewhat hidden in the URL that Neal gave. Bunnies have to download the Excel spreadsheets http://www.sussex.ac.uk/Users/rt220/Consensus.xlsx and http://www.sussex.ac.uk/Users/rt220/tcp_allratings.xlsx, which are under the headings Data and Graphs on Abstract Ratings and Individual Ratings.
The problem is that if you use the inverse, the percentage agreeing with the IPCC consensus comes out at 116%, and the number of papers rejecting the consensus at levels 5-7, 7 being outright Dragon Slayer territory, is -555, which is the new number of the beast. The two errors in the S matrix will not change that much.
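Bunnies can watch the beast being born. The sketch below rebuilds the hypothetical S and T from above and applies the inverse to a made-up distribution with the shape at issue (category 4 dominant, categories 5-7 tiny); the counts are invented for illustration and are not Cook et al.'s published numbers:

```python
import numpy as np

# Rebuild the hypothetical T from the sketch above, then invert it.
n, eps = 7, 0.067
shift_probs = {-2: 0.04, -1: 0.46, +1: 0.46, +2: 0.04}
S = np.zeros((n, n))
for j in range(n):
    for shift, prob in shift_probs.items():
        S[min(n - 1, max(0, j + shift)), j] += prob
T = (1 - eps) * np.eye(n) + eps * S

# A made-up distribution with the shape at issue: category 4 dominant,
# categories 5-7 tiny. These are NOT Cook et al.'s published counts.
observed = np.array([60, 900, 2900, 8000, 50, 20, 10], dtype=float)

corrected = np.linalg.solve(T, observed)   # T^{-1} @ observed
print(corrected.round(0))                  # category 5 comes out strongly negative

endorse = corrected[:3].sum()              # levels 1-3 endorse the consensus
reject = corrected[4:].sum()               # levels 5-7 reject it
print(f"'corrected' endorsement share: {endorse / (endorse + reject):.0%}")  # over 100%
```

The exact numbers depend on the invented inputs, but the pathology does not: with most of the mass piled into category 4 and almost none in 5-7, a T of this form "corrects" the reject side into a negative total, and the endorsement share among position-taking abstracts sails past 100%.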
What to say, what to say. . . First of all, shifts that would move the abstract out of the ordinal range should have a probability of zero; piling them into the end categories instead unjustifiably inflates the number of abstracts at the extremes. Second, Skeptical Science has just published its full response, and bunnies can see how Tol's assumptions lead to Tol's conclusions
Which are wrong.