Monday, June 03, 2013

Cooking with Tol

Since youse guys need more thread, here is a spool.  Pick one



UPDATE:  Eli just saw a technical comment in Science which, with the change of a word or two, summarizes this silliness:
Tol criticizes the statistical analyses used to support the conclusions in our paper.  His theory biased criticism is disproportionate in view of the robustness of our findings even if different statistical methods are applied and falls short in explaining the prepubetal* nature of his and others' criticisms.
* Eli was going to use childish, but this is a science blog.

44 comments:

Albatross said...

I pick the reasonable, grounded and humble guy on the left ;) He is also the one with his credibility left intact.

Tol needs to learn to be gracious in the face of defeat and that even he has to eat crow sometimes.

chris said...

When all the bullying and grandstanding is done (won't be very long), Cook et al will still be sitting tidily in the scientific literature as a useful piece of information.

Rattus Norvegicus said...

Well Shub, I'd rate that abstract as a 3, which is what Cook et al. did (NOTE: I read what you posted before checking on it in the TCP database). It wasn't easy, but the abstract talks about the preference for using biogas as opposed to other sources (coal) for generating electricity. Use of the word "beneficial" provides a big hint. It doesn't require the use of the Parse-o-matic(tm), but it does take reasonably close reading. The endorsement is implicit, but clear.

Care to play again?

willard said...

I will again point to this:

> A direct comparison of abstract rating versus self-rating endorsement levels for the 2142 papers that received a self-rating is shown in table. More than half of the abstracts that we rated as 'No Position' or 'Undecided' were rated 'Endorse AGW' by the paper's authors.

http://iopscience.iop.org/1748-9326/8/2/024024/article

Again, that is all.

willard said...

Rattus,

You're colluding with Shub. This makes you lose all your independence points. Thanks for playing, though.

;-)

***

Speaking of independence, it seems that the Wiki has something to say:

(Quote begin)

There are several operational definitions of "inter-rater reliability" in use by Examination Boards, reflecting different viewpoints about what is reliable agreement between raters.

There are three operational definitions of agreement:

1. Reliable raters agree with the "official" rating of a performance.

2. Reliable raters agree with each other about the exact ratings to be awarded.

3. Reliable raters agree about which performance is better and which is worse.

These combine with two operational definitions of behavior:

A. Reliable raters are automatons, behaving like "rating machines". This category includes rating of essays by computer. This behavior can be evaluated by Generalizability theory.

B. Reliable raters behave like independent witnesses. They demonstrate their independence by disagreeing slightly. This behavior can be evaluated by the Rasch model.

https://en.wikipedia.org/wiki/Inter-rater_reliability
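
For those who want to see what definition 2 (exact agreement between raters) looks like in numbers, here is a minimal Python sketch. It computes raw percent agreement and Cohen's kappa, which corrects that agreement for chance, for two raters. The ratings below are made up for illustration and are not taken from the Cook et al. data; the function names are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which both raters gave exactly the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same category
    # if each rated at random according to their own marginal frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings on the 1-7 endorsement scale (illustration only).
first = [3, 4, 4, 3, 2, 4]
second = [3, 4, 3, 3, 2, 4]
print(round(percent_agreement(first, second), 3))  # 0.833
print(round(cohens_kappa(first, second), 3))       # 0.739
```

Kappa only addresses exact agreement. It says nothing about whether raters who disagree slightly are behaving like the "independent witnesses" of definition B.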

Which model should be preferred in our case?

Rattus Norvegicus said...

Willard, B is obviously -- in the extant case -- the better model. Indeed, as reported in the paper a certain number of abstracts (~10-15%) did have this sort of conflict. I've played the "rate the abstracts" game over at SkS and can understand why there might have been instances of disagreement.

FWIW, when I ran into something which gave me trouble I tended to rate on the conservative side (i.e. a 4 vs. a 3, which is where most of the difficulty occurred). But then sometimes life just ain't easy...

willard said...

Note in which context the claim of independence is made:

> Each abstract was categorized by two independent, anonymized raters.

http://iopscience.iop.org/1748-9326/8/2/024024/article

Compare with Richard's report:

> The abstracts were assessed by a team of 24 volunteers [the footnote index is misplaced] (who discussed their ratings with one another) [...]

https://docs.google.com/file/d/0Bz17rNCpfuDNM1RQWkQtTFpQUmc/edit

This sentence presents three independent ideas. All should be developed. By stringing them together, they are bulldozed as secondary arguments to create a piling on effect.

The parenthesis is so general that it invites the reader to fear the worst, and should be replaced by the text in the footnote, which is more precise.

The accusation of lack of independence has not been made, contrary to Wiki's recommendation: Be Bold[1].

These should be three very big tells for anyone used to technical writing. Whatever the merits of these three claims, they are of little relevance to what the authors claimed: independence of inter-rating for each item. The authors have not claimed the raters are not learning or training themselves along the way.

[1] http://en.wikipedia.org/wiki/Wikipedia:Be_bold

willard said...

> [W]hen I ran into something which gave me trouble I tended to rate on the conservative side [...]

Exactly. In fact, considering the arbitrary nature of the task, it is only natural to expect raters to err on the conservative side. Not that we should expect raters to bear the same implicatures as the authors themselves. Tom Curtis certainly does not have the same implicatures as Richard Tol:

> That means by a process of elimination, Tol thinks that the following papers all have neutral abstracts (Cook et al rating in brackets): [Follows the analysis of abstract (1) to (5)] He may have a point about (3). He is clearly incorrect about the others.

http://bybrisbanewaters.blogspot.ca/2013/05/tols-gaffe.html

***

Here's an example of a more automatic context, which presumably (h/t Richard Tol) might be analyzed with generalizability theory:

Reliability of assessment tools in rehabilitation: an illustration of appropriate statistical analyses

Gabrielle Rankin
Maria Stokes

Objective: To provide a practical guide to appropriate statistical analysis of a reliability study using real-time ultrasound for measuring muscle size as an example.

Design: Inter-rater and intra-rater (between-scans and between-days) reliability.

Subjects: Ten normal subjects (five male) aged 22–58 years.

Method: The cross-sectional area (CSA) of the anterior tibial muscle group was measured using real-time ultrasonography.

Main outcome measures: Intraclass correlation coefficients (ICCs) and the 95% confidence interval (CI) for the ICCs, and Bland and Altman method for assessing agreement, which includes calculation of the mean difference between measures (d), the 95% CI for d, the standard deviation of the differences (SD diff), the 95% limits of agreement and a reliability coefficient.

Results: Inter-rater reliability was high, ICC (3,1) was 0.92 with a 95% CI of 0.72 → 0.98. There was reasonable agreement between measures on the Bland and Altman test, as d was -0.63 cm2, the 95% CI for d was -1.4 → 0.14 cm2, the SDdiff was 1.08 cm2, the 95% limits of agreement -2.73 → 1.53 cm2 and the reliability coefficient was 2.4. Between-scans repeatability was high, ICCs (1,1) were 0.94 and 0.93 with 95% CIs of 0.8 → 0.99 and 0.75 → 0.98, for days 1 and 2 respectively. Measures showed good agreement on the Bland and Altman test: d for day 1 was 0.15 cm2 and for day 2 it was -0.32 cm2, the 95% CIs for d were -0.51 → 0.81 cm2 for day 1 and -0.98 → 0.34 cm2 for day 2; SDdiff was 0.93 cm2 for both days, the 95% limits of agreement were -1.71 → 2.01 cm2 for day 1 and -2.18 → 1.54 cm2 for day 2; the reliability coefficient was 1.80 for day 1 and 1.88 for day 2. The between-days ICC (1,2) was 0.92 and the 95% CI 0.69 → 0.98. The d was -0.98 cm2, the SDdiff was 1.25 cm2 with 95% limits of agreement of -3.48 → 1.52 cm2 and the reliability coefficient 2.8. The 95% CI for d (-1.88 → -0.08 cm2) and the distribution graph showed a bias towards a larger measurement on day 2.

Conclusions: The ICC and Bland and Altman tests are appropriate for analysis of reliability studies of similar design to that described, but neither test alone provides sufficient information and it is recommended that both are used.

http://cre.sagepub.com/content/12/3/187.short
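
The Bland and Altman quantities named in that abstract (mean difference d, SDdiff and the limits of agreement) are easy to compute. Here is a minimal Python sketch on invented measurements; the function name, the sign convention (first measure minus second) and the data are assumptions made for illustration. The day 1 and day 2 limits quoted above look consistent with d ± 2 × SDdiff (for instance 0.15 ± 2 × 0.93 gives -1.71 and 2.01), so the sketch defaults to a multiplier of 2; many texts use 1.96 instead.

```python
from statistics import mean, stdev


def bland_altman(measure_a, measure_b, k=2.0):
    """Return (d, sd_diff, (lower, upper)) for paired measurements.

    d is the mean of the paired differences (a - b), sd_diff their sample
    standard deviation, and the limits of agreement are d -/+ k * sd_diff.
    """
    if len(measure_a) != len(measure_b) or len(measure_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(measure_a, measure_b)]
    d = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return d, sd_diff, (d - k * sd_diff, d + k * sd_diff)


# Invented paired measurements from two raters (illustration only).
rater_1 = [10.0, 12.0, 11.0, 13.0]
rater_2 = [11.0, 12.0, 13.0, 12.0]
d, sd_diff, (lower, upper) = bland_altman(rater_1, rater_2)
print(f"d = {d:.2f}, SDdiff = {sd_diff:.2f}, limits = {lower:.2f} to {upper:.2f}")
```

This kind of analysis assumes measurements on a continuous scale. Whether it carries over to ordinal abstract ratings is a separate question that the sketch does not settle.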

***

It might be nice to have a tool like that to analyze abstracts.

Shub said...

Well, rattus, here's the thing. According to Tom Curtis, you should have rated that one as a 4, i.e., as neutral.

The raters apparently behaved like automatons. There is nothing in the mere reading of the text that would indicate 'implicit support'.

And to top it, the phrase "global warming" is neither in the title, nor the abstract. It appears in the citation as a keyword: "global warming contributions". So your guesswork is misplaced.

I chose this particular abstract as the first one I came across in the '3', implicit category.

As willard understands, your justification for including this in 3 proves my point.

I would contend, on the other hand, that such papers should be rated neutral, even if they profess undying support to the global warming theory because theirs is not a considered opinion but merely an assumption required to frame certain points they make.

EliRabett said...

Did anybunny look at Eli's crosstabs?

Shame

Jay said...

"It is a strange claim to make. Consensus or near-consensus is not a scientific argument. Indeed, the heroes in the history of science are those who challenged the prevailing consensus..." ~ Richard Tol

So science is about heroes!

Who knew?

Anonymous said...

Over at wottsupwiththatblog Richard has conceded that he only has destructive criticism as an option. He can't repeat the analysis because it is too much work, and shutting up is apparently "wrong". But Jay provides us with the quote that possibly explains that: he must attack the consensus somehow, otherwise he can't be a hero...

Marco

Neven said...

There is no doubt in my mind that the literature on climate change overwhelmingly supports the hypothesis that climate change is caused by humans. I have very little reason to doubt that that is indeed true and that the consensus is correct. Cook et al., however, failed to demonstrate this.

In other words: "I don't really care about this, but it's wrong, wrong, WROOONNNGGG!!!"

It's great fun seeing the contortions this paper has caused. John Cook is doing good work.

Rattus Norvegicus said...

Neven,

Yes, this paper has spawned almost as big a denial industry as MBH98!

muhammed waqar said...

What is Lol...? Every thing you want in Lol and Troll, Now you Get all in one Network, ThatIsLol.. Lol Pictures, Lol Videos, Lol Peoples, Funny Peoples, Troll Images, troll pictures, funny pictures, Facebook pictures, facebook funny pictures, facebook lol pictures, Funny videos and Much More only Laughing out of Laughing
thatislol.com

willard said...

muhammed,

With all due respects, why would bunnies need Lol when they have Richard?

***

Added 10 more comments:

https://twitter.com/nevaudit

Click on my name to see them.

The last one echoes what I was hinting earlier:

> @dana1981 & @RichardTol, 30th comment - "perhaps raters were tired" is trivial. The opposite would be even more troublesome, e.g. cheating.

https://twitter.com/nevaudit/status/341907612469710849

It's as if Richard was expecting that raters were peanut sorters:

http://vimeo.com/38921595

willard said...

Marco,

Thanks for the heads up. I've posted a response at Wott's. It starts thus:

Dear Richard,

I don’t think your trichotomy captures your options very well. You don’t have to reproduce Cook & al’s experiment to satisfy your (c). That is, you have forgotten about this option:

(c2) Prescribe how to redo that research by clearly stating a specification you’d consider valid.

You do have the resources to do that. Or at least you do choose to invest your resources in less constructive endeavours. See for instance this morning’s tweet where you took the pain to find duplicate records in the data.

See the rest over there:

http://wottsupwiththatblog.wordpress.com/2013/06/02/watt-about-richard-tol/comment-page-1/#comment-494

willard said...

> The raters apparently behaved like automatons.

Richard's remark about non-homoscedasticity rather disproves this point, unless we're talking about would-be automatons programmed to reproduce the behaviour of human raters.

***

> I would contend, on the other hand, that such papers should be rated neutral, even if they profess undying support to the global warming theory because theirs is not a considered opinion but merely an assumption required to frame certain points they make.

And this would misconstrue what "to endorse" means.

According to this reading, to endorse means to substantiate.

All this has been discussed at Bart's:

> TL;DR – **Endorsing** a claim C is not the same as **claiming** C.

http://ourchangingclimate.wordpress.com/2013/05/17/consensus-behind-the-numbers/#comment-18771

Anonymous said...

"It is a strange claim to make. Consensus or near-consensus is not a scientific argument. Indeed, the heroes in the history of science are those who challenged the prevailing consensus..." ~ Richard Tol

He appears to have forgotten what comes next.

The challenge, if successful, then becomes the consensus.

And ignored the other outcome - unsuccessful challenges are very likely just wrong.

Anonymous Etc

Shub said...

willard
Curtis schooled you in the previous thread that raters just read the abstract to rate (thanks, Eli, for breaking up the thread). Here we have rattus, giving a perfect example of the opposite. He read it, understood it in his own way and rated it. The abstract itself, on the other hand, does not have material to support his classification. In response, you are shifting the field of argument?

It is solely the interpreted component of the Cook database that inflates numbers for the consensus position. Cook would have still had decent numbers, maybe not the 97 that he craved but 90 something, had he included the implicits into the neutrals or thrown them out. But no, that couldn't be done.

Please don't think that 'endorse the consensus' is a scientifically better term than claiming it, or contributing to it. If anything, it leads to more problems. 'Endorse' - is already a jacked-up, non-standard term to begin with. 'Endorse' a consensus means a consensus position already exists and you have people agreeing with it. Cook et al take this agreement, and prove that there is a consensus! Totally and completely wrong - at the most, you show that there is a consensus that there is a consensus.

Could there be a more stupid way of doing research?

A rigorous way would have been to just study climate papers that are just about AGW and attribution, and assert that most seem to believe, or disbelieve, or whatever, x or y, with regards to 'human influence'. Everything else is just padding.

Anonymous said...

Willard rephrased: With all due respects, why would bunnies need Lol when they have Tol?

Rib Smokin' Bunny

willard said...

> Curtis schooled you in the previous thread that raters just read the abstract to rate [...]

That's not what I read, and if that's what he said, he's wrong. Here's one of my FIRST tweets to Richard on this subject:

ABSTRACTS, @RichardTol, ABSTRACTS. You've been told. Thank you.

https://twitter.com/nevaudit/status/337772674531065856

Also, I'm not sure what part of

> Not that we should expect raters to bear the same implicatures as the authors themselves. Tom Curtis certainly does not have the same implicatures as Richard Tol: [...]

you do not understand. Implicatures are not implications. Opinions of raters can differ. That raters can err can also apply to self-raters. See Tom Curtis' post on Richard's self-ratings.

***

> The abstract itself, on the other hand, does not have material to support his classification [...]

Please tell us what you mean by "material support".

***

> It is solely the interpreted component of the Cook database that inflates numbers for the consensus position.

Does this interpreted component comprise self-ratings, dear Shub?

***

> Cook et al take this agreement, and prove that there is a consensus!

Yet another untruth, Shub. Please reread the paper's abstract. Chris Maddiggan also tried the circularity argument, btw.

***

> Could there be a more stupid way of doing research?

Yes: stating untruth after untruth while failing to answer ANY question to which one is committed.

Viz. what you're doing right now. Or perhaps you're not doing research and only creating #FUD?

***

> A rigorous way would have be to just study climate papers that are just about AGW and attribution [...]

This has been done already, Shub. The way you make your claim makes me doubt you know about Bart's survey of that part. Do you?

You'll never guess which percent they got.

dhogaza said...

" Indeed, the heroes in the history of science are those who challenged the prevailing consensus..." ~ Richard Tol

I'm sure he had that medical hero Linus Pauling, who proved that megadoses of vitamin C cure the common cold, in mind when he said that ...

John Mashey said...

Pauling was a great ... but he both did the vitamin C thing and quashed quasicrystals (and Shechtman) for a long time. Shechtman got the Nobel in 2011 for his work.

Going against the mainstream and being right gets hero awards, going against, being proven wrong, and moving on is OK.

Going against the mainstream dumbly, proven wrong every time, easily, and keeping at it: history's wastebasket.

A good example of going against mainstream, stirring more research by other people, having evidence accumulate ... would be Bill Ruddiman's work. Look for his new book, Earth Transformed, ~October.
Much research has happened since 2005's Plows, Plagues and Petroleum. I think it's an interdisciplinary tour de force, but certain people will really, really hate it, because it explains the last 10,000 years of climate all too well.

dhogaza said...

The point is that going against the mainstream is no predictor of success, making Tol's comment stupid.

Being a fool is at least as likely as (probably much more likely than) becoming a hero when one bucks the consensus, and the more fundamental the consensus (say, the radiative properties of CO2), the more likely one is to be made a fool.


EliRabett said...

Since sundry are getting their knickers in a twist about the fact the raters talked to each other to achieve a common view (not on each abstract, but in general) how about thinking about the Delphi method (from the wiki)
----------

The Delphi method (/ˈdɛlfaɪ/ DEL-fy) is a structured communication technique, originally developed as a systematic, interactive forecasting method which relies on a panel of experts.[1][2][3][4] The experts answer questionnaires in two or more rounds. After each round, a facilitator provides an anonymous summary of the experts’ forecasts from the previous round as well as the reasons they provided for their judgments. Thus, experts are encouraged to revise their earlier answers in light of the replies of other members of their panel. It is believed that during this process the range of the answers will decrease and the group will converge towards the "correct" answer. Finally, the process is stopped after a pre-defined stop criterion (e.g. number of rounds, achievement of consensus, stability of results) and the mean or median scores of the final rounds determine the results.[5]

Delphi is based on the principle that forecasts (or decisions) from a structured group of individuals are more accurate than those from unstructured groups.[6] The technique can also be adapted for use in face-to-face meetings, and is then called mini-Delphi or Estimate-Talk-Estimate (ETE). Delphi has been widely used for business forecasting and has certain advantages over another structured forecasting approach, prediction markets.[7]
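
The round structure described above (answer, anonymous summary, revise, stop on a pre-defined criterion, report the median) can be sketched in a few lines of Python. This is a toy model under loud assumptions: each "expert" is just a number, the facilitator's summary is the median, and every expert moves a fixed fraction (`pull`) of the way toward it each round. Real panels revise for reasons, not by formula, and all names and values here are hypothetical.

```python
from statistics import median


def delphi_rounds(estimates, pull=0.5, tolerance=0.5, max_rounds=10):
    """Toy Delphi loop: returns (final median, history of all rounds).

    Stop criterion (assumed): the spread between the highest and lowest
    estimate falls to `tolerance` or below, or `max_rounds` is reached.
    """
    current = list(estimates)
    history = [current]
    for _ in range(max_rounds):
        if max(current) - min(current) <= tolerance:
            break
        summary = median(current)  # the facilitator's anonymous summary
        # Each expert revises part of the way toward the summary.
        current = [e + pull * (summary - e) for e in current]
        history.append(current)
    return median(current), history


# Hypothetical first-round answers from five experts.
result, history = delphi_rounds([2.0, 4.0, 4.0, 5.0, 9.0])
for round_number, answers in enumerate(history):
    print(round_number, answers, "spread:", max(answers) - min(answers))
print("final median:", result)
```

The sketch only shows the mechanics of convergence. It cannot show whether the value converged on is the "correct" one, which is exactly the scare-quoted caveat in the Wiki text.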

Tom Curtis said...

Further comments on Tol's opus.

With regard to independence, most criticism of Cook et al on that ground is based on a simple misreading of the claim of rater "independence". However, some (around ten or so) abstracts were explicitly discussed by raters in the internal forum. The initial rating of these abstracts was not "independent" as described in Cook et al. This was a lapse in procedure and should not have happened. Arguably Cook et al should have excluded these abstracts from the final results, and should certainly have deleted the discussion and reinforced the requirement for independent ratings.

I do not think it is a significant lapse given that the stated procedure in the paper called for dispute resolution by, first, rerating by the initial raters with the other rating before them, and then adjudication by a third party. Those who disagree are quite welcome, however, to identify the ten or so abstracts involved - exclude them from the sample and recalculate the results. If they think it will make a difference to the results, they are delusional. If they mention the error without mentioning the scale of the problem, they are not interested in generating informed analysis, but merely in generating "talking points" to allow those who are discomfited by the results of Cook et al to dismiss the results without thought. Those in the latter category deserve nothing but contempt.
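
The "exclude them and recalculate" exercise can be bounded without knowing which abstracts were discussed. The Python sketch below computes the lowest and highest endorsement share that could result from dropping a given number of abstracts. The counts are round, hypothetical numbers chosen for illustration (they are not the paper's figures), the function name is invented, and the sketch assumes every dropped abstract is one that took a position.

```python
def endorsement_bounds(endorse, with_position, dropped):
    """Return (baseline, lowest, highest) endorsement shares.

    endorse: abstracts rated as endorsing, among those taking a position.
    with_position: all abstracts taking a position.
    dropped: number of abstracts removed from the sample.
    """
    others = with_position - endorse
    remaining = with_position - dropped
    if endorse < 0 or others < 0 or dropped < 0 or remaining <= 0:
        raise ValueError("inconsistent counts")
    baseline = endorse / with_position
    # Lowest share: as many dropped abstracts as possible were endorsements.
    lowest = (endorse - min(dropped, endorse)) / remaining
    # Highest share: as many dropped abstracts as possible were not.
    dropped_others = min(dropped, others)
    highest = (endorse - (dropped - dropped_others)) / remaining
    return baseline, lowest, highest


# Hypothetical round numbers: 3900 endorsements among 4000 abstracts
# taking a position, with 10 abstracts dropped.
baseline, lowest, highest = endorsement_bounds(3900, 4000, 10)
print(f"baseline {baseline:.2%}, range {lowest:.2%} to {highest:.2%}")
```

With counts of this order, dropping ten abstracts moves the share by a fraction of a percentage point in either direction.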

Tom Curtis said...

The "further comments" in the previous post was actually intended to be a link to my most recent blog post:
http://bybrisbanewaters.blogspot.com.au/2013/06/tol-on-quantifying-consensus-on.html

It largely reproduces a comment still in moderation at Wotts Up With That Blog:
http://wottsupwiththatblog.wordpress.com/2013/06/02/watt-about-richard-tol/

Anonymous said...

There is a rather obvious response to all this brouhaha about the consensus. Or rather, there are a number of responses that could be pursued.

Firstly, the non-consensus claimants could trawl through the literature returned by Cook et al's search, and catalog those papers that they think are explicit non-endorsements of human caused climate change. Show the world which papers they believe offer evidence that human carbon emissions are not warming the planet, and numerically quantify this body. A rigorous cataloger would do this as an annotated bibliography, and demonstrate their own thinking about the content of the listed publications. Let's see exactly what is the quality (and quantity) of evidence against human-caused global warming, as gathered by a Cook et al style of search.

Secondly, the non-consensus claimants could trawl through the literature returned by Tol's proposed search, and catalog those papers that they think are explicit non-endorsements of human caused climate change. Which ones were excluded by Cook et al's search, and what is the relevance of the excluded papers to the claim that Cook et al biased the quantification of the consensus?

Then it becomes more interesting...

Thirdly, the papers returned in the above exercises could be assessed for their defensibility by following the Wos/Scopus links to all citing articles - papers that were subsequently and definitively refuted obviously have a credibility issue in any argument against a consensus. For balance the same analysis should be conducted on a sample of the papers returned in the search that explicitly endorse the human cause of current global warming. What do these analyses say about the science underpinning each side of the issue?

Fourthly, randomly selected subsets of papers from each of the two groups of papers defined in the preceding paragraph could be sent to a randomly-chosen selection of scientists in various disciplines, and these scientists asked to assess the merit of each paper in supporting or refuting the human cause of current global warming. For completeness the participating scientists could be asked to consider any subsequent response to the papers, whether in support or in refutation. With appropriate refinement such a survey would truly demonstrate what scientists think about the veracity of the work done in physics and climatology, and whether the science does actually demonstrate that humans are causing the planet to warm. After all, a true consensus should not be based merely on the numbers of papers that present 'for' and 'against' cases, but on those papers that withstand subsequent scrutiny - something largely absent from other surveys... (apologies for any recursive niggles that this might implant in people's minds).

Of course, as I and many others have said previously, the opinions of scientists (especially when offered outside their fields of expertise) don't change the science itself, or the laws of nature. However the results of such a survey would be further (clearer) evidence of the opinions of scientists, and it might help to more tightly profile the issues where understanding diverges with respect to the nature of the climatological work. If it can be empirically demonstrated that non-consensus opinions are based on flaws-in-understanding of one sort or another then non-consensus opinions become even less relevant than they are now.


Bernard J.

willard said...

Just added 10 more comments on Tol's second draft @nevaudit. Richard is 2/40 in his response rate.

Richard just published his method for those who care for such things:

> My data, graphs, and code on the con/dissensus project http://t.co/QWwRxbl0Od

https://twitter.com/RichardTol/status/342255918202892289

What's written on the draft keeps me busy enough as it is.

***

Thank you for the kind words, Tom. I'll wait a bit before submitting your general comments.

I like your thought experiment, Bernard. At the very least, it might make a nice piece of conceptual art. We might have the means to rate ALL THE ABSTRACTS.

And I mean ALL like in ALL.

RATE ALL THE ABSTRACTS!

http://memegenerator.net/instance/38438880

That's what #scopus does, as Richard says in his draft, since adding meta-data is a form of rating.


Jeffrey Davis said...

A sign reads "Bridge Out Ahead."

Heroes go against the consensus and speed up.

Anonymous said...

Willard.

:-)

Of course, that in turn planted a seed in my mind...

All your abstracts are belong to us.

Which opens the possibility of much recaptioning mirth. Now that could be a nice piece of conceptual climatological art...

A Nobel pin from melted-down internet gold for the best installation!


Bernard J.

[Captcha is playing Turing with "examestt new"]

willard said...

Yesterday, I asked Richard:

> @dana1981 & @RichardTol, 40th comment - can you confirm your email to Tom Curtis about the work's sampling UNDERESTIMATING ENDORSEMENT?

https://twitter.com/nevaudit/status/342269126313648129

A bit later, Richard confirmed:

> @nevaudit Confirm. Draft paper says same: "global" in "climate change" disproportionally removes papers in geosciences journals

https://twitter.com/RichardTol/status/342271328893685761

Since I did not recall having read anything about underestimating endorsement, I asked:

> @dana1981 & @RichardTol, 41th comment - I don't find UNDERESTIMATE ENDORSEMENT in 2nd draft. "Draft paper says same"? Birds are watching.

https://twitter.com/nevaudit/status/342278905408913408

He replied:

> @nevaudit my bad

https://twitter.com/RichardTol/status/342280090954125313

Not sure if this means the draft will be corrected or if the claim will be retracted.

Richard's response rate has improved and is now 4/41.

To understand why birds are watching, readers should be aware of Eli's tweet account and Richard's new avatar photo.

willard said...

Just posted comments #42 to #55, which mostly finish my review of p. 3:

@dana1981 & @RichardTol, 42nd comment - "that the paper ratings are different from the abstract ratings" does'nt say much - #understatement.

@dana1981 & @RichardTol, 43rd comment - no citation to "a number of authors": show the whole network of your sources expressing #concerns.

@dana1981 & @RichardTol, 44th comment - it might be prudent to mention you are one of "the authors", since this is one of your #concerns.

@dana1981 & @RichardTol, 45th comment - your (1) is unclear: which papers were rated once, which twice, why is the result unrepresentative?

@dana1981 & @RichardTol, 46th comment - have you tested for representative similarity? Plausible: conservativeness affects 50% of results.

@dana1981 & @RichardTol, 47th comment - your test of neutrality that follows seems to affect self-raters even more than raters. Bonus 50%?

@dana1981 & @RichardTol, 48th comment - "selected papers are not on climate change". Which paper could be on such a general subject?

@dana1981 & @RichardTol, 49th comment - saying "selected papers are not on climate change" may presume CC as the paper's selection criteria.

@dana1981 & @RichardTol, 50th comment - "impact papers should be rated as neutral" presumes that topicality implies endorsement. Why rate?

@dana1981 & @RichardTol, 51st comment - your semantical analysis rests on a strict maxim of quantity: authors shan't say more than needed.

@dana1981 & @RichardTol, 52nd comment - your semantical analysis can't help understand the self-ratings. You're doing armchair linguistics.

@dana1981 & @RichardTol, 53rd comment - if we're to apply your semantic criteria, counting keyword trends may suffice. Beware your wishes.

@dana1981 & @RichardTol, 54th comment - no justification for "can only rated as neutral". The data contradict this #armchair declaration.

@dana1981 & @RichardTol, 55th comment - your claim about the 34,6% "misrated" papers needs to be tested against the authors's self-ratings.

willard said...

Richard updated his Draft comment:

https://twitter.com/RichardTol/status/342915068461191168

Richard acknowledged my comments:

> @nevaudit I'm not ignoring you. I was still working on the results. Wordsmithing later.

https://twitter.com/RichardTol/status/342915325211332608

***

Comments #56-66 have been published:

@dana1981 & @RichardTol, 56th comment - more than 97% of my comments still apply. Response rate ca. 4/56. More than #wordsmitting needed.

@dana1981 & @RichardTol, 57th comment - the "her opinions [...] irrelevant" misconstrues "to endorse": endorsements are not authority claims

@dana1981 & @RichardTol, 58th comment - "papers P are irrelevant" entails "P should not have been rated", not "P must be rated as neutral".

@dana1981 & @RichardTol, 59th comment - your argument from relevance restricts "relevant" to "is a attribution study".

@dana1981 & @RichardTol, 60th comment - ergo, why not say "Cook & al 2013 should have analyzed attributions studies" and be done with it?

@dana1981 & @RichardTol, 61st comment - Have papers "that can only be rated as neutral" been rated as endorsements by authors themselves?

@dana1981 & @RichardTol, 62nd comment - not mentioning self-ratings when discussing how papers "can only be rated" is misleading at best.

@dana1981 & @RichardTol, 63rd comment - "of the misrated papers" attributes a bug to raters, when it may be a classification feature.

@dana1981 & @RichardTol, 64th comment - if we're to construe a consensus claim as a political speech act (p. 1), then to endorse may be too.

@dana1981 & @RichardTol, 65th comment - defining "to endorse" as "to support" may leads to an equivocation, see http://ourchangingclimate.wordpress.com/2013/05/17/consensus-behind-the-numbers/#comment-18958 …

@dana1981 & @RichardTol, 66th comment - if "to endorse" means "to accept", i.e. one sense of "to support", the "irrelevant" claim is moot.

Tom Curtis said...

1) In his third draft, Tol has moved his discussion of the subsidiary survey of rating (4) papers from the footnote and dropped his claim that "While the difference between 97% and 98% may be dismissed as insubstantial, it is indicative of the quality of manuscript preparation and review."

He still insists, however, that there is doubt as to whether the subsidiary survey found five of one thousand or forty of one thousand "uncertain" papers among those rated (4). This despite a public statement by a co-author that the number was five; a statement of which Tol was aware well before his third draft. His lack of clarity is, therefore, purely tactical rather than based on evidence. That is, he is unclear because he ignores evidence of which he is aware in order to retain an unjustified negative criticism in his comment.

2) Tol has now admitted in his third draft that the skewed sample of disciplines relative to a Scopus search "introduces a bias against endorsement". He does not make the same admission regarding the WoS search, even though it rests on the same data and logic, and even though he has made that admission in private correspondence.

This admission means that his claim of evidence of bias comes entirely from his unjustified claim that "impacts" and "mitigation" papers should not be rated.

Tom Curtis said...

More on Tol draft 3:
http://bybrisbanewaters.blogspot.com.au/2013/06/more-tol-gaffes.html

willard said...

Thanks, Tom. I've cited your post in today's tweeted comments. Note that Richard has not published a 4th version.

Here are the tweets, which apply to the third and fourth paragraphs of Richard's draft, on p. 4:

@dana1981 & @RichardTol, 82nd comment - why omit "The neutrality of the abstract ratings can also be tested in a different way" in 4th ver?

@dana1981 & @RichardTol, 81st comment - "it is clear that the ratings and the self-ratings are different" says little: what should we infer?

@dana1981 & @RichardTol, 80th comment - your (1) amounts to #questionbegging, using an analytical result that has not been properly discussed.

@dana1981 & @RichardTol, 79th comment - why #foretell the claim (1) about the #unrepresentativeness? If relevant, indicates structural bug.

@dana1981 & @RichardTol, 78th comment - how can the N of authors be "too small" to reach any &c, while stating *50%* of the FOUR ABSTRACTS?

@dana1981 & @RichardTol, 77th comment - how many authors came out publicly to state that their ratings were wrong? Number may add precision.

@dana1981 & @RichardTol, 76th comment - "their papers were rated wrong" begs the question: is that true? Cf. e.g. http://bybrisbanewaters.blogspot.com.au/2013/05/tols-gaffe.html …

@dana1981 & @RichardTol, 75th comment - "their papers were rated wrong" might #conflate ratings and self-ratings: clarify which ratings.

@dana1981 & @RichardTol, 74th comment - "a number of authors" fails to mention YOURSELF, a criticism you use elsewhere against the authors.

@dana1981 & @RichardTol, 73rd comment - "a number of authors" follows "only means are compared": what's up with the lack of citation?

@dana1981 & @RichardTol, 72nd comment - "only means are compared" only sidesteps what seems to me perhaps the most important result of the paper.

@dana1981 & @RichardTol, 71st comment - which tests do you propose to validate ABSTRACTS' ratings by PAPERS self-ratings? #ChapterAndVerse.

@dana1981 & @RichardTol, 70th comment - instead of speculating about 4 ABSTRACTS, why not read them, v. Tom Curtis' http://bybrisbanewaters.blogspot.com.au/2013/06/more-tol-gaffes.html …

@dana1981 & @RichardTol, 69th comment - "an error rate of 50%" on 4 ABSTRACTS. Really? Pray tell more about #representativeness.

@dana1981 & @RichardTol, 68th comment - there's also "and perhaps reconciliation or third rating", which stacks the deck with a speculation.

@dana1981 & @RichardTol, 67th comment - para on "three duplicate records" in the 4th version adds "after double rating", which adds nothing.

willard said...

Erratum:

> Note that Richard has NOW published a 4th version.

Over there:

https://docs.google.com/file/d/0Bz17rNCpfuDNNXJTTjZYN2ExYTA/edit

guthrie said...

"It is a strange claim to make. Consensus or near-consensus is not a scientific argument. Indeed, the heroes in the history of science are those who SUCCESSFULLY challenged the prevailing consensus..." ~ Richard Tol

There, that's better.

willard said...

The conversation seems to have moved over to Wott's, which I think could be added to Eli's blogroll:

http://wottsupwiththatblog.wordpress.com/2013/06/10/richard-tols-fourth-draft


PS: Captcha is Cheney lockemi

Rattus Norvegicus said...

Tol goes whining to Willard Tony about his comment being rejected, for the very reasons we figured it would be.

Tony of course credits it to some grand conspiracy involving the name of Dan Kammen's endowed chair. He is just a fount of wisdom...