Thursday, June 07, 2007

Twitchy whiskers

Dano has pointed out the folks over at Climate Audit are undergoing mentalpause, gone all twitchy about the bunny they have. Even hauled out the "anyone who talks to himself in the third person is a nutjob" line. They obviously failed imagination and literature, but keep the Sitemeter spinning.

Now in general Eli approves of such behavior at the beginning of the summer. Eli and Ms. Rabett have hied off to the beach to look at the bunnies, the sea and the shopping, all low brain function activities. We wish you and yours much the same.

However, in the middle of this Hans Erren asked a thought-provoking question over here

Given the fact that the updated Labrijn series for De Bilt was already available since 1995, don't you think that somewhere down the line GHCN and GISS did a very sloppy job with their homogenisation adjustment QC?
Eli provided a simple answer
Eli is seriously at the beach, but within those constraints, the GISS adjustments are based on a method they apply across the board, so they probably prefer to be uniform. Don't know enough about the GHCN homogenisation adjustments.
Which is both right and incomplete after some more investigations. This being later at night, nothing on the tube, and too late to hop in the car and find a beer, with the sweet lassitude of summer nights upon us, the Rabett hied off to CA to shake up the blood, and ran into the same thing amidst the sea of bile. Ethon says that there must be liver there with so much bile and went off with his straw. Even beach places have T1 these days, so after a bit of poking about, it became clear that these different homogenizations were optimized for, and are best used for, different things.

A good place to start is Hansen, Sato, et al. from 1999 explaining how they combine records at any location to obtain a single record.
The single record that we obtain for a given location is used in our analyses of regional and global temperature change. This single record is not necessarily appropriate for local studies, and we recommend that users interested in a local analysis return to the raw GHCN data and examine all of the individual records for that location, if more than one is available. Our rationale for combining the records at a given location is principally that it yields longer records. Long records are particularly effective in our “reference station” analysis of regional and global temperature change, which employs a weighted combination of all stations located within 1200 km as described below.
For urban stations, they apply a homogeneity adjustment
An adjusted urban record is defined only if there are at least three rural neighbors for at least two thirds of the period being adjusted. All rural stations within 1000 km are used to calculate the adjustment, with a weight that decreases linearly to zero at distance 1000 km. The function of the urban adjustment is to allow the local urban measurements to define short-term variations of the adjusted temperature while rural neighbors define the long-term change.
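The distance weighting in that passage can be sketched in a few lines. This is only an illustration of a weight that falls linearly to zero at 1000 km, not the GISS code: the function names and the neighbour values are made up, and the "at least three rural neighbors for two thirds of the period" test is left out.

```python
def linear_weight(distance_km, cutoff_km=1000.0):
    """Weight falling linearly from 1 at 0 km to 0 at the cutoff distance."""
    if distance_km >= cutoff_km:
        return 0.0
    return 1.0 - distance_km / cutoff_km


def weighted_rural_mean(neighbors, cutoff_km=1000.0):
    """neighbors: list of (distance_km, value) pairs.

    Returns the distance-weighted mean, or None if no neighbor
    lies inside the cutoff.
    """
    pairs = [(linear_weight(d, cutoff_km), v) for d, v in neighbors]
    total = sum(w for w, _ in pairs)
    if total == 0:
        return None
    return sum(w * v for w, v in pairs) / total


# Hypothetical rural neighbours: (distance in km, long-term trend in C/decade)
rural = [(100.0, 0.10), (500.0, 0.20), (1200.0, 0.90)]
rural_trend = weighted_rural_mean(rural)  # the 1200 km station gets weight 0
```

The station at 1200 km drops out entirely, and the one at 100 km counts almost twice as much as the one at 500 km.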
Contrast this with the method currently used at de Bilt (can't find the Engelen and Nellestijn article) in Brandsma, T., G.P. Können and H.R.A. Wessels, Empirical estimation of the effect of urban heat advection on the temperature series of De Bilt (The Netherlands), Int. J. Climatology, 2003, 23, 829-845
The influence of urban heat advection on the temperature time series of the Dutch GCOS station De Bilt has been studied empirically by comparing the hourly meteorological observations (1993-2000) with those of the nearby (7.5 km) rural station at Soesterberg. Station De Bilt is in the transition zone (TZ) between the urban and rural area, being surrounded by three towns, Utrecht, De Bilt and Zeist. The dependence of the hourly temperature differences between De Bilt and Soesterberg on wind direction has been examined as a function of season, day- and night-time hours and cloud amount. Strong dependence on wind direction was apparent for clear nights, with the greatest effects (up to 1 °C on average) for wind coming from the towns. The magnitude of the effect decreased with increasing cloudiness. The analysis suggests that most of the structure in the wind direction dependence is caused by urban heat advection to the measuring site in De Bilt. The urban heat advection is studied in more detail with an additive statistical model. Because the urban areas around the site expanded in the past century, urban heat advection trends contaminate the long-term trends in the temperature series (1897-present) of De Bilt. Based on the present work, we estimate that this effect may have raised the annual mean temperatures of De Bilt by 0.10 ± 0.06 °C during the 20th century, being almost the full value of the present-day urban heat advection. The 0.10 ± 0.06 °C rise due to urban heat advection corresponds to about 10% of the observed temperature rise of about 1.0 °C in the last century.
Where they carefully concentrate upon a single station, and a paired rural site. This study attempts to optimize the correction (and thus the record) for a single station. The correction is based on one very local comparison. Which is best? Well what are you trying to do? Obtain the optimal reconstruction for the de Bilt site, or the best reconstruction on a global scale?

Even in the latter cases there are different methods, each of which arguably can be useful. We see that with the USHCN data set. RTFR


Anonymous said...

The best reconstruction on a global scale is not obtained by degrading high grade observations.

You start with a network analysis of a small subset of well kept stations, and use this primary network to densify the data by comparing other stations, rigorously weeding out substandard locations, at least that is how spirit leveling is performed in the Netherlands.

The problem is that you can't say anything about remote stations like Spitzbergen where there is no nearest neighbourhood control on station moves.

Using kriging, an indication can be obtained about the estimated error, which of course is largest for stations like Spitzbergen, and minimal for De Bilt.

But hey, I'm just a geophysicist who has been QC-ing crappy data for 20 years, why would you listen to me?

Anonymous said...

What you say makes perfect sense to me - but boy it takes a lot of care to correct the jumble of historical data. If we were looking at huge effects relative to the precision of the original measurements then there might be some justification for averaging out all the errors, but the level of precision being argued about makes your approach the only reasonable one. Fewer but better QC data points will shed more light than heat, STS.

EliRabett said...

If the records were perfect, there would be no argument that there are unique best local records and a unique best global one. The problem is that the records are imperfect. Emphasizing local information may result in noisier data from each local record fed into the global one and degrade the latter. This is not a question with a unique answer.

Anonymous said...

Emphasizing local information may result in noisier data from each local record fed into the global one and degrade the latter.

Only when you don't QC your data. Due to high near distance (1000 km) spatial correlation, station moves are easily recognised. See eg Paris minus Uccle:

That's the power of network iterations. Believe me, I've tied magnetic Africa and South Asia together (and the clients were satisfied).
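The neighbour-difference check described in this comment can be sketched as follows. The data are synthetic and the function names are hypothetical; the split search is a deliberately crude stand-in for a proper homogeneity test, shown only to make the idea concrete.

```python
from statistics import mean


def difference_series(a, b):
    """Element-wise difference of two overlapping station records."""
    return [x - y for x, y in zip(a, b)]


def best_step(diff, min_seg=3):
    """Return (index, jump) for the split with the largest change in mean."""
    best_i, best_jump = None, 0.0
    for i in range(min_seg, len(diff) - min_seg + 1):
        jump = mean(diff[i:]) - mean(diff[:i])
        if abs(jump) > abs(best_jump):
            best_i, best_jump = i, jump
    return best_i, best_jump


# Synthetic annual means: both stations share the regional signal,
# and station_b reads 0.6 lower from year index 7 on (say, a move).
regional = [9.0, 9.4, 8.7, 9.9, 9.2, 8.8, 9.6, 9.1, 9.5, 9.0, 9.3, 8.9]
station_a = [t + 0.5 for t in regional]
station_b = [t - (0.6 if i >= 7 else 0.0) for i, t in enumerate(regional)]

diff = difference_series(station_a, station_b)
idx, jump = best_step(diff)
```

Because the shared regional signal cancels in the difference, the step stands out cleanly. The difference alone does not say which of the two stations changed; that takes a third neighbour or the station metadata.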

Anonymous said...

"The best reconstruction on a global scale is not obtained by degrading high grade observations."

In general that is true, but -- notwithstanding obvious problems like nearby incinerators and the like -- it is not always obvious which stations are of the "highest grade" and which are doing the degrading.

Selecting out the "best" stations is not an entirely objective process and could actually introduce bias and other problems into the process.

It is quite possible that a station that appears to be very good might have some underlying problem (that may not even be recognized as such) that actually skews the collective data more than a station that had an "obvious" problem and was therefore deleted.

Also, a station that is below average in some regards may be above average in other regards and the latter may more than offset the former with respect to the overall quality of the data.

That is why it is true that (all other things being equal) statistically speaking, the larger the number of samples, the better.

It could actually be better to have a larger dataset that includes what are perceived to be "non-optimal" stations than a very restricted data-set consisting only of what are perceived to be "optimal" stations.


Anonymous said...

I disagree, take this example from Berlin, three independent stations with long records within 20 km from each other:
The main differences occur from sensor changes and it is immediately evident, which is the odd one out: Tempel in 1909, Dahlem in 1933 and Potsdam in 1937. Only after jump adjustment a proper analysis of UHI can be started, mixing non-homogenised data leads to severe degradation.

1950 is a severe problem, as the majority of stations underwent changes due to changing WMO standards. It takes local knowledge and complete metadata to tackle this problem. The present GISS algorithm definitely is not suitable for the job it claims to have completed.

Anonymous said...

So, you are saying that it is "always obvious which stations are of the "highest grade" and which are doing the degrading"?

Anonymous said...

It's interesting to follow the arguments regarding "best stations" vs larger sample number. In the realm of clinical trials variability of the sample population is a given. An investigator can set inclusion and exclusion criteria before a study starts, but it's tough to throw out data from patients because an investigator does not like the data look. Soooo, the better way is to increase N.

EliRabett said...

I would go further than Deech, and say that a consistent treatment of station data is the best way to go.

A further point, which appears to be missing, is that although the number of stations has decreased in the last 20 years, more stations are automated. This provides higher quality data from those stations.

Anonymous said...

So with higher quality, we are getting less variation - I assume the means would show a continuity rather than a sudden jump.

As I understand it (and please correct me if I'm wrong), the number crunching reveals temperature anomalies, so each station would serve as its own control (kind of like patients, who receive some kind of baseline evaluation/data collection before study start).
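A minimal sketch of the "own control" idea, assuming an anomaly is simply the reading minus that station's mean over a baseline period. The station values and the baseline years are invented for illustration.

```python
from statistics import mean


def anomalies(series, base_start, base_end):
    """series: dict of year -> temperature.

    Returns year -> temperature minus the station's own baseline mean.
    """
    base = [t for y, t in series.items() if base_start <= y <= base_end]
    if not base:
        raise ValueError("no data in the baseline period")
    baseline = mean(base)
    return {y: t - baseline for y, t in series.items()}


# Two hypothetical stations 5 degrees apart in absolute temperature
valley = {1951: 9.0, 1952: 9.2, 1953: 9.1, 1954: 9.6}
hilltop = {1951: 4.0, 1952: 4.2, 1953: 4.1, 1954: 4.6}

a_valley = anomalies(valley, 1951, 1953)
a_hill = anomalies(hilltop, 1951, 1953)
```

The two stations differ by 5 degrees in absolute terms but give the same anomalies, which is what lets records from very different sites be combined.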

I hope this isn't going too far afield, but I am more involved in medical research - talk about variability and complexity! Yet, a lot of information is gathered and seemingly accepted (with the possible exception of epidemiology studies involving tobacco and war dead) by those who apply a most critical eye to climate data.

Dano said...

On the ground, however, the conditions cannot be controlled as in a lab. The station is in the open. Old temp measurements were indeed made with lower quality instruments and by people who may have been grumpy on a given day (but the sheer number of obs cancels that out, but I used to complain about the reliability of our dew points).



Anonymous said...

Dano, you are absolutely right about being able to control conditions in a lab, but individual variability is still a problem (as I notice when performing pharmacokinetic studies). Each living creature is complex (even inbred strains of mice show individual variability) and when it comes to patient data even more variability is encountered.

My points are that 1) variability is not unique to climate science, and 2) science can progress even in the face of variability, and IMHO climate science is unfairly held up to a higher standard by those who don't like the data. How to overcome variability? One way, as you mentioned, is having a high number of observations. Another is having an internal control, such as measuring changes in a given parameter.

But isn't the attempt to cast doubt on the ground data in the face of the independent confirmation via satellites and observation (uh...frozen stuff is melting) a little odd? Anyway, sorry for going a bit OT. It is a product of my OCD and short attention span. :)

Anonymous said...

With all due respect it is somewhat more complex than you surmise.
The situation is a bit like trying to analyze data from a multi-year clinical trial across multiple locations where there is evidence that the protocol has not been consistently followed in all locations and for all patients within a location. At a certain point greater N exacerbates the problem as more and more extraneous and indeterminate variables enter the equation.

Also remember many skeptics are asking for the protocol and complete patient by patient data to replicate the claims that have been made. These are not outrageous requests given the claims - but are apparently treated as such.

Anonymous said...

"At a certain point greater N exacerbates the problem as more and more extraneous and indeterminate variables enter the equation."

I had not realized that there was a point beyond which statistics no longer applied.

I was under the impression that there were valid statistical reasons for keeping a relatively large number of samples to deal with the possibility of random variables.


Anonymous said...

Isn't there already an effort by Tom Karl to identify a relatively small sample of high quality weather stations rather than relying on statistical techniques to isolate trends in a much larger number of weather stations where the quality of the data records is questionable? Clearly all other things being equal, larger samples are better - but all other things aren't always equal and that is why you try to carefully define your population.

EliRabett said...

The USHCN was a selection of the best stations out of the Co-op network. Karl is doing something a bit different. He is setting up a number of stations to act as a calibration for the USHCN stations. These will be used to validate the data from the USHCN stations.

Anonymous said...

The statistical method only works fine for big numbers. The problem is that for pre-1950 data the number of available stations rapidly diminishes, so an iterative triangulation method is the best method left.

Anyway, the GISS method of adjusting step inhomogeneities with a ramp, thereby adding a warming trend to rural sites, is fundamentally wrong in all cases.

Anonymous said...

Can you say more about the GISS method as you understand it?

Anonymous said...

Perhaps the climate scientists should just "stand back" and let the real expert --Hans, who obviously has perfect knowledge and no need for statistics -- decide which stations to keep and which to jettison.

After all, he's "a geophysicist who has been QC-ing crappy data for 20 years".

Wow! I am just so impressed. :-)

I'm curious, how many papers on climate science have you published Hans? In what journals?


Anonymous said...

You don't have to be a composer to judge if the music is out of tune.
The "experts" of GISS don't read the literature.

R. Sneyers, 1990, On the statistical analysis of series of observations, Technical note World Meteorological Organization no. 143, Geneva, World Meteorological Organization, 192 pp ISBN 92-63-10415-8

A van Engelen and Nellestijn, JW, 1996, Monthly, seasonal and annual means of air temperature in tenths of centigrades in De Bilt, Netherlands, 1706-1995. KNMI report from the Climatological Services Branch

Compare the local experts with the amateurs at GISS

Anonymous said...

Ross McKitrick's at it again:

What baloney, eh?

-Stephen Berg

Anonymous said...

I don't think the home made graphs of Hans have anything to do with 'experts'.
If you compare the adjusted GISS data for De Bilt with the homogenised KNMI series, although there are differences, they both show a warming trend of about 0.1 °C per decennium for the last century. And they both show an increased warming over the last decennia.
If GISS showed a warming, and the adjusted De Bilt showed no warming or a cooling, yes, there would be a problem.
But the trend over the last century is about the same, and the trend over the last decennia is almost exactly the same.
You can argue about the details, but you can't argue about the trends. Yes, De Bilt is warming, and this warming is increasing the last decennia. No doubt about that.

Anonymous said...


Those graphs that you linked to mean nothing in and of themselves. The correction by itself is meaningless.

One really has to see the full history of the station and the actual temperature data along with the correction -- as well as data for nearby stations -- to make a decision about whose correction is closest to reality.

Also, without accessing and processing the data myself, I have no idea whether what you showed as corrections for GISS and others are even correct.

That is the whole purpose of publishing in peer-reviewed journals, so that people who know about this stuff can act as a reality check or "quality control", if you will.

Which gets to the question I asked above about published papers -- which you did not answer.

You are quick to criticize those at NASA -- calling them amateurs -- but at least they publish in the peer-reviewed journals.

It's easy to make unsubstantiated claims about the work of others, but much harder to put your money where your mouth is. At least they have the guts to put their stuff out there for criticism.


Anonymous said...

You are quick to criticize those at NASA -- calling them amateurs -- but at least they publish in the peer-reviewed journals.

You are aware that the maths are seldom, if ever, checked in a peer review? You are also aware that seldom, if ever, the source code is reviewed?

Peer review really meant a lot many years ago, but anymore it's a rubber stamp by the guy down the hall, rather than a challenge from an unknown expert across the globe.

As for the quality of the data being discussed in this thread. The root problem is that we are trying to detect a shift in mean where the shift is about 1/10 the magnitude of our measurement resolution. Normally, that can indeed be accomplished IF you have lots of data to grind on because the biases will all cancel out. However, if X% of your data sources have an upwards bias due to site issues, then you have a real problem.
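Both halves of that argument can be tried out with a small simulation. Everything here is an assumption chosen for illustration: readings rounded to 1 degree, random noise of 1 degree, a true shift of 0.1 degree, and a hypothetical 20% of readings carrying a +0.5 degree siting bias.

```python
import random
from statistics import mean


def observed_mean(true_value, n, noise_sd, resolution,
                  biased_fraction=0.0, bias=0.0, seed=0):
    """Mean of n noisy readings, each rounded to the instrument resolution.

    A fraction of the readings gets a one-sided bias added before rounding.
    """
    rng = random.Random(seed)
    readings = []
    for _ in range(n):
        value = true_value + rng.gauss(0.0, noise_sd)
        if rng.random() < biased_fraction:
            value += bias
        readings.append(round(value / resolution) * resolution)
    return mean(readings)


# True shift of 0.1, one tenth of the 1.0 resolution
before = observed_mean(10.0, 20000, 1.0, 1.0, seed=1)
after = observed_mean(10.1, 20000, 1.0, 1.0, seed=2)

# Same readings as `after`, but 20% carry a +0.5 one-sided bias
after_biased = observed_mean(10.1, 20000, 1.0, 1.0,
                             biased_fraction=0.2, bias=0.5, seed=2)

recovered_shift = after - before
spurious_shift = after_biased - after
```

With random errors only, the averaged readings recover a shift close to the true 0.1 despite the coarse resolution. The one-sided bias adds a further shift of roughly the same size, and no amount of extra data averages it away.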

Anonymous said...

Peer review really meant a lot many years ago, but anymore it's a rubber stamp by the guy down the hall, rather than a challenge from an unknown expert across the globe.

With all due respect: huh? Are you claiming to know the identities of the reviewers of recent papers in climatology and their relationships to the authors? And that there have been changes in the last, oh, 30 years? And that climatology is any different from other sciences, such as radiation biology (a field in which I have some familiarity)?

Anonymous said...

one anonymous said:
Also, without accessing and processing the data myself, I have no idea whether what you showed as corrections for GISS and others are even correct.

Well I suggest you do your homework first before ventilating your unfounded criticism.

GISS adjusts a documented station step inhomogeneity with a ramp.
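To make the step-versus-ramp distinction concrete, here is a toy comparison on synthetic data. It illustrates only the commenter's description, it is not the GISS procedure, and the ramp is assumed to run linearly over the whole record.

```python
def ols_slope(y):
    """Least-squares slope of y against its index 0, 1, 2, ..."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx


# A flat true climate with a documented -0.5 step at index 5
n, move_at, jump = 10, 5, -0.5
raw = [0.0 if i < move_at else jump for i in range(n)]

# Step correction: remove the jump from every value after the move
step_fixed = [v if i < move_at else v - jump for i, v in enumerate(raw)]

# Ramp correction: spread the same total adjustment linearly over the record
ramp_fixed = [v - jump * i / (n - 1) for i, v in enumerate(raw)]

step_slope = ols_slope(step_fixed)
ramp_slope = ols_slope(ramp_fixed)
ramp_residual = max(abs(v) for v in ramp_fixed)
```

In this toy case the step correction restores the flat series exactly, while the ramp leaves a sawtooth residual and a non-zero fitted trend. Whether that carries over to real records depends on how and where the ramp is actually applied.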

Anonymous said...

Unfounded criticism?

What a hoot.

Who is calling NASA GISS scientists amateurs?

You might at least provide some links to the NASA corrections, Hans.

You may have a lot of time on your hands, but some of us don't.

Anonymous said...

BTW, you still have not answered my question about published papers.

It seems that you are avoiding it.

Perhaps you are right and NASA is wrong, but if it is really all as obvious as you say that NASA scientists have no idea what they are doing, you should have no problem getting your results published.

Or perhaps there is just a conspiracy to keep those who disagree from getting published.

Anonymous said...

"if X% of your data sources have an upwards bias due to site issues, then you have a real problem."

Perhaps. Perhaps not. That depends on what X is. It also depends on whether another X% have a downward bias of a similar magnitude that cancels out those with the upward bias (ie, if the bias is randomly distributed).

You claim that there is an upward bias problem at a couple stations. Assuming for the moment that this claim is correct, are we really supposed to conclude from this that a similar upward bias applies at a significant fraction of all stations?

There seems to be more than a little presumption involved here.
Where is your proof?

Without proof, that conclusion is as unfounded as concluding a significant fraction of stations have supposed barbecue grill problems based on photographs of just a few stations. (How do we even know that the BBQ grill in question was even being used next to that temperature station, by the way?)

Incidentally, I am not at all impressed with the experiments that Watts is doing with the latex vs whitewash coatings. Talk about amateurish. He seems to have no clue what he is doing.

Anonymous said...

So, where are your published papers on climate science, Hans?

Einstein published his papers, you know.

Anonymous said...

7:14 Anonymous:
"LaLaLa I can't hear you."
Look at this graph then: GISS vs. local professional experts and then shut up.


Unknown said...

Poor, poor Hans.

Or should I call you "Albert"?

Jeff Harvey is a scientist.

Anonymous said...

Jeff Harvey certainly knows a lot about cabbages and insects. He doesn't have a clue "why the sea is boiling hot"

Jeff Harvey is not a geophysicist.