Saturday night is date night, as Ms. Bunny calls it, so Eli is turning the keys over to an evil bunny with nasty big pointy teeth, aka the Rabett of Caerbannog. He (well, Eli was not going to get close enough to really tell, what with those nasty pointy teeth and the lack of the Holy Hand Grenade of Antioch) spent his spare time learning Python to put the WattsBuster package together. It displays global-average temperature estimates from stations interactively selected by the user on a global map. The package (it is not yet totally user friendly) strips away the mystery behind the NASA/NOAA/CRU global-temperature computations for non-technical folks. Being able to *show* the non-believers rather than just *telling* them seems to be helpful... Instructions on downloading and installation can be found at the end of this post, and a FAQ follows. (BTW, this is Caerbannog's post.)
Unlike many other aspects of climate science (the hockey stick with its principal components, regressions, etc.), the global-temperature anomaly involves nothing more than simple averaging, so it is something that bunnies can explain to non-technical friends/relatives in a way that they actually *get* (at least to some extent).
This seems to be the Watts-skeptic crowd's biggest vulnerability -- they've gone out on a limb by being consistently wrong about stuff that I can explain to high-school/junior-college-educated folks...
###### 1. How we can know what the Earth's temperature is ######
Basically, you average together a whole bunch of measurements over the Earth's surface. But it should be made clear that NASA/NOAA/CRU are more interested in quantifying how average surface temperatures have *changed* than in calculating "the Earth's temperature". Look at the Y-axis of one of the NASA global-temperature plots: you will see a range on the order of -1 deg C to +1 deg C. Obviously, the Earth's temperature does not lie between -1 and +1 degrees; what the NASA plots show is how the average of the temperature measurements taken by GHCN stations has *changed* over time.
It should be emphasized that we aren't interested so much in calculating the Earth's absolute temperature as we are in estimating how it has *changed* over time. Here is a basic summary of how to compute average temperature changes from GHCN data (a minimal Python sketch of the whole procedure follows the list).
1) For each station and month, calculate the average temperature over the 1951-1980 period. That is, for each station, average all the Jan temps over the 1951-1980 period to produce the January baseline average temp for that station. Do the same for the Feb, Mar, etc. temps. For any given month, stations with insufficient data to compute 1951-1980 baselines are thrown out (that will still leave you with many thousands of stations for every month of the year). What constitutes "sufficient data" is a judgment call -- I require at least 15 out of 30 years in the baseline period; NASA requires 20 (IIRC). Results are quite insensitive to this choice (10, 15, and 20 years all produce very similar results). You will end up with a 2-D array of baseline average temperatures, indexed by GHCN station number and month.
2) For each temperature station, subtract that station's Jan baseline average from its Jan temperatures for all years (1880-present). Do the same for Feb, Mar, Apr, etc. These are the station monthly temperature *anomalies*. "Anomaly" is just a $10 word for the difference between a station's monthly temperature for any given year and that station's baseline average temperature for that month. For example, if a station's January baseline is 2.3 deg C and its January 1998 reading is 3.4 deg C, then its January 1998 anomaly is +1.1 deg C.
3) The crudest, most-dumbed-down procedure for computing global-average temperature anomalies is simply to average together the monthly anomalies of all stations for each year. Crude as it is, this will still give you not-too-bad "ballpark" global-average estimates.
4) The problem with (3) is that stations are not evenly distributed around the globe. (If stations were uniformly spaced all over the Earth, the method described in 3 would be ideal). With method (3), regions with dense station coverage (like the continental USA) would be over-weighted in the global average, while less densely-sampled regions (like the Amazon region, Antarctica, etc.) would be under-weighted.
To get around this problem, we divide up the Earth's surface into grid-cells. Then we apply (3) just to the stations in each grid-cell, merging their anomalies into a single average value per month/year for that grid-cell. Then we average all the grid-cell values together for each year to produce the global-average temperature anomalies. Note:
The surface areas of fixed lat/long grid-cells change with latitude, so you will need to scale your results by the grid-cell areas to avoid over-weighting high-latitude temperature stations. An alternative approach is to adjust the grid-cell longitude dimensions as you go N/S from the Equator to keep the grid-cell areas approximately equal. The areas won't be identical (because your grid-cell longitude sizes are limited to integer fractions of 360 degrees), but they'll be close enough to give very good results (i.e. good enough for "blog-science" work).
5) The gridding/averaging procedure in its most rudimentary, stripped-down form is quite simple, but it still produces surprisingly good global-average results. The one pitfall of the above method is that if you use the NOAA/CRU standard 5 deg x 5 deg grid-cell size, you will end up with many more "empty" grid-cells in the Southern Hemisphere than in the Northern Hemisphere. This will cause the NH (where there has been more warming) to be over-weighted relative to the SH (where there has been less).
So if you don't compute interpolated values for the empty grid-cells, your warming estimates will be too high (i.e. higher than the NASA results). But if you divide up the Earth into 20 deg x 20 deg (or so) grid-cells, you will get results amazingly close to the official NASA results without having to bother computing interpolated values for "empty" grid-cells (because you made the grid-cells big enough that none of them will be empty). It's a crude short-cut, but it's a lot less work than doing all the interpolations, and it still gives you pretty darned-good global-average results.
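Here is a minimal Python sketch of steps (1) through (5), assuming the GHCN records have already been parsed into plain dictionaries: "records" maps (station_id, year, month) to temperature in deg C, and "locations" maps station_id to (lat, lon). These names and data structures are illustrative assumptions, not the actual WattsBuster code:

import math
from collections import defaultdict

BASE_START, BASE_END = 1951, 1980
MIN_BASELINE_YEARS = 15   # judgment call, per the text; 10 or 20 give similar results
GRID_DEG = 20.0           # 20 x 20 deg cells: crude, but leaves no empty cells
N_LAT = int(180 / GRID_DEG)
N_LON = int(360 / GRID_DEG)

def baselines(records):
    """Step 1: per-station, per-month 1951-1980 baseline averages."""
    samples = defaultdict(list)
    for (stn, year, month), temp in records.items():
        if BASE_START <= year <= BASE_END:
            samples[(stn, month)].append(temp)
    return {key: sum(v) / len(v) for key, v in samples.items()
            if len(v) >= MIN_BASELINE_YEARS}

def anomalies(records, base):
    """Step 2: subtract each station's monthly baseline; station/month
    combinations without a valid baseline are thrown out."""
    return {(stn, year, month): temp - base[(stn, month)]
            for (stn, year, month), temp in records.items()
            if (stn, month) in base}

def cell_of(lat, lon):
    """Map a station location to its (ilat, ilon) grid-cell indices."""
    ilat = min(int((lat + 90.0) // GRID_DEG), N_LAT - 1)
    ilon = min(int((lon + 180.0) // GRID_DEG), N_LON - 1)
    return ilat, ilon

def global_averages(anoms, locations):
    """Steps 3-5: average the anomalies within each grid-cell for each
    month/year, then combine the cells into a yearly global mean, weighting
    each cell by cos(latitude of its center) so that the shrinking area of
    high-latitude cells doesn't over-weight high-latitude stations."""
    cells = defaultdict(list)                  # (year, month, cell) -> anomalies
    for (stn, year, month), anom in anoms.items():
        cells[(year, month, cell_of(*locations[stn]))].append(anom)
    yearly = defaultdict(lambda: [0.0, 0.0])   # year -> [weighted sum, total weight]
    for (year, month, (ilat, ilon)), vals in cells.items():
        weight = math.cos(math.radians((ilat + 0.5) * GRID_DEG - 90.0))
        yearly[year][0] += weight * (sum(vals) / len(vals))
        yearly[year][1] += weight
    return {year: s / w for year, (s, w) in sorted(yearly.items())}

With 20-degree cells no cell ends up empty, so no interpolation step is needed; shrink GRID_DEG toward the NOAA/CRU standard 5 x 5 size and you will have to start dealing with empty cells as described above.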
###### 2. What sort of games Watts has been playing ######
The big problem with the Watts approach is that he and his followers have not bothered to perform any global-average temperature calculations to test the claims they've been making (i.e. claims about UHI, homogenization, "dropped stations", etc.).
IOW, they have failed to perform any serious data-analysis work to back up their claims. Had they done so, they would have found what I have found, namely:
1) Raw and adjusted/homogenized station data produce very similar global average results.
2) Rural and urban station data produce nearly identical global-average results.
3) The "dropped stations" issue pushed by Watts is a complete non-issue. Compare results computed with all stations vs. just the stations still actively reporting data (separating them out is a very easy programming exercise -- see the sketch after this list), and you will get nearly identical results either way.
4) The GHCN temperature network is incredibly oversampled -- i.e. you can reproduce the warming trends computed by NASA/NOAA with raw data taken from just a **few dozen** stations scattered around the world. I was able to replicate the NASA long-term warming trend results very closely by crunching *raw* temperature data from as few as *32* rural stations.
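To illustrate how easy that separation is, here is a hypothetical sketch built on the same "records" dictionary used in the sketch above (the cutoff year is an arbitrary assumption for illustration, not anything from the actual WattsBuster code):

def active_stations(records, cutoff_year=2005):
    """A station counts as "still reporting" if it has any data at or after
    cutoff_year (the cutoff is an arbitrary choice for this sketch)."""
    return {stn for (stn, year, month) in records if year >= cutoff_year}

def station_subset(records, stations):
    """Keep only the records belonging to the given set of stations."""
    return {key: temp for key, temp in records.items() if key[0] in stations}

Run the baseline/anomaly/averaging pipeline once on the full records and once on station_subset(records, active_stations(records)); per point (3) above, the two global-average curves come out nearly identical.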
###### 3. How real skeptics can beat him by reducing the data themselves ######
The best approach is for skeptics to roll up their sleeves and compute their own global-average temperature estimates for a bunch of different combinations of globally-scattered stations. Skeptics with sufficient programming experience should not have much trouble coding up their own gridding/averaging routines from scratch in their favorite language. They can even use Python -- it's amazingly fast for a "scripting" language. Skeptics without programming experience will just have to trust and run code written by others.
###### 4. Where to get that data ######
NASA and NOAA have had long-standing policies of making all of the raw and adjusted temperature data they use freely available to the public.
The GHCN monthly data can be downloaded from here: ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/
The GHCN daily data can be downloaded from here: ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/
Note: Unless you are a masochist who wants to write a lot of mind-numbing, low-level data-handling code, just use the monthly data.
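For bunnies starting completely from scratch, here is a minimal sketch of a reader for the GHCN v3 monthly .dat files that produces the "records" dictionary used in the sketches above. The fixed-width column layout is taken from the README that accompanies the v3 data -- double-check it against your copy, and note that read_ghcn_v3 is just an illustrative name:

def read_ghcn_v3(path, element="TAVG"):
    """Parse a GHCN v3 monthly .dat file into a dictionary mapping
    (station_id, year, month) -> temperature in deg C."""
    records = {}
    with open(path) as f:
        for line in f:
            if line[15:19] != element:    # keep only the requested element
                continue
            stn = line[0:11]              # 11-character station ID
            year = int(line[11:15])
            for m in range(12):           # 12 fixed-width monthly fields
                raw = int(line[19 + m * 8 : 24 + m * 8])
                if raw != -9999:          # -9999 marks missing data
                    records[(stn, year, m + 1)] = raw / 100.0  # hundredths of deg C
    return records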
The whole ball of wax (including GHCN V3 data) can be downloaded from tinyurl.com/WattsBusterProject
I need to give a shout-out to the authors of the socket code and the QGIS software package -- most of the "heavy lifting" was done by those folks. (QGIS and Python are *really* cool, BTW -- this project was my first serious intro to Python.) Also, here's a quick summary (minus a lot of cluttering details) of the installation/operation procedure:
1) Make sure that gcc/g++, gnuplot, X11, and QGIS are installed (easiest on a Linux box).
2) Unpack the WattsBuster zip files.
3) Launch QGIS.
On the top menu bar, go to Plugins->Fetch Python Plugins.
Find and install the Closest Feature Finder plugin.
4) Shut down QGIS.
5) Copy the closest_feature_finder.py file supplied in the WattsBuster package to the appropriate QGIS plugin folder (typically ~/.qgis/python/plugins/ClosestFeatureFinder/).
6) In the WATTSBUSTER-1.0.d directory, build the anomaly.exe executable as follows:
make clean; make
7) Launch the anomaly.exe executable per the supplied script.
8) Launch QGIS.
Load the supplied QGIS project file (GHCNV3.qgs).
Start up the Closest Feature Finder plugin and connect it to anomaly.exe.
9) Select a GHCN station layer and start ctrl-clicking on random stations.
This is not a polished end product; think of it as a "proof of concept" prototype.