Friday, April 28, 2017

The Data Lies. The Crisis in Observational Science and the Virtue of Strong Theory

The problem with data fetishists is their choking down a daily flagon of numerical drivel without analyzing the brew.  One of the things that a good scientist knows is how to interrogate the numbers, not waterboard them.  Truth is that useful models improve flaky data and the statistical treatment thereof.

An introduction for Eli was a talk that Drew Shindell gave, twenty, maybe more, years ago with a title that ran, "Which should you trust, the data or the models?" about global temperature data in the late 19th century.  The useful conclusion was trust neither, but use them together to produce understanding and improve both.  Yes, theory can improve measurements and data.

A nice example is how NIST's acoustic thermometer can be used to define the thermodynamic temperature scale.  Starting with the theoretical result for the speed of sound in an ideal gas as a function of temperature (theory), a carefully built device to measure the same can be used to build a model of the response of platinum resistance thermometers (PRTs) as a function of temperature.  Then, by applying the model, PRTs can be used to calibrate other thermometers more accurately.
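
The theory step can be sketched in a few lines of Python. The ideal-gas relation is u² = γRT/M, so a measured speed of sound gives the thermodynamic temperature directly. The choice of argon as the working gas, its molar mass, and the function names below are assumptions for illustration only. This is not a description of NIST's actual apparatus or analysis, which has to correct for non-ideal gas behavior and a good deal else.

```python
import math

R = 8.314462618      # molar gas constant, J/(mol K)
GAMMA = 5.0 / 3.0    # heat capacity ratio of an ideal monatomic gas
M_ARGON = 0.039948   # molar mass of argon in kg/mol (assumed working gas)

def speed_of_sound(temperature_k, gamma=GAMMA, molar_mass=M_ARGON):
    """Ideal-gas speed of sound in m/s: u = sqrt(gamma * R * T / M)."""
    return math.sqrt(gamma * R * temperature_k / molar_mass)

def temperature_from_speed(u, gamma=GAMMA, molar_mass=M_ARGON):
    """Invert the same relation: T = M * u**2 / (gamma * R)."""
    return molar_mass * u ** 2 / (gamma * R)

# Round trip at the triple point of water, 273.16 K.
u = speed_of_sound(273.16)
print(f"u = {u:.2f} m/s, recovered T = {temperature_from_speed(u):.2f} K")
```

The point of the round trip is that theory supplies the link between what is measured (a speed) and what is wanted (a temperature), with no reference to any other thermometer.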

How about statistics?  Well, most of what passes for statistical analysis these days is unconstrained, so it can wander off into never never land, where never is stuff like thermodynamics and conservation laws.  Bart had a nice example of this when discussing the usual nonsense about how observed temperature anomaly data could be explained as a random walk:

As you can see, the theory is valid: My weight has indeed remained between the blue lines. And for the next few years, my weight will be between 55 and 105 kg, irrespective of what I eat and how much I sport! After all, that would be deterministic, wouldn’t it? (i.e. my eating and other habits determining my weight)

Wow, if that’s the case, then I’ll stop my carrot juice diet right now and run to the corner store for a box of mars bars!! And I’ll cancel further consultations with my dietician. Energy balance… such nonsense. Never thought I’d be so happy with a root!
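
Why the random walk reading sits so badly with energy balance can be sketched with two variance formulas. An unconstrained random walk has a variance that grows in proportion to the number of steps, so its blue lines fan out forever, while a process that is pulled back toward equilibrium levels off. The AR(1) process used here, with its made-up parameter phi = 0.9, is only a stand-in for a physical restoring force. It is an assumption for illustration, not Bart's calculation.

```python
def random_walk_variance(n_steps, sigma=1.0):
    """Variance of a pure random walk after n_steps: grows without limit."""
    return n_steps * sigma ** 2

def ar1_variance(n_steps, phi=0.9, sigma=1.0):
    """Variance of x[t] = phi * x[t-1] + noise after n_steps, with x[0] = 0."""
    return sigma ** 2 * (1.0 - phi ** (2 * n_steps)) / (1.0 - phi ** 2)

# Compare how the spread of each process evolves with the number of steps.
for n in (10, 100, 1000, 10000):
    print(n, random_walk_variance(n), round(ar1_variance(n), 3))
```

The random walk column keeps climbing, while the constrained column settles at sigma² / (1 - phi²). Bounds that widen forever are what you get when nothing in the model knows about conservation of energy.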

The other side of this is the replication crisis hitting the social sciences, most prominently psychology, and other fields as well.  To disagree with the first link, unlike the physical sciences, psychology has no well established theoretical consensus against which nutso outcomes can be evaluated. Science is about coherence (a no on that as Alice’s Queen would say), consilience (baskets full of papers having nothing to do with each other but taken together mutually supporting) and consensus (everybunny with a clue agrees on climate change, or at least 97%).

So the question really is what a lagomorph's prior should be for statistical validity.  Clearly, if all you have is the data, the standard of proof for any assertion about the data has to be very high.  Wrong answers at low levels of proof are a reason that, out on the edge, physicists demand 5 sigma data before accepting that a new particle has been found.  Meeting that standard means there is about 1 chance in 3.5 million that background noise alone would produce a signal that strong.
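
The 1 in 3.5 million figure can be checked with the Python standard library. This is a minimal sketch assuming a one-sided Gaussian tail, which is the usual convention for particle physics discovery claims. The function name is made up for the example.

```python
import math

def one_sided_tail(n_sigma):
    """Probability that a standard normal variable exceeds n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

# Tail probability and the equivalent "1 chance in N" for a few thresholds.
for n_sigma in (2.0, 3.0, 5.0):
    p = one_sided_tail(n_sigma)
    print(f"{n_sigma:.0f} sigma: p = {p:.3g}, about 1 chance in {1.0 / p:,.0f}")
```

Strictly, the number is the chance that noise alone produces a fluctuation at least that large, which is not quite the same thing as the chance that the discovery is wrong.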

On the other hand, in the well established interior of a field, where there is a lot of supporting, consilient work, a whole bunch of basic theory and multiple data sets, 5 chances in 100 can do the job or even 10 in a hundred.  Of course 30 in 100 is pushing it.

Andrew Gelman has a useful set of criteria for priors (the same holds for frequentist approaches).  Among his recommendations is that weakly informative priors
should contain enough information to regularize: the idea is that the prior rules out unreasonable parameter values but is not so strong as to rule out values that might make sense
and those priors should be
Weakly informative rather than fully informative: the idea is that the loss in precision by making the prior a bit too weak (compared to the true population distribution of parameters or the current expert state of knowledge) is less serious than the gain in robustness by including parts of parameter space that might be relevant. It's been hard for us to formalize this idea.
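
What "enough information to regularize" means in practice can be sketched with the textbook case of a normal mean with known noise and a normal prior. Everything here is an assumption for illustration: the numbers are made up, the function name is hypothetical, and this is not code from Gelman or from Stan.

```python
def posterior_mean(sample_mean, n, noise_sd, prior_mean, prior_sd):
    """Posterior mean of a normal mean with known noise sd and a normal prior."""
    data_precision = n / noise_sd ** 2
    prior_precision = 1.0 / prior_sd ** 2
    return (data_precision * sample_mean + prior_precision * prior_mean) / (
        data_precision + prior_precision
    )

# Two noisy measurements averaging 8.0, with noise sd 5.0 (made-up numbers).
priors = [("nearly flat", 1e6), ("weakly informative", 10.0), ("strong", 1.0)]
for label, prior_sd in priors:
    estimate = posterior_mean(8.0, n=2, noise_sd=5.0, prior_mean=0.0, prior_sd=prior_sd)
    print(f"{label:>18}: {estimate:.2f}")
```

The nearly flat prior hands back the noisy sample mean untouched, the strong prior all but ignores the data, and the weakly informative one pulls the estimate in modestly while leaving room for the data to win as more of it arrives.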


Old_salt said...

You may be correct that a lack of theory weakens replication in the social sciences. However, the other problem is the search for a small signal among large amounts of noise. And, I think the reason that physicists use 5-sigma is because they can--large amounts of data. I remember reading an article in Physics Today (but can't find it in a quick search) where they mentioned 3 or 4 5-sigma conclusions that were shown to be wrong. The main problem is that they were doing statistics only on the machine observation, not on inherent bias in the experiment.

Anonymous said...

Nowadays 2-3 sigma is generally considered to be the standard criterion for a 'scientific controversy' in data-driven science. Things only really get fun and interesting when you have several competing theories. Then it's called a 'scientific war'. One without actual human casualties, fortunately. That's what makes science so fun and rewarding for those who are able to and then choose to enter into the fray.

I recently participated in the lonsdaleite war, and that had a fine and satisfactory ending. That was also a great lesson in the value of failed hypotheses.

Fernando Leanme said...

In 50 years we will have a really good data set, with decent satellite coverage and reanalysis products (which I hope will include geothermal heat flux into deep water).

EliRabett said...

And, as Eli said, the satellite data is flaky because there are many problems with the analysis. C'mon Fernando, GEAFB.

Layzej said...

Go eat a fruit bat?

Anonymous said...

It took me a while but I think I got it!

Give everyone an effing break?

Mal Adapted said...

GEAFB: "Give everyone a f**king break" works for me. The trouble with acronyms is that they are products of lossy compression.

EliRabett said...

E is for Eli
Brave and cute


"On the other hand, in the well established interior of a field, where there is a lot of supporting, consilient work, a whole bunch of basic theory and multiple data sets, 5 chances in 100 can do the job or even 10 in a hundred. Of course 30 in 100 is pushing it."

Fewer than about 50 studies of climate sensitivity to CO2 doubling have been published in 120 years / 5 generations. The outliers are 9 K apart, and the modern set is still > 2.5 K wide.

Brave Eli may need a spare rabbit's foot to know where the final value is going.

Unfortunately, big bunnies are not necessarily lucky ones:

Sorry to report again that the HTML acceptance rate from Safari is < 5 sigma, so please ask Blogger to fix same.

Bryson said...

Of course, for policy purposes risk and insurance are the operative concepts. Given the stakes, ignoring the high end (where a lot of the risk lies) because one would rather bet on the low end seems irresponsible to me.

Jan Galkowski said...

Well. Of course. Eddington: ``It is also a good rule not to put overmuch confidence in the observational results that are put forward until they are confirmed by theory'' (from his book). On the other hand ...

It is also possible to score theory's consistency with experiment with techniques better than t-tests and the like, notably the important information criteria that have been developed (Burnham and Anderson). These are bidirectional. For example, it is entirely possible an observational experiment, however well constructed, might be useless for testing a model. Observational experiments are not as powerful in this regard as are constructed experiments.

But I think the put-down of the random walk as a model is a bit strong. After all, that is the basis of a Kalman filter-smoother, at least in the step-level change version. Sure, the state equation need not assume random variation and could have a deterministic core about which there is random variation. But it is possible to posit a ``null model'' if you will which involves no more than a random walk to initialize, and then takes advantage of Markov chains as universal models to lock onto and track whatever a phenomenon is.

(See figure.)

Better, it's possible to integrate over parameters, as was done in the bivariate response for temperature anomalies in the above, to estimate best fits for process variance. It's possible to use priors on these parameters, but the outcomes can be sensitive to initializations. It's also possible to use non-parametric smoothing splines fit using generalized cross-validation. These are a lot better than some of the multiple sets of linear fits I've seen done in Nature Climate Change and they tell the same story:

(See another figure.)

No doubt, there are serious questions about how pertinent these models are to paleoclimate calculations. However, if they are parameterized correctly, especially in the manner of hierarchical Bayesian models, these could well provide constraints in the way of priors for processes which could be applicable to paleoclimate.

While certainly theory can be used, and much of it is approachable and very accessible, I understand why people might want to do something else. Business and economic forecasts are often done using ARIMA models, even if these are not appropriate.

But there is an important area of quantitative research which offers so-called model-free techniques for understanding complex systems, and, in my opinion, these should not be casually dismissed. In particular, the best quantitative evidence of which I am aware teasing out the causal role CO2 has for forcing at all periods comes from this work. In fact, I'm surprised more people aren't aware of -- and use -- the methods Ye, Deyle, Sugihara, and the rest of their team offer.

EliRabett said...

Russ, who knew you were a luckwarmer. Pick anything in range and the only question is whether your kids or your grandkids are screwed.

Fernando Leanme said...

I just read an article describing how renewables are cheaper than fossil fuels. This seems to be a strongly held belief in the green community. So...if it's true this whole debate is a waste of time, you can put away the polar bear suits, and we can debate issues like the future robot versus cyborg wars.

Jan Galkowski said...

@Fernando Leanme,

Well, certain ones, like land-based wind turbines, are (see Lazard), but that does not make it a slam dunk win, not by a long shot. Even though land-based wind is cheaper than even natural gas generation, people fight it tooth-'n'-nail, e.g., in Falmouth, MA, and fear of effects upon property values is the suspected reason, although as far as I know that has not been studied. This opposition, supported by fossil fuel interests, was why even near-shore projects like Cape Wind were doomed.

In fact, to the degree that a grid needs to be modernized to accommodate these sources, that means capital investments, and the corporations responsible are loath to do that. Moreover, adoption is slow, even if, with subsidies, installing self-owned solar panels on homes with the proper insolation is for the homeowner a profit center.

This will take quite some time. Then you need to decarbonize transport.

The point is, if Mr Market is allowed to control the rollout of decarbonization, monopoly interests will use everything they can to delay it, and then we all will probably encounter a Minsky moment when Mr Market suddenly figures out the way things have been done for a century isn't going to work any longer. Mr Market will adjust, but catastrophically. And, during the interim, we are, according to the people who've studied this, committing to more and more essentially irreversible climate disruption, where ``irreversible'' means it'll take at least 1000 years to rectify, and, in the case of ocean warming and sea-level rise, something more on the order of 10,000 to 20,000 years.

But, then, policymakers and the public don't get that and are worried about how their coastal property values will fare because wind turbines are erected.

EliRabett said...


David B. Benson said...

Well stated.


No, Eli, I'm not a luckwarmer, I just don't know which generation besides ours to blame for the future.

Pick anything in range and the only question is whether your kids or your grandkids will up and summon the courage to do something about the Earth's radiative equilibrium other than abandon fire.

CO2 is not the only climateball in play.

David B. Benson said...

The paper linked in
offers a statistical causality argument that CO2 causes global warming in historical times. I encourage all to read it.


Let us encourage David to note that those emigrating to the Sunbelt in order to experience what he fears for posterity generally live longer in consequence, thanks to their well-powered air conditioning.

EliRabett said...

Yep, they leave their families behind and that's what happens


I somehow imagineer that between perovskite porte cochere roofs and the 20 kW Tesla batteries parked beneath them, that life will go on in the Florida polders.

CRISPR enthusiasts are invited to work on tilapia that prefer a diet of sugarcane and python hatchlings.

EliRabett said...

Yep, you used to be able to go up to the hills in North Carolina to escape Florida summers.

Anonymous said...


Thanks for the tweet linking to this. I won't link to the paper that made me tweet about bad stats in climatology because it's by some well-known and respected researchers, but they care too much about winning the climate wars. They are not on their own, either.

I don't agree with the wiki you linked to on priors. These are fine for exploring data, but no good for testing conclusions. I recommend Deborah Mayo's stuff on severe testing. She blogs, and she has some interesting exchanges with Gelman.

Consensus can sometimes be an issue if an idea is passed on without being severely tested. A paper in 2015 claimed that GHG forcing of the climate system could not produce a step change (physically implausible, they said) because the authors were thinking of a direct link between forcing and warming and overlooking the dynamics of the ocean-atmosphere system. It was cited in another paper in 2016 as a 'proof'. A further paper with authors from both papers then cited both the 2015 and 2016 papers as established fact. They were also cited in a Nature paper last week as the same. From supposition to fact in two years.