On Priors, Bayesians and Frequentists
A dialog between a bunny and a philosopher in which questions of current concern are asked or not asked, and answered or not. The philosopher will remain anonymous until the philosopher wishes otherwise.
[Eli]
So every once in a while, Eli gets serious, and asks some questions. In
this case about Bayesian statistics. Andrew Gelman pointed out that
[Andrew Gelman]
Twenty-five years ago or so, when I got into this biz, there were some
serious anti-Bayesian attitudes floating around in mainstream
statistics. Discussions in the journals sometimes devolved into debates
of the form, “Bayesians: knaves or fools?”.
[Eli] Eli thought the proper designation of Bayesians was batshit crazy, but no nevermind. The questions revolve around something lower down in that post, and frankly, in a vague attempt not to out his bunnyship as an idiot, Socrates, Eli thought you might be a reasonable spirit to ask.
[Socrates] Shoot.
[Eli] So here’s Andrew Gelman on Noah Smith:
[Andrew] Smith does get one thing wrong. He writes:
[Noah]
When you have a bit of data, but not much, the Frequentist – at least, the
classical type of hypothesis testing – basically just throws up its
hands and says
[Frequentist] We don’t know.
[Noah] It provides no guidance one way or another as to how to proceed.
[Andrew] If
only that were the case! Instead, hypothesis testing typically means
that you do what’s necessary to get statistical significance, then you
make a very strong claim that might make no sense at all. Statistically
significant but stupid. Or, conversely, you slice the data up into
little pieces so that no single piece is statistically significant, and
then act as if the effect you’re studying is zero.
[Eli] Andy underlines another mistake by Noah, this time when he says:
[Noah] If
I have a strong prior, and crappy data, in Bayesian I know exactly what
to do; I stick with my priors. In Frequentist, nobody tells me what to
do, but what I’ll probably do is weaken my prior based on the fact that I
couldn’t find strong support for it.
[Andrew] This isn’t quite right, for three reasons.
First,
a Bayesian doesn’t need to stick with his or her priors, any more than
any scientist needs to stick with his or her model. It’s fine—indeed,
recommended—to abandon or alter a model that produces implications that
don’t make sense (see my paper with Shalizi for a wordy discussion of this point).
Second,
the parallelism between “prior” and “data” isn’t quite appropriate. You
need a model to link your data to your parameters of interest. It’s a
common (and unfortunate) practice in statistics to forget about this
model, but of course it could be wrong too. Economists know about this,
they do lots of specification checks.
Third, if you have weak data and your prior is informative, this does not imply that your prior should be weakened!
[Eli]
Eli's take on all this is that starting with priors (from models/theories/other data sets) which are close to the data set under analysis will result in improved statistical estimates. The (very old language here) surprisal, the difference between the prior and the posterior, will be small, and one may be able to use it to extract meaningful dynamics from under the statistical noise.
[Ms. Rabett, looking into Eli’s eyes] Very meaningful dynamics indeed.
[Eli, keeping his cool] However, if the prior is awful, the result may actually diverge from the underlying statistical information in the data set. So with Bayes, you have to know the answer, or a good approximation to it, to make progress. Or, as Gelman points out:
[Eli, using Andrew’s voice]
If the prior is derived from previous work, the data set may be crap, in which case the use of Bayesian statistics is to identify crap data.
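[Eli, pulling out the chalkboard] A toy sketch of both cases in Python, assuming a Beta-Binomial model with invented numbers; the surprisal here is the KL divergence from prior to posterior:

```python
# Toy Beta-Binomial example: the "surprisal" (KL divergence from prior to
# posterior) is small when the prior already sits near the data, and large
# when the prior is awful.  All numbers are invented for illustration.
from scipy.special import betaln, digamma

def kl_beta(a1, b1, a2, b2):
    """KL( Beta(a1, b1) || Beta(a2, b2) ), in closed form."""
    return (betaln(a2, b2) - betaln(a1, b1)
            + (a1 - a2) * digamma(a1)
            + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))

k, n = 7, 10  # observed: 7 successes in 10 trials

priors = {"close prior": (7.0, 3.0),    # centered near the data's 0.7
          "awful prior": (1.0, 20.0)}   # centered near 0.05
for label, (a, b) in priors.items():
    post_a, post_b = a + k, b + n - k   # conjugate Beta-Binomial update
    surprisal = kl_beta(post_a, post_b, a, b)
    print(f"{label}: surprisal = {surprisal:.2f} nats")
```

With the close prior the surprisal stays small; with the awful one it balloons, which is the warning that the prior, the data, or both are screwed up.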
[Eli] So how good is Eli's prior?
[Ms. Rabett] And posterior, which I admire on occasion.
[Socrates] Gelman's post is brilliant. I like his blog. I also like Mayo's. Not to mention yours. What are your priors, again?
[Eli]
Eli has been brought up on charges by many. More or less something we used many years ago: taking the prior from theory and applying it to measured data, to see what the theory missed. Eli still likes that approach.
[Socrates] Oh, that. Well, yeah. Some call this post hoc data mining. Some call it experimentation. I never understood the concept of post hoc. Can we really check that econometricians are not peeking at their data before designing their models?
Perhaps Solomon would pronounce my judgement better than me:
[Solomon] Make sure your statistical inference is minimal and all will be well.
[Eli] Not if the theory is done before the experiment.
[Socrates]
Hmmm. Some say that if you choose your model after you analyze your
data, quite nasty things will happen to the data and you must throw it
out. Replace data with brains and you get zombie stories:
[Zombified econometrician] Must... get... more... data.
[Eli] Real science is messy; this is arguing for only doing things when you know the answer before you start. Is statistics a tool or an end in itself? If it is a tool, why let it run your life?
[Socrates] Because auditors request it, perhaps.
[Eli] They seriously lack rhythm and sound like Hell’s version of karaoke. All noise, no music.
[Socrates] Pithy. Let’s envision this myth of a Hell like Dante’s, but with four circles of accusations, which I’m tempted to characterize via D&D alignments:
[The Neutral] You're picking cherries with your post hoc method.
[The Chaotic] Your data is just a bunch of cherries anyway!
[The Lawful] You're not following a standard based on any official (e.g. statistical) authority.
[Socrates’ Avatar] You're not following your own standards.
[Socrates]
This sums up most of the econometric concerns, as far as I can see. When valid, the last argument may be tough to dodge. Since this is my avatar talking through econometricians, I might be biased.
[Eli] Well, OK, you analyze the old data for your prior and then get new data.
The anti-Bayesian take on that is that if your new data is wildly different from your old, you've got a load of 'splainin to do, because either the prior or the later data is screwed up.
Or you could split the fifty co-authors into two groups, one that does the prior and the other that does the data gathering.
The equivalent would be to take the FAR as the prior for the SAR, etc.
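[Eli, scribbling on the back of a reprint] A minimal sketch of that hand-off, assuming a toy conjugate Normal model; the numbers, noise level, and report names are placeholders, not anyone's actual assessment:

```python
# Toy sketch of sequential updating: each report's posterior becomes the
# next report's prior.  Conjugate Normal model with known noise variance.
import numpy as np

def update(prior_mean, prior_var, data, noise_var):
    """Posterior of a Normal mean under a Normal prior, known noise."""
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + np.sum(data) / noise_var)
    return post_mean, post_var

rng = np.random.default_rng(0)
mean, var = 2.0, 4.0                      # vague "FAR-era" prior
for report in ["SAR", "TAR", "AR4"]:      # each round of new pretend data
    data = rng.normal(3.0, 1.0, size=20)
    mean, var = update(mean, var, data, noise_var=1.0)
    print(f"{report} posterior: {mean:.2f} +/- {var**0.5:.2f}")
```

If the new data were wildly different from the running prior, the posterior would land awkwardly between them, and that is where the 'splainin starts.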
[Socrates] That could be a start, but how exactly do you find new very old proxies, Eli? Historical data can be scarce.
[Eli]
The journals are full of them; it is an industry, with lots of folks out there digging up old logs, drilling new ones, inventing new tools of analysis, and more.
Good solutions to these problems depend on using the right prior distribution, one that properly represents the uncertainty that you probably have about which inputs are relevant, how smooth the function is, how much noise there is in the observations, etc. In other words, you pretty much know the answer.
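[Eli, waving paws] For instance, here is a minimal sketch of how a Gaussian-process prior encodes "how smooth the function is"; the squared-exponential kernel and the length-scales are illustrative assumptions, not anyone's published choice:

```python
# Minimal sketch: draws from a Gaussian-process prior.  The length-scale
# hyperparameter encodes the belief about how smooth the function is;
# both length-scales below are arbitrary illustrative choices.
import numpy as np

def rbf_kernel(x, length_scale):
    """Squared-exponential covariance; larger length_scale = smoother."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

x = np.linspace(0.0, 10.0, 100)
rng = np.random.default_rng(1)
for ls in (0.5, 3.0):                                  # wiggly vs. smooth
    K = rbf_kernel(x, ls) + 1e-8 * np.eye(len(x))      # jitter for stability
    f = rng.multivariate_normal(np.zeros(len(x)), K)   # one draw from prior
    print(f"length_scale={ls}: typical step size {np.std(np.diff(f)):.3f}")
```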
[Socrates]
Easier said than done. Let’s leave this aside. Since the last time Plato channeled me, Aristotle has shown that providing evidence counts for more. I rather like this statement by Radford Neal in this presentation:
[Radford] The
Bayesian approach takes modeling seriously. A Bayesian model includes a
suitable prior distribution for model parameters. If the model/prior
are chosen without regard for the actual situation, there is no
justification for believing the results of Bayesian inference.
[Socrates]
Just under it, there's also a note about the pragmatic compromises. It's a rather neat intro, which even I can almost understand. For better sound bites, there’s Cromwell’s rule:
[Dennis Lindley] Leave
a little probability for the moon being made of green cheese; it can be
as small as 1 in a million, but have it there since otherwise an army
of astronauts returning with samples of the said cheese will leave you
unmoved.
[Eli] Eli will give the points on that one. No one ever got poor betting with cranks against green cheese or the ether.
[Socrates] The name was inspired by Oliver Cromwell’s address to the Church of Scotland:
[Oliver Cromwell] I beseech you, in the bowels of Christ, think it possible that you may be mistaken.
[Socrates] According to this rule, only logical impossibilities should have zero prior. I believe this rule is in the spirit of your remark about proxies.
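[Socrates, borrowing the chalk] A toy demonstration of the rule, with an invented likelihood ratio: a hypothesis given prior probability exactly zero can never be revived, while even a one-in-a-million prior can be:

```python
# Toy illustration of Cromwell's rule: a hypothesis with prior probability
# exactly zero stays at zero no matter how strong the evidence, while a
# one-in-a-million prior recovers.
def posterior(prior_h, likelihood_ratio):
    """Bayes' theorem for a binary hypothesis H vs. not-H."""
    num = prior_h * likelihood_ratio
    return num / (num + (1.0 - prior_h))

for prior in (0.0, 1e-6):
    p = prior
    for _ in range(5):                  # five green-cheese samples, each
        p = posterior(p, 1000.0)        # 1000x likelier under H
    print(f"prior {prior}: posterior after 5 samples = {p:.6f}")
```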
[Eli] I think my point is that Bayesian statistics only works if you have an intelligent prior. If the prior work is of Dunning-Kruger quality, you are screwed. You will know less after the analysis than before you started.
[Socrates] More than that: you become affected by DK yourself, and you start to use the theorem to prove the existence of God.
I'll read Gelman's paper. I feel I already did. Oh, I just had this reminiscence of asking a non-Bayesian philosopher king why he was not Bayesian, and he said:
[Philosopher King] Beats me. I just ain’t. Methinks this is like sexuality. I liked the first three pages of Gelman. I agree with his claim about philosophical Bayesianism being crap.
[Socrates]
I'm paraphrasing, even if it looks like Philosopher King’s talking.
Socratic dialogs are a rhetorical trick to have multiple lines of
argument.
While I was making you believe that Philosopher King was talking, I searched the Internet (which Plato anticipated in his Phaedo) and found this video lecture, by Michael... Jordan. Clicking on the titles of the slides makes them appear.
It’s a slam dunk.
[Eli] In other words, if you have a good idea of the answer, Bayesian methods can help you, but if not, you need physics or biology or chemistry or meteorology.
[Socrates] You always do, but as soon as you put any of that into the prior you have to face the Erinyes.
[Eli] You’ve not told me much, Socrates. What’s your final answer?
[Socrates] Do I look like a truth machine to you? Please defer to Yoda:
[Yoda] The Proper Statistics you must use, Eli. Within it everything is.
[Eli] Eli is but a humble bunny, oh Yoda; how shall he know what to do if Socrates does not tell him?
[Socrates] Us oracles consult for carrots, silly Rabett.
14 comments:
Bloody sock puppets!
I went through something like this about 20 years ago. Some clever bunny discovered that you could "enhance" electron energy loss spectra using Bayesian statistics. This was great because EELS has problems with resolution and getting high quality spectra is tricky.
I even saw said bunny give a talk about it at a conference. It looked complicated but wonderful and we all rushed off home to try it out for ourselves.
Maybe I pressed the wrong button, but it seemed I could get any result I wanted from my data. Delta functions and everything. I came to the conclusion that I could significantly improve the resolution of my spectra, but only if I knew the answer precisely before I started.
As far as I know, nobody uses this method nowadays for EELS, so I was possibly not alone.
Danger Mouse
If you are consistent, your prior will reflect your past evidence. Strong contradiction, in the sense of having strong evidence for A first and later for not-A, is very rare. If that happens, either your prior does not reflect your past evidence, your model does not fit the data, the data are faulty, or you have bad luck.
As I have seen, the main reason to use Bayesian methods is not to inject prior knowledge into the model, but something more technical. Hierarchical models have internal "priors", and in some other models maximum likelihood solutions are singular while the posterior still makes a lot of sense. In some cases, the model family is fine-tuned with a prior (e.g., L1 vs. L2 vs. Cauchy vs. spike-and-slab in linear regression; see the sketch below).
Often the priors are noninformative, carrying no domain knowledge.
And note that your model family is also a prior, and a strong one!
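A quick sketch of that last point, with made-up numbers: the familiar ridge penalty is just the MAP estimate under a Gaussian prior on the coefficients, so choosing the penalty is choosing the prior.

```python
# Ridge regression as the MAP estimate under a Gaussian prior on the
# coefficients: the penalty strength lam = sigma^2 / tau^2 is a statement
# about the prior.  Toy data with invented numbers.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
beta_true = np.array([1.5, 0.0, -2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=50)

sigma2 = 0.25   # noise variance, assumed known here
tau2 = 1.0      # prior variance of each coefficient
lam = sigma2 / tau2
beta_map = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print("ridge / Gaussian-prior MAP:", np.round(beta_map, 2))
# A Laplace prior gives the lasso instead; Cauchy or spike-and-slab
# priors have no closed form and need iterative or MCMC fitting.
```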
Mighty or Fred
"Probably Wrong"
-- by Horatio Algeranon
Bayesians are frequently wrong
When priors are unreasonably strong.
Frequentists are oft unfazed
When models are wrongly bayesed.
Michael Jordan might appreciate it if you included calibration and coherence in your limerick, Horatio.
See his slide after his explanation in decision-theoretic terms.
Note that several more recent papers (with various authors) extend the findings of the eponymous Kruger and Dunning paper.
E.g.
http://www.sciencedirect.com/science/article/pii/S074959780700060X
Eli, you say, "Real science is messy; this is arguing for only doing things when you know the answer before you start. Is statistics a tool or an end in itself? If it is a tool, why let it run your life?"
Might be messy, Eli, but you still have to do it - and you ain't doing it. This post reminds me of Raiders of the Lost Ark, where in the center of an Indian village a knife-wielding bandit confronts Indiana in a highly contorted visual display. Indiana pulls out his Colt and promptly disposes of the bandit. Bandit=Eli, Indiana=VS.
My my, this anonymouse is all by way of mouth and none by way of trousers...
PS An Indian village in RoLA? You can't even get that right!
Andrew Gelman reports reviews for his article with Shalizi, hand-waved at in the dialog:
http://andrewgelman.com/2013/02/philosophy-and-the-practice-of-bayesian-statistics-with-discussion/
More backreading for Socrates!
Bill, would you prefer Eli=Bandit, Indiana=Wm Briggs or Steve Jewson?
The whole issue seems to be Open Access:
http://onlinelibrary.wiley.com/doi/10.1111/bmsp.2013.66.issue-1/issuetoc
So we can read Mayo's criticisms and the authors' rejoinder.
I read
Deborah G. Mayo
Statistical Science and Philosophy of Science: Where Do/Should They Meet in 2011 (and Beyond)?
RMM Vol. 2, 2011, 79–102
and
Deborah G. Mayo
Statistical Science and Philosophy of Science Part 2: Shallow versus Deep Explorations
RMM Vol. 3, 2012, 71–107.
Not impressed.
Also read
Bradley Efron
A 250-year argument: Belief, behavior and the bootstrap
Bull. AMS 50:1, Jan 2013, 129–146.
Impressed.
Socrates' long history as a proponent of burning-glass SRM does not inspire confidence--
There is rumor in Oxyrhynchus of a lost play of Aristophanes, in which the philosopher refuses to whitewash his house because he thinks the Athenian Heat Island Effect has spoiled weather reports from the Tower of the Winds.
'Frequantism is a figment of a quantum mind.' (anonymous #743)