Wednesday, February 06, 2013

On Priors, Bayesians and Frequentists

A dialog between a bunny and a philosopher in which questions of current concern are asked or not asked, and answered or not.  The philosopher will, until the philosopher wishes, be anonymous or not.

[Eli]  So every now and then, Eli gets serious, and asks some questions.  In this case about Bayesian statistics.  Andrew Gelman pointed out that

[Andrew Gelman] Twenty-five years ago or so, when I got into this biz, there were some serious anti-Bayesian attitudes floating around in mainstream statistics. Discussions in the journals sometimes devolved into debates of the form, “Bayesians: knaves or fools?”.

[Eli]  Eli thought the proper designation of Bayesians was the batshit crazy, but no never mind.  The questions revolve around something lower down in that post, and frankly, in a vague attempt not to out his bunnyship as an idiot, Socrates, Eli thought you might be a reasonable spirit to ask.   

[Socrates] Shoot.

[Eli] So here’s Andrew Gelman on Noah Smith:

[Andrew] Smith does get one thing wrong. He writes:

[Noah] When you have a bit of data, but not much, the Frequentist – at least, the classical type of hypothesis testing – basically just throws up its hands and says

[Frequentist] We don’t know.

[Noah] It provides no guidance one way or another as to how to proceed.

[Andrew] If only that were the case! Instead, hypothesis testing typically means that you do what’s necessary to get statistical significance, then you make a very strong claim that might make no sense at all. Statistically significant but stupid. Or, conversely, you slice the data up into little pieces so that no single piece is statistically significant, and then act as if the effect you’re studying is zero.

[Eli] Andy underlines another mistake by Noah, this time when he says:

[Noah] If I have a strong prior, and crappy data, in Bayesian I know exactly what to do; I stick with my priors. In Frequentist, nobody tells me what to do, but what I’ll probably do is weaken my prior based on the fact that I couldn’t find strong support for it.

[Andrew]  This isn’t quite right, for three reasons.

First, a Bayesian doesn’t need to stick with his or her priors, any more than any scientist needs to stick with his or her model. It’s fine—indeed, recommended—to abandon or alter a model that produces implications that don’t make sense (see my paper with Shalizi for a wordy discussion of this point).

Second, the parallelism between “prior” and “data” isn’t quite appropriate. You need a model to link your data to your parameters of interest. It’s a common (and unfortunate) practice in statistics to forget about this model, but of course it could be wrong too. Economists know about this, they do lots of specification checks.

Third, if you have weak data and your prior is informative, this does not imply that your prior should be weakened!

[Eli] Eli's take on all this is that starting with priors (from models/theories/other data sets) which are close to the data set under analysis will result in improved statistical estimates.  The (very old language here) surprisal, the difference between the prior and the posterior, will be small and one may be able to use it to extract meaningful dynamics from under the statistical noise.

[Ms. Rabett, looking into Eli’s eyes] Very meaningful dynamics indeed.

[Eli, keeping his cool] However, if the prior is awful, the result may actually diverge from the underlying statistical information in the data set, so with Bayes, you have to know the answer, or a good approximation to it to make progress, or, as Gelman points out

[Eli, using Andrew’s voice] If the prior is derived from previous work, the data set may be crap, in which case the use of the Bayesian statistics is to identify crap data.
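(An aside for bunnies who like to see the arithmetic. The following is a minimal sketch, not anything from Gelman's post: it assumes the simplest textbook case, a normal prior on a mean with normally distributed measurement noise of known size. The function name `normal_posterior` and all the numbers are made up for illustration. It shows the point above: with weak data, a good prior barely moves, an awful but confident prior drags the estimate away from what the data say, and a vague prior lets the data speak.)

```python
def normal_posterior(prior_mean, prior_sd, data, noise_sd):
    """Posterior mean and sd for a normal mean with known noise sd.

    Standard conjugate normal-normal update: precisions (1/variance) add,
    and the posterior mean is the precision-weighted average of the
    prior mean and the sample mean.
    """
    n = len(data)
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = n / noise_sd ** 2
    post_prec = prior_prec + data_prec
    data_mean = sum(data) / n
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, post_prec ** -0.5


# Made-up, noisy measurements whose sample mean is 10.
data = [9.0, 11.0, 10.0]

# Prior close to the data: the posterior stays near the prior.
good = normal_posterior(10.5, 1.0, data, noise_sd=5.0)

# Confident but awful prior: the posterior lands far from the sample mean.
bad = normal_posterior(0.0, 1.0, data, noise_sd=5.0)

# Same awful center, but a very wide prior: the data dominate.
vague = normal_posterior(0.0, 100.0, data, noise_sd=5.0)

for name, (mean, sd) in [("good", good), ("bad", bad), ("vague", vague)]:
    print(f"{name:5s} prior -> posterior mean {mean:.2f}, sd {sd:.2f}")
```

(The gap between prior mean and posterior mean is the small "surprisal" in the good case and the load of splainin in the bad one.)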

[Eli] So how good is Eli's prior?

[Ms. Rabett]  And posterior, which I admire on occasion.

[Socrates] Gelman's post is brilliant.  I like his blog.  I also like Mayo's.  And not to mention yours.  What's your priors, again?

[Eli] Eli has been brought up on charges by many.  More or less something we used a lot of years ago, taking the prior from theory and applying it to measured data, to see what the theory missed.   Still like that approach.

[Socrates]  Oh, that.  Well, yeah.  Some call this post hoc data mining.  Some call it experimentation. I never understood the concept of post hoc.  Can we really check if econometrists are not peeking at their data before designing their models?

Perhaps Solomon would pronounce my judgement better than me:

[Solomon] Make sure your statistical inference is minimal and all will be well.

[Eli] Not if the theory is done before the experiment.

[Socrates] Hmmm. Some say that if you choose your model after you analyze your data, quite nasty things will happen to the data and you must throw it out. Replace data with brains and you get zombie stories:

[Zombified econometrist]  Must... get.. more... data.

[Eli]  Real science is messy, this is arguing for only doing things when you know the answer before you start.  Is statistics a tool or a means in itself, if it is a tool, why let it run your life?

[Socrates]  Because auditors request it, perhaps.  

[Eli]  They seriously lack rhythm and sound like Hell’s version of karaoke.  All noise, no music.

[Socrates]  Pithy.  Let’s envision this myth of a Hell like Dante’s, but with four circles of accusations, which I’m tempted to characterize via D&D allegiances:

[The Neutral] You're picking cherries with your post hoc method.

[The Chaotic] Your data is just a bunch of cherries anyway!

[The Lawful] You're not following a standard based on any official (e.g. statistical) authority.

[Socrates’ Avatar] You're not following your own standards.

[Socrates]  This sums up most econometric concerns, as far as I can see.  When valid, the last argument may be tough to dodge. Since this is my avatar talking through econometrists, I might be biased.

[Eli]  Well ok, you analyze the old data for your prior and then get new data.

The anti-Bayesian about that is that if your new data is wildly different from your old you got a load of splainin to do cause either the prior or the later data is screwed up.

Or you could split the fifty co authors into two groups, one who does the prior and the other who does the data gathering.

The equivalent would be to take the FAR as the prior for the SAR, etc.

[Socrates]  That could be a start, but how exactly do you find new very old proxies, Eli?  Historical data can be scarce.

[Eli]  The journals are full of them, it is an industry, with lots of folks out there digging up old logs, drilling new ones, inventing new tools of analysis and more.

Good solutions to these problems depend on using the right prior distribution, one that properly represents the uncertainty that you probably have about which inputs are relevant, how smooth the function is, how much noise there is in the observations, etc.  In other words you pretty much know the answer.

[Socrates] Easier said than done.  Let’s leave this aside. Since the last time Plato channeled me, Aristotle proved that providing evidence was more substantial.  I rather like this statement by Radford Neal in this presentation:

[Radford] The Bayesian approach takes modeling seriously. A Bayesian model includes a suitable prior distribution for model parameters. If the model/prior are chosen without regard for the actual situation, there is no justification for believing the results of Bayesian inference.

[Socrates] Just under it, there's also a note about the pragmatic compromises.  It's a rather neat intro, which even I can almost understand.  For better sound bites, there’s Cromwell’s rule:

[Dennis Lindley] Leave a little probability for the moon being made of green cheese; it can be as small as 1 in a million, but have it there since otherwise an army of astronauts returning with samples of the said cheese will leave you unmoved.

[Eli]  Eli will give the points on that one.  No one ever got poor betting with cranks against green cheese or the ether.

[Socrates] The name was inspired by Oliver Cromwell’s address to the Church of Scotland:

[Oliver Cromwell] I beseech you, in the bowels of Christ, think it possible that you may be mistaken.

[Socrates] According to this rule, only logical impossibilities should have zero prior.  I believe this rule is in the spirit of your remark about proxies.
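(Another aside, for the numerically inclined. This is a minimal sketch of why a zero prior is a trap, using nothing but Bayes' rule for a yes/no hypothesis. The helper names `update` and `after_samples` are hypothetical, and the likelihoods 0.99 and 0.01 are assumed purely for illustration: each returning astronaut's sample is taken to be 99 times more likely if the moon really is green cheese.)

```python
def update(prior, likelihood_if_true, likelihood_if_false):
    """One application of Bayes' rule for a yes/no hypothesis."""
    numerator = prior * likelihood_if_true
    evidence = numerator + (1.0 - prior) * likelihood_if_false
    return numerator / evidence


def after_samples(prior, n_samples):
    """Belief in green cheese after n_samples cheesy moon rocks.

    Assumed likelihoods: a sample looks like cheese with probability
    0.99 if the hypothesis is true and 0.01 if it is false.
    """
    p = prior
    for _ in range(n_samples):
        p = update(p, 0.99, 0.01)
    return p


# A prior of exactly zero never moves, whatever the astronauts bring back.
dogmatic = after_samples(0.0, 10)

# One in a million is enough for ten samples to overwhelm the prior.
open_minded = after_samples(1e-6, 10)

print("dogmatic:", dogmatic)
print("open-minded:", open_minded)
```

(Zero times anything is still zero, which is the whole of Cromwell's rule in one multiplication.)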

[Eli]  I think my point is that Bayesian statistics only works if you have an intelligent prior.  If the prior work is of Dunning Kruegar quality you are screwed.  You will know less after the analysis than before you started it.

[Socrates] More than that: you become affected by DK yourself, and you start to use the theorem to prove the existence of God.

I'll read Gelman's paper.  I feel I already did.  Oh, I just had this reminiscence of asking a non-Bayesian philosopher king why he was not Bayesian, and he said:

[Philosopher King]  Beats me.  I just ain’t.  Methinks this is like sexuality.  I liked the first three pages of Gelman.  I agree with his claim about philosophical bayesianism being crap. 

[Socrates]  I'm paraphrasing, even if it looks like Philosopher King’s talking.  Socratic dialogs are a rhetorical trick to have multiple lines of argument.

While I was making you believe that Philosopher King was talking, I searched the Internet (which Plato anticipated in his Phaedo) and found this video lecture, by Michael... Jordan.  Clicking on the titles of the slides makes them appear.

It’s a slam dunk.

[Eli] In other words, if you have a good idea of the answer they can help you, but if not you need physics or biology or chemistry or meteorology.

[Socrates] You always do, but as soon as you put any of that into the prior you have to face the Erinyes.

[Eli] You’ve not told me much, Socrates.  What’s your final answer?

[Socrates] Do I look like a truth machine to you?  Please confer to Yoda:

[Yoda] The Proper Statistics you must use, Eli.  Within it everything is.

[Eli] Eli is but an humble bunny, oh Yoda, how shall he know what to do if Socrates does not tell him?

[Socrates] Us oracles consult for carrots, silly Rabett.

14 comments:

  1. I went through something like this about 20 years ago. Some clever bunny discovered that you could "enhance" electron energy loss spectra using Bayesian statistics. This was great because EELS has problems with resolution and getting high quality spectra is tricky.

    I even saw said bunny give a talk about it at a conference. It looked complicated but wonderful and we all rushed off home to try it out for ourselves.

    Maybe I pressed the wrong button, but it seemed I could get any result I wanted from my data. Delta functions and everything. I came to the conclusion that I could significantly improve the resolution of my spectra, but only if I knew the answer precisely before I started.

    As far as I know, nobody uses this method nowadays for EELS, so I was possibly not alone.

    Danger Mouse

  2. If you are consistent, your prior will reflect your past evidence. Strong contradiction, in the sense of having strong evidence for A first, later for not A, is very rare. If that happens, either your prior does not reflect your past evidence, your model does not fit the data, data is faulty, or you have bad luck.

    As I have seen, the main reason to use Bayesian methods is not to inject prior knowledge into the model, but more technical. Hierarchical models have internal "priors", and on some other models maximum likelihood solutions are singular while the posterior still makes a lot of sense. In some cases, the model family is fine-tuned with a prior (e.g., L1 vs L2 vs Cauchy vs spike-and-slab in linear regression).

    Often the priors are noninformative, carrying no domain knowledge.

    And note that your model family is also a prior, and a strong one!

    Mighty or Fred

  3. "Probably Wrong"
    -- by Horatio Algeranon

    Bayesians are frequently wrong
    When priors are unreasonably strong.
    Frequentists are oft unfazed
    When models are wrongly bayesed.

  4. Michael Jordan might appreciate if you include calibration and coherence in your limerick, Horatio.

    See his slide after his explanation in decision-theoretic terms.

  5. Kruger, and note several more recent papers (with various authors) extend the findings of the eponymous paper.

    E.g.
    http://www.sciencedirect.com/science/article/pii/S074959780700060X

  6. Eli, you say, " Real science is messy, this is arguing for only doing things when you know the answer before you start. Is statistics a tool or a means in itself, if it is a tool, why let it run your life?"

    Might be messy Eli, but you still have to do it - and you ain't doing it. This post reminds me of Raiders of the Lost Ark, where in the center of an Indian village a knife wielding bandit confronts Indiana in a highly contorted visual display. Indiana pulls out his Colt and promptly disposes of the bandit. Bandit=Eli, Indiana=VS.

  7. My my, this anonymouse is all by way of mouth and none by way of trousers...

    PS An Indian village in RoLA? You can't even get that right!

  8. Andrew Gelman reports reviews for his article with Shalizi, handwaved in the dialog:

    http://andrewgelman.com/2013/02/philosophy-and-the-practice-of-bayesian-statistics-with-discussion/

    More backreading for Socrates!

  9. Bill, would you prefer, ////Eli=Bandit, Indiana=Wm Briggs or Steve Jewson

  10. The whole issue seems to be Open Access:

    http://onlinelibrary.wiley.com/doi/10.1111/bmsp.2013.66.issue-1/issuetoc

    So we can read Mayo's criticisms and the authors' rejoinder.

  11. I read

    Deborah G. Mayo
    Statistical Science and Philosophy of Science: Where Do/Should They Meet in 2011 (and Beyond)?
    RMM Vol. 2, 2011, 79–102

    and

    Deborah G. Mayo
    Statistical Science and Philosophy of Science Part 2: Shallow versus Deep Explorations
    RMM Vol. 3, 2012, 71–107.

    Not impressed.

    Also read

    Bradley Efron
    A 250-year argument: Belief, behavior and the bootstrap
    Bull. AMS 50:1, Jan 2013, 129--146.

    Impressed.

  12. Socrates' long history as a proponent of burning glass SRM does not inspire confidence--

    There is rumor in Oxyrhynchus of a lost play of Aristophanes, in which the philosopher refuses to whitewash his house because he thinks the Athenian Heat Island Effect has spoiled weather reports from the Tower of the Winds.

  13. 'frequantism is a figment of a quantum mind.', anonymous #743.

