Thursday, July 10, 2014

Eli is on vacation




Eli is on vacation, a little surfing now and again.  Back eventually; till then there is this to ponder.

Statistical mathterbation has broken out everywhere: Beenstock is back, Force X is out there, and Eli was reading Andrew Gelman, who posted a useful comment from George Box on models:
It is widely recognized that the advancement of learning does not proceed by conjecture alone, nor by observation alone, but by an iteration involving both. Certainly, scientific investigation proceeds by such iteration. Examination of empirical data inspires a tentative explanation which, when further exposed to reality, may lead to its modification. . . .
Now, since scientific advance, to which all statisticians must accommodate, takes place by the alternation of two different kinds of reasoning, we would expect also that two different kinds of inferential process would be required to put it into effect.
The first, used in estimating parameters from data conditional on the truth of some tentative model, is appropriately called Estimation. The second, used in checking whether, in the light of the data, any model of the kind proposed is plausible, has been aptly named by Cuthbert Daniel Criticism.
In the comments, Corey refers to a paper which purports to show that Kepler's model was a worse fit to the data than the Ptolemaic model; however, that paper had some problems:
In brief, Spanos shows that the residuals of the Keplerian model fit to Kepler’s original n = 28 data set are indistinguishable from white noise, while the residuals of the Ptolemaic model fit to a data set of one Martian year (~2 Earth years) of *daily observations from the US Navy Observatory* (n = 687) show unmistakable autocorrelation. I don’t mind telling you that my jaw literally dropped when I realized that Spanos was checking the statistical adequacy of the two models on *two different data sets*.
This, however, to Eli was unimportant, because physics, chemistry and increasingly biology are built upon the principle of parsimony, and this is something that needs to be made much more explicit in teaching science at all levels.  Realizing this, the epicycles were roadkill.  Kuhn, Popper and the rest never really came to terms with the two bedrocks of science, parsimony and consistency to understand the world.

The developments of the last thirty years have provided such models for biology and climate science, but the stamp collectors have not caught up.  Cladistics is useful when simplicity is lacking.  Pattern recognition can be powerful, but it also masks understanding.  Neural nets have no sense of guilt.

Enjoy

28 comments:

THE CLIMATE WARS said...

Neural nets have no sense of guilt
-- E. Rabett

Neither in my experience do some science historians.

I am however so impressed by Eli's logic that I will spend the remainder of the summer surfing as well.

William M. Connolley said...

> Realizing this, the epicycles were roadkill

Presentism. It just wasn't like that, I think: http://thonyc.wordpress.com/2014/07/03/planetary-tables-and-heliocentricity-a-rough-guide/

THE CLIMATE WARS said...

William, as surely as some paranoids have enemies, some forms of Presentism have a Present

Thomas Lee Elifritz said...

Thank you, Eli.

Dan said...

Spanos (2007) does not purport to show that Kepler's model was a worse fit than Ptolemy's--in fact, the Keplerian fit to the n=28 dataset gives a marginally better R**2 than the Ptolemaic fit to modern data (0.999 vs. 0.992). Spanos's point is that an adequate R**2 is a necessary but not sufficient condition for statistical adequacy--one has to look more carefully (e.g., at the residuals) to see that Ptolemy's is a worse model of the data.

I can't say for sure why Spanos used two different datasets. I would expect that fitting the modern dataset would be a much more stringent test (more data, more precise and more accurate) than Kepler's original dataset, which adds to the argument that R**2 is insufficient (the modern dataset is also better for demonstrating that the Ptolemaic fit gives correlated residuals).

Spanos's conclusion that "[t]hese error-statistical assertions are affirmed by demonstrating that Kepler’s law of planetary motion gives rise to a statistically adequate model, but Ptolemy’s epicycles model does not" is, I think, adequately supported by his argument. The use of two different datasets is only really a problem for people who haven't understood Spanos's argument.
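
The point about R**2 being necessary but not sufficient can be illustrated with a small sketch. Everything here is an assumption made for illustration: the data are synthetic (a trend plus a slow wiggle plus noise), not Kepler's or the Naval Observatory's, and the helper names `fit_predict`, `r_squared` and `lag1_autocorr` are hypothetical. A straight line fitted to such data can score a very high R**2 while leaving residuals that are plainly autocorrelated.

```python
import numpy as np

def fit_predict(X, y):
    """Ordinary least squares; returns the fitted values X @ beta."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def lag1_autocorr(resid):
    """Lag-1 sample autocorrelation of the residuals."""
    r = resid - resid.mean()
    return float(np.sum(r[1:] * r[:-1]) / np.sum(r ** 2))

# Synthetic data (an assumption, not real observations):
# linear trend + slow periodic term + small white noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 400)
y = 2.0 * t + 0.5 * np.sin(1.5 * t) + rng.normal(0.0, 0.05, t.size)

# Misspecified model: intercept and slope only.
X_line = np.column_stack([np.ones_like(t), t])
# Adequate model: also includes the periodic term.
X_full = np.column_stack([np.ones_like(t), t, np.sin(1.5 * t)])

fit_line = fit_predict(X_line, y)
fit_full = fit_predict(X_full, y)

r2_line = r_squared(y, fit_line)
r2_full = r_squared(y, fit_full)
rho_line = lag1_autocorr(y - fit_line)
rho_full = lag1_autocorr(y - fit_full)

print(f"line only: R**2 = {r2_line:.4f}, lag-1 autocorrelation = {rho_line:.3f}")
print(f"with sine: R**2 = {r2_full:.4f}, lag-1 autocorrelation = {rho_full:.3f}")
```

Both fits report an R**2 close to 1, so that number alone barely separates them; the residual autocorrelation is what gives the misspecified model away.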

EliRabett said...

Dan, go follow Wm's link as to why Spanos was FOS. Besides, of course the blodged comparison. This, of course, captures perfectly why statistics in the hands of statisticians should be taken with a load of salt.

Dan said...

I followed Wm's link, I don't see how it shows Spanos was FOS. Spanos's paper isn't an historical treatment, it makes a simple (and, IMO, unexceptionable) point about the modern usage of mathematical models and the over-reliance on R**2 and other simplistic measures of goodness-of-fit, using orbital mechanics as a simple well-known example. The allegedly "blodged comparison" is fine for the point he is making.

AFAICT, "Kepler's model was a worse fit to the data than the Ptolemaic model" is an incorrect representation of what the paper actually says (and is certainly beside the point of the paper). Usually Rabett Run is pretty reliable about accuracy and RTFP, but I think you missed on this one. Have you looked at the paper?

a_ray_in_dilbert_space said...

I think the point that both Spanos and Eli are trying to make is that simple agreement with observations is not enough. Mere goodness of fit leads to a "just so story," arbitrarily complicated. The real goal is predictive power. To that end efficiency in describing the data is crucial, and we now have metrics much better suited to that task. These include Information Criteria, such as AIC, BIC and DIC. By those metrics, Copernicus kicks Ptolemy's ass.
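
As a rough sketch of how an information criterion trades fit against complexity, here is a toy polynomial example. It is not an analysis of any planetary data: the data are synthetic, the function name `gaussian_ic` is hypothetical, and the formulas are the usual least-squares forms of AIC and BIC under an assumed Gaussian error model, with additive constants dropped.

```python
import numpy as np

def gaussian_ic(y, y_hat, k):
    """Return (RSS, AIC, BIC) for a least-squares fit with k fitted parameters.

    Assumes independent Gaussian errors; constants common to all models
    are dropped, so only differences between models are meaningful.
    """
    n = y.size
    rss = float(np.sum((y - y_hat) ** 2))
    base = n * np.log(rss / n)
    return rss, float(base + 2 * k), float(base + k * np.log(n))

# Synthetic data (an assumption): a quadratic plus white noise.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 60)
y = 1.0 + 2.0 * x + 3.0 * x ** 2 + rng.normal(0.0, 0.3, x.size)

degrees = list(range(1, 7))
rss, aic, bic = [], [], []
for deg in degrees:
    coeffs = np.polyfit(x, y, deg)
    y_hat = np.polyval(coeffs, x)
    # k counts the polynomial coefficients plus the noise variance.
    r, a, b = gaussian_ic(y, y_hat, k=deg + 2)
    rss.append(r)
    aic.append(a)
    bic.append(b)

for deg, r, a, b in zip(degrees, rss, aic, bic):
    print(f"degree {deg}: RSS = {r:7.3f}  AIC = {a:8.2f}  BIC = {b:8.2f}")

print("lowest AIC at degree", degrees[int(np.argmin(aic))])
print("lowest BIC at degree", degrees[int(np.argmin(bic))])
```

The residual sum of squares keeps shrinking as terms are added, which is the "just so story" problem; the penalty terms are what stop the criteria from always preferring the most complicated model.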

The Very Reverend Jebediah Hypotenuse said...

This:
"
It is important to emphasize that historically this law was originally proposed by Kepler as just an empirical regularity that he ‘deduced’ from Brahe’s data. Newton provided a structural interpretation to Kepler’s first
law using his law of universal gravitation...
"
is a serious historical mis-reading of Kepler's 'Astronomia Nova' - one that is unfortunately common in astronomy texts and even some history of science texts.

Kepler proposed a 'motive force' emanating from the sun, and decreasing in effectiveness with distance. This led him to the so-called area law (now called K's 2nd law, even though it was proposed before the 1st law). It was only with the area law that Kepler was able to infer the elliptical orbit for Mars.

Kepler did not have the correct law of inertia - so the associated force-law had to be wrong to allow the system to 'save the phenomena'.

But the supposition that Kepler did not argue for a parsimonious structural interpretation is a mistake.

Even in the 'Mysterium Cosmographicum' of 1596, Kepler argues for the physical simplicity of the heliocentric model.

In addition, Kepler recognized that no geocentric system (i.e. Ptolemy's or Tycho's) could provide a coherent means of ordering the planets' orbits - they are 'angles-only' theories. All observed planetary motions are presumed to be real motions. Distances are irrelevant.

In a heliocentric system, the observation point (Earth) is in motion. Therefore, some of the observed planetary motions are not real - but due to changing perspective. Disentangling these two different components was the principal aim of Kepler's Astronomia Nova - and it is the reason why Kepler had to develop a good theory for the motion of the Earth (middle chapters of AN) before he could do the same for Mars.

Interestingly, in the first chapters of the AN, Kepler works out the geometry of the planetary phenomena in THREE systems (Ptolemaic, Tychonian, and Copernican). Only when Kepler believes that he has convinced the reader that the first two systems are hopelessly a-physical does he proceed with his refinement of the Copernican system - based on physical causes. (AN Latin title: Astronomia Nova ΑΙΤΙΟΛΟΓΗΤΟΣ seu physica coelestis) Kepler could explain not only the longitudinal motions of Mars but also the inferred Mars-Sun distances with a single theory.

Moreover - the final chapters of AN concern the LATITUDES of Mars' motions. Kepler could explain these consistently with the longitudinal motions by using a planar elliptical orbit model. In any geocentric system, the epicycles or eccentrics of all the planets must librate North and South (with a period of one year) in order to 'save the phenomena'.

Kepler's theory of elliptical orbits, although it used an incorrect force-law, was far more ambitious than anything before it - it was an attempt to 'save' entire classes of data that had never before been theoretically conjoined.

Anonymous said...

Rather than actually addressing Dan's points, we get from Eli:

"Dan, go follow Wm's link as to why Spanos was FOS."

Shades of Roger the Dodger (Roger Pielke, Jr).

And if "statistics in the hands of statisticians should be taken with a load of salt", then statistics in the hands of non-statisticians (like Eli) should probably be taken with an entire salt mine.

The Very Reverend Jebediah Hypotenuse said...

Eli sayeth:
"
Kuhn, Popper and the rest never really came to terms with the two bedrocks of science, parsimony and consistency to understand the world.
"

Kuhn's "The Structure of Scientific Revolutions" is not only a frustrating read, but has to be one of the most absurdly titled books ever.

Kuhn's account itself would have us believing that the transition from one scientific paradigm to the next is fundamentally irrational (incommensurability, etc.) - i.e. logically structureless.

He woulda called it "The Slippery Nebulosity of Scientific Revolutions", but the publisher suggested they should go with something more marketable.

EliRabett said...

Eli assumes that Dan, although not Anon, got to the bottom of the post Wm linked to:
----------------------
Due to the accuracy of Tycho’s observational data and the diligence of Kepler’s mathematical calculations the new tables were of a level of accuracy never seen before in the history of astronomy and fairly quickly became the benchmark for all astronomical work. Perceived to have been calculated on the basis of Kepler’s own elliptical heliocentric astronomy they became the most important artefact in the general acceptance of heliocentricity in the seventeenth century. As already stated above systems of mathematical astronomy were judged on the data that they produced for use by astrologers, cartographers, navigators et al. Using the Rudolphine Tables Gassendi was able to predict and observe a transit of Mercury in 1631, as Jeremiah Horrocks succeeded in predicting and observing a transit of Venus for the first time in human history based on his own calculations of an ephemeris for Venus using Kepler’s tables, it served as a confirming instance of the superiority of both the tables and Kepler’s elliptical astronomy, which was the system that came to be accepted by most working astronomers in Europe around 1660. The principal battle in the war of the astronomical systems had been won by a rather boring set of mathematical tables, Johannes Kepler’s Tabulae Rudolphinae.
--------------------

The damn things worked a lot better. EOS

William M. Connolley said...

Rev: no geocentric system (i.e. Ptolemy's or Tycho's) could provide a coherent means of ordering the planets' orbits - they are 'angles-only' theories.

But Tycho's theory is equivalent to Copernicus's, from a planetary-motions viewpoint. Only physics, which they didn't have then, favours Copernicus (well, Kepler).

If you throw in the lack of stellar parallax, observation at the time favoured Tycho.

> Kepler proposed a 'motive force' emanating from the sun... This lead him to the so-called area law

Interesting. Got a cite for that?

Dan said...

Eli, I'm still not understanding your point.

I read to the end of Wm's link. I also read the bit in Spanos's paper about how the Ptolemaic model makes systematically inferior predictions. I don't see any disagreement there. It seems like Eli's lop-eared brain is stuck on the idea that Spanos's paper is some sort of defense of Ptolemy. It isn't--Spanos is very clear (even from the bits I've quoted above) that the Ptolemaic model is unambiguously inferior to the Keplerian. The people claiming otherwise haven't understood (and often haven't read) the paper. Spanos's point is that you can't necessarily tell that one model is better than another just by looking at a single goodness-of-fit measure; Kepler vs. Ptolemy is merely an illustration of this.

This may sound like a trivial point--certainly in my cloistered world of experimental high energy physics, we are sticklers about examining the quality of fit. Unfortunately, I've found that in the wider world of science (and, much worse, pseudo-science like NIPCC, etc.) there are far too many people that think that a good R**2 means a good fit, and far too much abuse like p-hacking.

You can argue that Spanos ignores some valid ways of telling that Kepler's is the superior model--e.g., parsimony, or more rigorously, the kind of Information Criteria ARayIDS mentions above--and that is a reasonable criticism of Spanos. But that's still not an excuse for not checking the residuals, or for misrepresenting what Spanos does say.

What got me to RR in the first place was my indignation at the way the climate change "skeptics" misrepresent (aka "lie about") what others have said and written. That's also why I always RTFP, and why I'm being stubborn about getting you to either clarify your position or admit that what you wrote about Spanos is inaccurate.

Jebediah Hypotenuse said...

William Connolley said...

> > Kepler proposed a 'motive force' emanating from the sun... This lead him to the so-called area law

> Interesting. Got a cite for that?



Johannes Kepler, New Astronomy, trans. William H. Donahue, Cambridge: Cambridge University Press, 1992. [Translation of Johannes Kepler, Astronomia nova..., Heidelberg, 1609]

or

Koyre's "The Astronomical Revolution" Methuen, London 1973


- or you could look here:
http://plus.maths.org/content/origins-proof-ii-keplers-proofs

Kepler uses the 2nd ('area') law from chapter 40 of the Astronomia Nova onwards - He arrives at the determination of the elliptical orbit (1st law) in AN, chapter 58.

The key idea for Kepler was that the speed of the planet in its orbit is inversely proportional to its distance from the Sun (This is actually only exactly true at the aphelion and perihelion - but it's a good enough approximation for orbits of low eccentricity). Kepler thought that this inverse relationship was due to a 'vis motrix' (motive force) caused by the Sun. He tried to determine the shape of Mars' orbit based on the 'area law' and Tycho's data - by 'integrating' around the orbit. (This was pre-calculus, so K did this by laborious computation of the areas of small sectors of the conjectured orbit.)

Newton would later prove that the area law is true for ANY central force-law (not just inverse or inverse-square).
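
A quick numerical check of the two statements above, as a sketch under stated assumptions: the orbit is the standard inverse-square (Newtonian) ellipse rather than anything Kepler computed, units are scaled so that mu = a = 1, the eccentricity 0.0934 is an assumed approximate value for Mars, and `kepler_state` is a hypothetical helper name. The areal rate comes out constant around the orbit, while speed times distance is constant only approximately, matching its apsidal value exactly at perihelion and aphelion.

```python
import numpy as np

def kepler_state(theta, a=1.0, e=0.0934, mu=1.0):
    """Distance and velocity components on an inverse-square ellipse.

    theta is the true anomaly (angle from perihelion, in radians).
    Returns (r, v_radial, v_transverse, h), where h is the specific
    angular momentum.
    """
    p = a * (1.0 - e ** 2)           # semi-latus rectum
    h = np.sqrt(mu * p)              # specific angular momentum
    r = p / (1.0 + e * np.cos(theta))
    v_radial = (mu / h) * e * np.sin(theta)
    v_transverse = (mu / h) * (1.0 + e * np.cos(theta))
    return r, v_radial, v_transverse, h

theta = np.linspace(0.0, 2.0 * np.pi, 721)   # half-degree steps
r, v_r, v_t, h = kepler_state(theta)

# Area law: area swept per unit time is r * v_transverse / 2.
areal_rate = 0.5 * r * v_t

# Kepler's working rule: speed inversely proportional to distance,
# i.e. r * speed should be constant if the rule held exactly.
speed = np.hypot(v_r, v_t)
ratio = r * speed / h

print("areal rate spread:", float(areal_rate.max() - areal_rate.min()))
print("r*v/h at perihelion:", float(ratio[0]))
print("r*v/h at aphelion:  ", float(ratio[360]))
print("largest r*v/h:      ", float(ratio.max()))
```

For an eccentricity this small the inverse-distance rule is off by well under one percent anywhere on the orbit, which is consistent with the remark that it is a good enough approximation for orbits of low eccentricity.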




Jeb said...

William Connolley said...
"
But Tycho's theory is equivalent to Copernicus's, from a planetary-motions viewpoint. Only physics, which they didn't have then, favours Copernicus (well, Kepler).
"

They had physics - Aristotle had physics (he wrote an entire treatise on it). Not classical or quantum physics - but physics nonetheless.

Jeb said...

Dan Riley said...
"
Spanos's point is that you can't necessarily tell that one model is better than another just by looking at a single goodness-of-fit measure; Kepler vs. Ptolemy is merely an illustration of this.
"

This is certainly true. But it's nothing new. The ancient Greeks were well aware of the problem of empirical equivalence. Hipparchus versus Ptolemy (eccentrics versus epicycles) for example.

Copernicus viewed himself as a 'restorer' of astronomy - to its Hipparchan use of eccentric circular orbits. Copernicus was well aware that Ptolemy's theory is empirically equivalent - but - quite aside from the geocentric aspect of Ptolemy's theory - Copernicus rejected Ptolemy's use of the equant-point. Why? Because he did not accept that the planets ACTUALLY sped up and slowed down over time. Only uniform circular motions were allowed by Copernicus. Only that would be physically consistent with the existence of solid planetary orbs.

Loosely speaking, Copernicus created a heliocentric version of Hipparchus' theory.

willard said...

> Kuhn's account itself would have us believing that the transition from one scientific paradigm to the next is fundamentally irrational (incommensurability, etc.) - i.e. logically structureless.

I disagree.

Kuhn's incommensurability thesis is simply the idea that one can't objectively compare theories that belong to different paradigms, for the simple reason that such measure would presume a standpoint that does not exist.

It does not imply that the evolution of scientific theories is structureless, only that if to take into account what he called "extrascientific ideas," just like Koyré did, incidentally.

Arguing that this is irrational would be like arguing that Kant's critiques are irrational because they impose constraints that exclude all conceptual schemes except his.

***

These are minor disagreements. Koyré's **Études galiléennes** rocks, and "structure" and "revolution" are a sign of Kuhn's times.

willard said...

Erratum:

> only that if to take into account


we need to take into account, that is.

I also would like to add that I don't want to defend Kuhn. If bunnies could bash him for the good reasons, though, that would be nice.

***

On an unrelated note, Douglas Keenan has been promoting random walks at the Bishop's, using the same "because AIC" argument as Ray did earlier.

I find this argument underwhelming, to say the least.

***

Also note that Nullius used the same example as

> It's like the Ptolemaic theory of epicycles versus Newton's law of gravitation. With epicycles, you can't validate it because you can't falsify it. If the data deviates from what you expect, you can always add or adjust epicycles to match it. While a specific set of epicycles can be rejected, you can always find a better fit. But Newton's theory is simpler, with far fewer parameters to adjust, and far more limited effects from adjusting them. If you assume an inverse-square central force, you are very tightly limited in what sort of behaviours you can predict. You can test those predictions, and either confirm the reliability of prediction or reject the theory in its entirety.

http://www.bishop-hill.net/blog/2014/7/3/where-there-is-harmony-let-us-create-discord.html

I thus welcome Spanos' argument, at least for irony effect. And I thank bunnies for that discussion, which I read with delight.

Jeb said...

willard writes:
"
It [Kuhn's incommensurability thesis] does not imply that the evolution of scientific theories is structureless, only that if to take into account what he called "extrascientific ideas," just like Koyré did, incidentally.
"

Yes - but...
Extra-scientific is surely a time-dependent category.

E.g. Kepler's own mentor, Michael Maestlin, advised against any introduction of the concept of 'forces' into astronomy, arguing that they did not form any part of what was then astronomical explanation. But Kepler's non-heeding of that advice was 'scientific' by later standards.

One bunny's extrascientific ideas are another bunny's revolutionary carrots.


EliRabett said...

Dan, the point that Eli (and others here) was trying to make is that Spanos' argument, even if he had not compared apples with carrots, is both unnecessary and not very convincing. The two arguments that carried Kepler were parsimony and accuracy.

As to Kuhn, where is Kuhn without observation?

willard said...

Where are Kuhn's observations if you abstract the framework for which you gather them, Eli?

Indeed, "extra-scientific" is a strange euphemism, inserted in a footnote from a 1976 paper.

The most direct way to deal with Kuhn may be to reject his concept of paradigm. If you can call that a concept: there's around 50 different usages of it.

I have no problem in saying that a theory supplants another because it is simpler or better. I just need that we agree that it is the decision of scientists. Theoretical choices can only be seen as optimization results in retrospect, when we abstract away what really happened.

I'd rather discuss why Ray's argument would be good for Kepler, but not for random walks.

Dan said...

So Eli's point is that Spanos should have picked a different example? As I've tried to say several times, that paper isn't about Kepler, Kepler is just a useful example (and apples vs. carrots is appropriate for the point he is making). That Kepler won due in part to accuracy is part of Spanos's argument, not a contradiction of it (parsimony, OTOH, is a valid criticism).

As for unnecessary, I really wish it were, but I still see way too many papers with inadequately evaluated fits.

EliRabett said...

Yep, Eli is with Theodore Sturgeon on most everything including crap examples to prove a point.

a_ray_in_dilbert_space said...

Willard,
It's pretty clear that AIC is just one of the many things that Keenan doesn't understand. Akaike's derivation explicitly assumes that the correct model is in the list of models being evaluated. Other derivations have finessed that to a greater degree, but you still have to be dealing with correct models.

A random walk is excluded based on physics. You can see this if you simply take decadal averages.

AIC is essentially a goodness of fit metric that takes into consideration the complexity of the different models under consideration. If you are using it with data that are not representative or completely gonzo models, of course it will yield nonsense.

Anonymous said...

Dan,

If you expect Eli to admit he was mistaken, you are wasting your time.

It's simply not his thing, for whatever reason.

Eli's usual MO is to let others jump in and defend him so that he does not have to explain himself and thereby risk making it perfectly clear that he does not understand what he originally posted on.

Or if he does reply, he does it in abstruse language that can mean almost anything.

It's all very predictable.

Thomas Lee Elifritz said...

I predict anonymous commenters will make more content free comments here in the future.

THE CLIMATE WARS said...

All those in favor of exiling anonymous surfing instructors to Bolivia raise their fins.