Thursday, April 09, 2009
A Misguided Philosophy of Science
In my last post, I discussed the first key problem empirical work is facing: an explosion in the number of empirical studies, with the likely effect that average quality is declining. In this post I want to turn to the second major problem that I think ELS is facing, namely an incorrect philosophy of science.
Posted by John Pfaff on April 9, 2009 at 10:40 AM | Permalink
We not only test the same hypothesis over and over--that of no effect--but the hypothesis we test is the least daring one possible. Superficially, it looks like we're doing what Popper asked: we're trying to reject our hypothesis. But unlike the physicists, we actually want to reject it.
I'm not sure about the middle sentence, but you have to add into this the problem of the hypothesis itself, which is the result neither of induction nor of deduction, but of what Peirce called abduction, or inference to the best explanation. As Steve McJohn, my colleague, who with Lorie Graham just published a little essay that touches on this, said to me in the hallway just a couple of days ago, it's abductive reasoning that's still the black hole. So if your paper returns what appears to be the black swan, we have to go back to the hypothesis you are testing and assess it to see whether the black swan has any explanatory power (or in Pinker's terms, OOMPH). Finding a black swan that requires a re-thinking of the downward sloping demand curve strikes me as a completely different animal (bird?) from finding one that refutes Ron Gilson's theory of value creation by lawyers. That is, the former has earned our respect in the way that the latter has not.
John, if you are looking for materials dealing with the meta-issues in the legal literature, I'll recommend my own Models and Games: The Difference between Explanation and Understanding for Lawyers and Ethicists, 56 Clev. St. L. Rev. 613, 636-49 (2008). I've also touched on this in Law's Illusion: Scientific Jurisprudence and the Struggle with Judgment; Beetles, Frogs, and Lawyers: The Scientific Demarcation Problem in the Gilson Theory of Value Creation; and Disclosure and Judgment.
Posted by: Jeff Lipshaw | Apr 9, 2009 12:10:53 PM
A few things about the post escape me. First, the comparison to physics seems a bit artificial, as social science cannot be reduced the same way. And that's true of natural sciences, too. When you write:
In criminology, we may be able to make a guess about
the direction of the effect, but that is all: "more people
in prison will lead to less crime" is the best we can
do. There is no theoretical reason to say "a 4% increase
in the prison population will lead to a 7% decrease in crime."
That sounds to me a lot like the process used in FDA approval of drugs, which matters a lot more than what we do and, to my knowledge, proceeds without great objection. You have a very good point that we need to use null hypotheses other than "no effect," but that's hardly a huge indictment of ELS as compared to countless other fields.
As for "no synthesis," that is certainly untrue in my primary area of study -- judicial decisionmaking. And it's not true of my secondary area on institutional analysis of legal systems. I'm not an expert on crime research, but I do see a lot of back and forth debate.
As for "wrong tools," this seems a little obscure; are you talking about substantive significance rather than statistical significance? If so, this is an increasingly recognized problem, but one found throughout nearly all fields, not just ELS. And it is increasingly being addressed, in part through the trend to graphic depictions of relationships.
Posted by: frankcross | Apr 9, 2009 12:28:42 PM
Frank: Thanks for your comments. My point isn't that we need a different null hypothesis but that we are not really proposing and refuting hypotheses at all, whether they are "no effect" or "the elasticity is one." We are trying to measure the effect directly, and this requires a different set of tools. The t-stat and the p-value are not very helpful, and putting a star next to a result does not tell us much.
And I am certainly not limiting my critique to ELS. I think a lot of fields suffer from the problems I lay out here, even if they don't acknowledge it. Thus my comparison to physics: there are certain fields for which Popperian falsification may make sense, and physics may be one of them. But the social sciences, and perhaps the biological sciences more generally, are not those fields.
(Perhaps in the FDA case, what is taking place is an effort to test the broader null hypothesis of "there is not a particularly large effect." In this case, falsification may make more sense. The null hypothesis is not a straw man, and for regulatory purposes the precise estimate is not so important.)
I'm glad to hear that there is synthesis taking place in other fields. But what is it like? In criminology, for example, there is plenty of back and forth, and plenty of what I will call "informal" or authoritative reviews. But outside of some systematic reviews of experimental work gathered by the Campbell Collaboration, I have seen no effort to develop rigorous evidence-based syntheses in the criminal justice literature, certainly not when it comes to observational work, where such guidelines are most needed.
And perhaps I wasn't so clear about the last point. I'm not really talking about the statistical vs. substantive point. Instead, I'm concerned that the way we calculate our confidence intervals, whether we report them numerically or graphically, is poisoned by the null hypothesis assumption (we use a central, rather than a non-central, distribution). This is an issue I've only started to wrestle with, so I'm not sure in the end how important it will be. But it struck me as one of those ways that Popper's influence slinks into our techniques without us even realizing it.
Posted by: John Pfaff | Apr 9, 2009 12:44:24 PM
Well, I'm a little loath to get into the philosophy of science, on which I am not well read. But I think you are setting an unduly high standard here. You write re the FDA:
for regulatory purposes the precise estimate is not so important
I think that's true for our research too. I don't think you can get a precise estimate in the noisy world of social science and I don't think it's essential to get one. This can't be physics. Just providing a little more information to nudge our knowledge.
The debates in law involve a question like: "does wrongful discharge law increase unemployment?" There's a "no effect" null hypothesis, but it's useful to know if it can be rejected. If it can, we won't know precisely how much unemployment is caused but we have some information (that we previously lacked) that needs to be considered in the adoption of wrongful discharge laws. Then, it would be good to bracket the magnitude of that effect and see if it was replicable in further research. That's not everything, but it's a fairly valuable something.
Posted by: frankcross | Apr 9, 2009 1:24:07 PM
Frank: I actually think we're more or less on the same page. In criminal law, for example, we don't just want to know that more prisoners lead to less crime, but we want some sort of estimate of the magnitude. It is impossible to know whether a particular policy makes sense unless we can measure the costs and benefits, and that requires us to know the size of the effect. It is impossible, of course, to measure it exactly, but we need to have a sense of the bounds on it. Just like you said about unemployment.
Perhaps the difference between our positions is that your "then it would be good..." is to me what the real goal is.
But my point is that that is not hypothesis testing. Setting bounds on what we believe to be the true effect size is inductive confirmation, not deductive falsification. And that requires us to use different tools and to be much more careful about how we synthesize large bodies of empirical findings--and much more reliant on well-designed, rigorous syntheses.
In other words, no one study tells us very much, and the actuarial turn we've seen over the past thirty years suggests that, left unguided, most people, even experts, do a surprisingly poor job adding up all the little nudges to develop the big picture.
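To make the "adding up the little nudges" idea concrete, here is a minimal sketch of one standard synthesis rule, fixed-effect inverse-variance pooling. The study estimates and standard errors are hypothetical numbers chosen for illustration, and the fixed-effect assumption (that every study estimates the same underlying effect) is itself an assumption that may well not hold for observational work.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of independent study estimates."""
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Hypothetical elasticities of crime with respect to the prison population.
estimates = [-0.10, -0.30, -0.20]
std_errors = [0.10, 0.10, 0.10]

pooled, pooled_se = pool_fixed_effect(estimates, std_errors)
low = pooled - 1.96 * pooled_se
high = pooled + 1.96 * pooled_se
print(f"pooled = {pooled:.3f}, 95% interval = [{low:.3f}, {high:.3f}]")
```

The point of the sketch is only that the pooled interval is narrower than any single study's interval, which is something informal reading of the literature tends not to deliver.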
Evidence-based medicine and evidence-based policy, which are built on this very concern, are revolutionizing fields such as, well, medicine (clearly) and epidemiology. And so far, I just haven't seen that level of rigor in the social sciences.
But I should be careful not to set too high a goal. You're right that precision like CoBE is definitely impossible. But even deriving bounds is ultimately inductive.
Posted by: John Pfaff | Apr 9, 2009 1:38:43 PM
When I was a fellow at Stanford a couple years ago, Jeff Strnad presented a paper on Bayesian empirical analysis (here: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=991335 ) that I thought was simply fascinating. The idea is that you combine a variety of models and data in a way that allows you to see what role different variables are playing in different models. By doing this, you can see what happens as you add variables to the model and remove them from it. The idea is that you get a more "objective" look at the data and its results.
Perhaps this is what you have in mind as the tools, although even this will likely not get specific numbers for the values. If I recall correctly, Strnad's point was in part that if you are going to "follow the results" rather than test falsifiable theories, then Bayesian analysis is the better way to do that.
Separately, I agree with most of what you write - we were always taught in my econometrics classes to come up with a theory first, and then test it, rather than plugging in variables until you get something that is significant.
However, I do disagree with this statement (which you actually seem to back off from later): "Physics produces genuinely testable predictions. The social sciences do not."
That's just not true. "If prices go up, fewer people will buy x." That's a falsifiable prediction, at least with respect to discrete markets. "If prices go up by 10%, 20% fewer people will buy x." Even this is falsifiable, so long as you have data where prices went up by 10% in the market.
The problem comes, I think, when the predictions get more complex and the external forces that might affect the outcome are greater. If we had a theory for those forces, though, we could test it.
Posted by: Michael Risch | Apr 10, 2009 11:46:30 AM
Michael: I'm a big fan of Jeff's paper. In fact, I've been working on a BMA paper with Jeff Fagan and Ethan Cohen-Cole for a little bit now.
Given your comment and some of Frank's, I think I may have been a bit unclear in my argument. I'm not trying to get more precise results. If anything, I want to focus more on confidence intervals than point estimates for this very reason. One of the things I find appealing about BMA is that it *expands* the confidence intervals, giving us a better sense of the limitations of what we can show.
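A rough sketch of why model averaging widens intervals, under stated assumptions: posterior model weights are approximated from BIC values with equal prior weight on each model, and the per-model estimates, variances, and BICs below are hypothetical. This is a generic illustration of the averaging arithmetic, not the method of any particular paper mentioned in this thread.

```python
import math

def bma_combine(estimates, variances, bics):
    """Average one coefficient across models using BIC-approximated weights."""
    # Subtract the smallest BIC before exponentiating for numerical stability.
    best = min(bics)
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    weights = [r / total for r in raw]
    mean = sum(w * e for w, e in zip(weights, estimates))
    # Total variance = weighted within-model variance + between-model spread.
    var = sum(w * (v + (e - mean) ** 2)
              for w, v, e in zip(weights, variances, estimates))
    return weights, mean, var

# Hypothetical results for the same coefficient from three specifications.
estimates = [-0.10, -0.30, -0.22]
variances = [0.0025, 0.0030, 0.0028]
bics = [250.0, 251.0, 254.0]

weights, mean, var = bma_combine(estimates, variances, bics)
print("model weights:", [round(w, 3) for w in weights])
print(f"best-model sd = {math.sqrt(variances[0]):.3f}")
print(f"model-averaged estimate = {mean:.3f}, sd = {math.sqrt(var):.3f}")
```

Because the models disagree about the coefficient, the between-model term is positive and the averaged standard deviation exceeds the one reported by the single best-fitting model.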
As for your refutation point, you're right that we can propose falsifiable hypotheses, but I don't think that is what we are really doing. The problem is that, unlike in physics, in the social sciences our theories can't give us predictions like "a 10% increase in price will lead to a 20% decline in demand." If they could, then Popper may be more applicable.
Unfortunately, social science theories remain generic. Sometimes theory may suggest a point estimate--perhaps there is a theoretical reason to think an elasticity is one--but this is likely rare. At best our theories can suggest relationships (x will be larger than y, x will die off over time).
In fact, so generic are our theories that they are rarely wrong--demand falls with prices, crime falls with prison populations (at least in the short run), etc., etc. Peter Schmidt has some interesting articles along these lines suggesting that the focus on false positives in hypothesis testing is misplaced since usually the probability of a false positive is zero: we *know* the no-effect null is wrong. What we're doing isn't falsification.
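The claim that the no-effect null is known to be wrong has a simple arithmetic consequence, sketched below. The effect size and standard deviation are hypothetical, and the sketch assumes the sample mean lands exactly on the true effect, so it shows the expected behavior rather than any particular sample.

```python
import math

def z_statistic(effect, sd, n):
    """z-statistic for a sample mean equal to `effect` against a null of zero."""
    return effect / (sd / math.sqrt(n))

true_effect = 0.01  # hypothetical tiny but nonzero effect
sd = 1.0            # hypothetical standard deviation of the outcome

for n in (100, 10_000, 1_000_000):
    z = z_statistic(true_effect, sd, n)
    rejects = abs(z) > 1.96
    print(f"n = {n:>9,}: z = {z:.2f}, rejects no-effect null at 5%: {rejects}")
```

With any nonzero effect, rejection becomes a matter of sample size, which is why the star on the coefficient says little about whether the effect is large enough to matter.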
So, since we can't generate the "10%-20%" theory, we try to find what the relationship is in the data: we try to estimate an effect size. Almost every paper makes the policy jump: "My coefficient of x means that a 1% change in z leads to a y% change in w." This is what we really care about--what is the elasticity of demand, not the generic sign of the elasticity--and this is an inductive question.
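As a minimal sketch of that policy jump, the code below estimates an elasticity as the slope of a log-log regression and reports an interval around it. The price and quantity figures are made up for illustration, and the function name is hypothetical; with only six observations the interval uses the t critical value for four degrees of freedom rather than 1.96.

```python
import math

def loglog_elasticity(z_values, w_values):
    """OLS slope of log(w) on log(z), with its standard error."""
    x = [math.log(z) for z in z_values]
    y = [math.log(w) for w in w_values]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance uses n - 2 degrees of freedom (slope and intercept).
    sigma2 = sum(r ** 2 for r in residuals) / (n - 2)
    se = math.sqrt(sigma2 / sxx)
    return slope, se

# Hypothetical prices (z) and quantities demanded (w).
prices = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
quantities = [101.0, 82.0, 70.0, 58.0, 52.0, 44.0]

slope, se = loglog_elasticity(prices, quantities)
t_crit = 2.776  # 97.5th percentile of the t distribution with 4 degrees of freedom
low, high = slope - t_crit * se, slope + t_crit * se
print(f"estimated elasticity = {slope:.2f}, 95% interval = [{low:.2f}, {high:.2f}]")
```

The slope reads directly as "a 1% change in z goes with roughly a slope-% change in w," and the interval, not the sign, is what a cost-benefit calculation would need.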
Posted by: John Pfaff | Apr 10, 2009 1:23:41 PM
You might remember me from Judge Williams' chambers. Great post. Have you read Ziliak and McCloskey's "The Cult of Statistical Significance"? Perhaps a bit hyperbolic now and then, but still a good read.
Posted by: Stuart Buck | Apr 10, 2009 2:45:22 PM