Tuesday, April 07, 2009
The Technical Revolution in Empirical Analysis
I want to start my discussion of the dangers empirical work in law (and the social sciences more generally) faces by looking at how we got to where we are today, and what it means for the future.
1. Processing power. The Apple IIe I had in grade school was a warhorse of a machine, with a 1 MHz processor. My laptop today is good, but not at the highest end of the personal computing scale, and it has a 1.86 GHz processor, a roughly 2000-fold increase in clock speed. Moore's Law, in its popular form, predicts that processing power doubles every two years, and it still seems to hold. We see the same improvement in RAM: my IIe was a beast with 256 kB of RAM, and my laptop is only respectable with 2 GB of RAM, a roughly 8000-fold increase.
2. Memory. Mainframes required punchcards, and a given punchcard could hold 80 bytes of data. To put that in perspective: the 5-1/4" floppy disks that everyone used in the 1980s and early 1990s could hold about 1.2 MB, a 15,000-fold improvement. Today I have in my pocket a 2 GB flash drive smaller than my thumb. Given that each punchcard was about 0.007 inches thick, it would take a pile of punchcards almost three miles high to hold that much information. Or, put differently, my flash drive holds 25 million times more data than a punchcard.
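For readers who want to check the arithmetic, here is a quick back-of-the-envelope sketch in Python. The inputs are the figures quoted above; the use of decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and the variable names are assumptions made for illustration.

```python
# Back-of-the-envelope check of the storage comparisons.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 GB = 10**9 bytes).

PUNCHCARD_BYTES = 80           # capacity of one punchcard
FLOPPY_BYTES = 1.2e6           # 1.2 MB floppy disk
FLASH_BYTES = 2e9              # 2 GB flash drive
CARD_THICKNESS_IN = 0.007      # thickness of one punchcard, in inches
INCHES_PER_MILE = 12 * 5280

# How many punchcards' worth of data each medium holds.
floppy_vs_card = FLOPPY_BYTES / PUNCHCARD_BYTES
flash_vs_card = FLASH_BYTES / PUNCHCARD_BYTES

# Height of a stack of punchcards holding as much as the flash drive.
stack_miles = flash_vs_card * CARD_THICKNESS_IN / INCHES_PER_MILE

print(f"floppy vs. punchcard: {floppy_vs_card:,.0f}x")
print(f"flash drive vs. punchcard: {flash_vs_card:,.0f}x")
print(f"punchcard stack height: {stack_miles:.2f} miles")
```

Under these assumptions the stack comes out a bit under three miles, consistent with the figure in the text.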
Not surprisingly, we have more and more data at our (ready) disposal. With the rise of the massive dataset--with observations running to the millions if not tens of millions--comes the ability to make increasingly sharp and significant discoveries.
3. Ease of access. Mainframes were shared computers: everyone had to compete with everyone else for limited time. The desktop democratized the process, allowing everyone to have continuous access to a computer, and at affordable levels.
4. Software improvements. In the time of the mainframe, analysts generally had to write their own code. This introduced a huge barrier to entry, not just in terms of time but in terms of training. Even the statistics packages of ten or fifteen years ago were clunky (just ask someone about SAS or Gauss). Programs like Stata and SPSS now make it easy for anyone to run regressions.
1. The Actuarial Turn. Starting in the 1960s, there has been a strong move away from relying on clinical judgment and towards using actuarial models. Only with the development of large-scale empirical models did we have the information necessary to appreciate the limitations in human judgment and to test and validate more rigorous actuarial alternatives. This attitudinal shift is central to an argument I'll make later, that ELS and the empirical social sciences need to adopt a more evidence-based approach; the evidence-based movement in many ways is another manifestation of the general shift towards actuarialism. (Given my embrace of actuarialism here, I should take a moment to remind Bernard Harcourt, who was on my dissertation committee when I was a graduate student, that he can't unsign my forms now!)
2. The Risk Society. Sociologists and other social scientists have noted that over the past three decades our views on what the government ought to do have changed; a key shift has been our growing desire that the government protect us from risk. David Garland and Jonathan Simon have written about this in the crime context, but as Cass Sunstein and others point out it has been a more widespread shift, including areas such as environmental regulation, health care, and so on. Part of the reason could be a loss of faith in expertise and the government's ability to provide for us, as Garland suggests, and part of it could be economic maturity (once we're wealthy enough we favor protection over growth).
But while I do not yet have any evidence along these lines and have only just started to think about this point, it seems to me that the technological revolution must play an important role here as well. We simply know so much more about what can hurt us. It is impossible to detect the effect of minor toxins, even if the effect size is substantial, without large datasets and powerful computers. With more powerful tools comes more awareness of the risks we face, and thus the demands we have for protection. For example, I would not be surprised to find evidence that the rise of toxic tort cases--and thus the rise of the use of complex scientific evidence in court--closely tracks the rise of the computer.
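A rough sketch of why small effects demand large datasets: the standard normal-approximation formula for a two-group comparison says the sample size needed per group grows with the inverse square of the standardized effect size. The function name `n_per_group`, the 5% significance level, and the 80% power target are conventional assumptions chosen for illustration, not figures from any particular study.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (difference in means
    divided by the common standard deviation). Uses the normal approximation
    n = 2 * (z_{1-alpha/2} + z_{power})^2 / effect_size^2.
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Required observations per group as the (hypothetical) effect shrinks.
for d in (0.5, 0.1, 0.05, 0.01):
    print(f"effect size {d}: about {n_per_group(d):,} per group")
```

Cutting the effect size by a factor of ten raises the required sample roughly a hundredfold, which is the sense in which subtle harms only become visible once datasets and computers are large enough.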
3. The Use and Abuse of Empirical Evidence. Thanks to powerful computers and easy-to-use software, good empirical work has never been better. But thanks to powerful computers and easy-to-use software, bad empirical work has never been worse. And the bad work comes from two directions. In some cases the problem is naivete. Researchers with little formal training, but with a decent desktop and a site-license for Stata, start to put together empirical projects. They are (seemingly) easy to do, and "but do you have data to support your claim?" is a question asked with growing frequency. Oftentimes these papers suffer from fundamental empirical flaws. This is a particular problem in ELS, since there is no peer review check in many cases.
Perhaps more troubling is the cynical manipulation of empirical work. Advocates and advocacy groups frequently appear to manipulate results or to adopt methods that ensure (as best as possible) that particular results arise. Such sins are committed on both sides of many debates; no one has a monopoly on such behavior. And with the rise of a more data-driven society (a function of both the actuarial turn and the evolution of the risk society), the returns on having at least the veneer of empirical support are rising, and the technological revolution is driving down the costs of producing such results; together, these trends only encourage more and more statistical abuse of this sort.
On top of all this, even if we focus just on the sincerely-developed, high-quality work, problems are starting to arise. There is simply more and more of it, and we lack the tools to extract the big picture from it. This is a point I will return to repeatedly.
Posted by John Pfaff on April 7, 2009 at 09:38 AM | Permalink
John, on behalf of non-empiricists with some smattering of statistical training everywhere, can I ask you to be more specific when you say you see a lot of "bad" empirical scholarship? I don't want you to name names, but in your view, what distinguishes good from bad empirical work? Is it awareness of the limits of some kinds of data? Use or failure to use meaningful controls?
Posted by: BDG | Apr 7, 2009 5:15:04 PM
BDG: Both your examples work. Failing to include a key variable or failing to account for a structural problem with the model (like endogeneity) are definitely signs of "bad" work. So too is failing to look closely at the data to appreciate, and control for, their defects. And so too is failing to think carefully about what your model is actually saying.
(Here's an example of the last concern. Some people have looked at the effect of crime in year t on the prison population in year t. But that model probably tells us very little, since the prison population in year t is a function of crime in t, t-1, t-2, t-3, ..., and a function of all those years in some sort of complex way that the simple model cannot catch.)
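To make the point concrete, here is a minimal stock-and-flow sketch in Python. Everything in it is hypothetical: the function `prison_population`, the admission rate, the probability of remaining in prison from one year to the next, and both crime series are made-up values for illustration, and the real relationship is surely more complicated than this geometric lag.

```python
def prison_population(crime, admit_rate=0.1, stay_prob=0.7):
    """Toy stock-flow model of a prison population.

    Each year a fraction admit_rate of that year's crimes leads to a prison
    admission, and each current inmate is still inside the following year
    with probability stay_prob. The population in year t therefore depends
    on crime in t, t-1, t-2, ... with geometrically declining weights.
    """
    stock = 0.0
    series = []
    for c in crime:
        stock = stay_prob * stock + admit_rate * c
        series.append(stock)
    return series


# Two hypothetical crime histories that reach the same level in the last year.
rising = [100, 200, 300, 400, 500]
falling = [900, 800, 700, 600, 500]

pop_rising = prison_population(rising)
pop_falling = prison_population(falling)

print(f"final-year crime: {rising[-1]} vs. {falling[-1]}")
print(f"final-year prison population: {pop_rising[-1]:.1f} vs. {pop_falling[-1]:.1f}")
```

Both histories end with identical crime in the final year, yet the prison populations differ substantially, which is why a model using only same-year crime has so little to say.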
After that it gets tricky. What constitutes "high quality" work is something that the social sciences really never ask, but it is a question that the evidence-based policy movement is forcing empiricists to consider. I'll have a future post about quality that will tackle this in more depth.
Posted by: John Pfaff | Apr 7, 2009 5:55:15 PM
John, a good starting point for a critique of "empirical work in law" is the ELS assumption that empirical research is synonymous with quantitative research. This reflects the overwhelming quant bias in political science. But even poli sci quants know that empirical research includes qualitative work. And folks in other relevant correlate fields - like sociology - understand that qual research is essential to any thick account of what the world is like. It seems to me that empirical work is incredibly valuable, but good empirical work is rigorous, methodologically diverse, and self-conscious of its own limits.
Posted by: Dan Filler | Apr 7, 2009 10:23:39 PM
How about this first step: mainstream law reviews should stop publishing empirical works, and the faculty should get in the habit of "discounting" the quality of empirical works published in mainstream law reviews to move things along in this direction.
Posted by: Law Review | Apr 8, 2009 3:11:40 AM