Tuesday, April 30, 2013
More On Counting: The Problem of Shady Statistics
The DC Fire Department seems to have come up with an innovative way to reduce the number of arsons and to improve its arson clearance rate (i.e., arsons that result in an arrest). No new technologies, no new investigative techniques, not even any additional investigators. It has simply redefined arson.
The old definition—which is apparently the one used by fire departments around the country—counted any fire that had been deliberately set as an arson, while the new definition requires “evidence of willful or malicious intent sufficient to support an arrest….” The effect of this shift? The number of arsons dropped from 154 in 2008 to 32 in 2012, and the clearance rate was nearly three times what it would have been under the older definition (34% vs. 10%).1
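A rough back-of-the-envelope check makes the denominator effect concrete. The sketch below treats the rounded figures quoted above as exact and assumes the same arrests would be counted under either definition, so only the number of fires labeled "arson" changes. Both are assumptions made for illustration, not figures reported by the DCFD.

```python
# Figures quoted in the post (rounded); treated as exact for illustration.
arsons_2008 = 154
arsons_2012_new_def = 32
clearance_new_def = 0.34
clearance_old_def = 0.10

# Share of the 2008 count that had disappeared by 2012.
drop_in_count = (arsons_2008 - arsons_2012_new_def) / arsons_2008

# Assumption: the same arrests are counted under either definition,
# so only the denominator of the clearance rate changes.
implied_arrests = clearance_new_def * arsons_2012_new_def
implied_old_def_count = implied_arrests / clearance_old_def

print(f"Drop in counted arsons, 2008 to 2012: {drop_in_count:.0%}")
print(f"Implied arrests in 2012: about {implied_arrests:.0f}")
print(f"Implied 2012 arsons under the old definition: about {implied_old_def_count:.0f}")
```

Under these assumptions, roughly 11 arrests look like a 34% clearance rate against 32 arsons but only a 10% rate against about 109 deliberately set fires. The improvement comes entirely from shrinking the denominator.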
In this case, the nature of the change was so dramatic that it automatically calls attention to itself. DC’s arson rate was just one-third the national average for a city of its size. Maybe this was really the case—maybe the investigators were really good and deterred a lot of arsons, or the populace was uniquely disinclined to start fires—but it at least openly demands scrutiny.
Perhaps the most egregious example of such redefinitions took place in Chicago in 2010, when the police commissioner attempted to counter rising murder statistics by breaking murder into two categories, “indoor murders” and “outdoor murders,” arguing that the police could only be held accountable for the latter. How, he argued, could the police really prevent murders that take place away from the police?2 In one fell swoop, he cut the number of murders that his department was “responsible” for that year from 138 to 98.
Other problematic statistics, however, are harder to detect. Consider these examples from David Simon (of Homicide, The Corner, and The Wire fame). In 1988, the Baltimore police cleared nearly 70% of their murder cases, and the DA secured murder convictions in just over 80% of its cases. Yet the probability of going to prison for murder was under 40%. How is this possible?
For the police, a case was cleared as soon as they made an arrest, even if the charges didn’t stick. This isn’t unique to Bawlmer. This is how the Uniform Crime Reports defines a clearance. An arrest is an arrest, even if it is a shoddy one that ultimately goes nowhere.3
So the police counted a clearance when they made an arrest, even if the DA was forced to drop the charges before he could even get an indictment. But the DA calculated his own clearance rate as the percent of cases for which there was an indictment that resulted in conviction. Arrests that were dropped before indictment? No effect on police stats, no effect on DA stats. These cases simply disappeared between the two definitions. And there were enough of these cases to drop the risk of conviction to 40%.
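The gap between the two definitions can be made explicit with a little arithmetic. The sketch below is only illustrative: it rounds the figures above to 70%, 80%, and 40%, and the function name `implied_indictment_share` is hypothetical. It backs out what share of arrests must have reached indictment for all three numbers to hold at once.

```python
def implied_indictment_share(clearance_rate, conviction_rate, overall_rate):
    """Share of arrests that must reach indictment for the rates to be consistent.

    Assumes overall_rate = clearance_rate * indictment_share * conviction_rate.
    """
    return overall_rate / (clearance_rate * conviction_rate)


# Rounded versions of the 1988 Baltimore figures (assumed exact here).
clearance_rate = 0.70    # police: arrests per murder
conviction_rate = 0.80   # DA: convictions per indicted case
overall_rate = 0.40      # convictions per murder

# What the two headline rates would imply if every arrest were indicted.
naive_rate = clearance_rate * conviction_rate

indicted = implied_indictment_share(clearance_rate, conviction_rate, overall_rate)

print(f"Naive conviction rate per murder: {naive_rate:.0%}")
print(f"Implied share of arrests indicted: {indicted:.0%}")
print(f"Implied share of arrests dropped before indictment: {1 - indicted:.0%}")
```

With these rounded inputs, the two headline rates would suggest a 56% chance of conviction per murder. Getting down to 40% requires that roughly 29% of arrests be dropped before indictment, and those are exactly the cases that neither agency's statistic records.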
Starting in 2011, a policy shift gave the DA sole power to charge murders, instead of the police. So the DA charges only the cases that he is sure to win. As a result, his clearance rate has remained high (if he charges, he will prevail), prosecutions are way down—from 130 in 2010 to 70 in 2011—and the police clearance rate has dropped into the twenties (since the DA declines to charge any marginal case, even if the police have a viable suspect).
As Simon points out, this is a dangerous precedent: due to the use of bad stats, the police look bad and the DA looks great, but the source of the problem is the DA, not the police.4 Yet it is hard for even a sophisticated observer to catch what is going on here, much less a lay consumer.
And things can get worse. Sometimes the very way data are (sincerely) gathered and processed can make them unreliable in ways that are hard to detect. Numerous researchers rely on county-level crime data from the UCR. But as Michael Maltz and Joseph Targonski point out, inconsistent reporting by agencies makes county-level data unreliable in ways that are not immediately obvious. Maltz and Targonski make this argument using John Lott’s and David Mustard’s book, More Guns, Less Crime, as their case study, and one of their broader complaints is that most of MGLC’s critics attacked the methodology while taking the data “as given.” But, they argue, we often need to look closely at the data, since subtle problems lurk everywhere.
So what does all this mean? There are a few key lessons to take away:
- If nothing else, this is a strong warning against casually running empirical models, a growing problem in legal scholarship. Legal academics shouldn’t just get their IT departments to install Stata on their computers, download some data, and then start running some regressions. It can take years to fully understand what a dataset looks like, what it is really measuring, its strengths and weaknesses. People who just run some quick regressions and then send them off to a law review are likely moving knowledge backwards, not forwards, since the risk of bad results is too great.
- Be wary of big swings, like a city’s arson rate dropping rapidly in just a few years. Don’t put too much weight on such results too quickly, and try to see if there is any evidence that the definition has changed rather than the underlying behavior.
- The more that is at stake, the more we should probe what the definitions are really measuring. The pre-2011 problem in Baltimore was hiding in plain sight: I imagine that the annual reports from the police and DA both defined how they measured clearance rates, and once you have the definitions the problem is clear. Murder is a big deal, so making sure we understand the definitions clearly is important.
- Perhaps most challenging, these stories suggest that whenever possible, we should put much more weight on what many studies collectively say rather than the findings from any one study. But—critically—this works only if the various studies all use different sources of data, gathered and defined by different people. It is possible that all datasets will use the same problematic definition: local police all use the UCR clearance measure, and (outside DC) all fire departments measure arsons based on the definition in the National Fire Protection Association handbook. In these cases, multisite results won’t highlight problematic definitions, since all the sites are using the same definitions. But when we don’t have this sort of centralization, then multiple studies can help us begin to highlight games with statistics (like noting that DC’s arson rate is 1/3 that of cities of comparable size).
1There were apparently other shenanigans taking place as well. The DCFD initially reported a clearance rate of 72.7%, but this was based on only a partial number of months, with the revised full-year clearance rate dropping by more than half to 34%.
2Lost in all the ado about his flagrant efforts to define away a serious murder problem was the fact that the commissioner’s claim reflected an implicit rejection of much of deterrence theory. Any sort of delayed sanction, such as punishment after a lengthy investigation by the homicide bureau, was seen as having no deterrent effect (which must have offended a number of homicide detectives). The commissioner was in effect arguing that only the immediate presence of a police officer could deter.
3Think about how the FBI’s definition of a clearance can influence how police go about their jobs. Police are incentivized to simply make an arrest, any arrest; their official stats do not reward the quality of the arrest. The very act of creating a definition of what something is can change how people do that something.
4There could be a long aside here about what this scenario tells us about the risk of relying on highly-accountable, elected DAs, but this post is long enough as it is.
Posted by John Pfaff on April 30, 2013 at 09:17 AM | Permalink
Doesn't key lesson #1 support a move away from student-edited journals toward peer-reviewed journals, or at least some hybrid peer-review/student-editing? The idea that law profs will just voluntarily refrain from submitting articles that student law journals are eager to publish seems unrealistic. Does the peer-review model used in other disciplines offer a better mechanism for screening out bad empirical work?
Posted by: Jonathan Witmer-Rich | Apr 30, 2013 1:32:47 PM
@JWR: Absolutely. I plan to return to this point down the road, when I talk more broadly about how to kill off zombie ideas. I think the law review system in general--student editors, simultaneous submissions, expedited review, etc.--needs to be replaced by peer review. Peer review is far from perfect, but it is worlds better than the law review system. Law is the only academic profession in which students determine what faculty scholarship is acceptable. It is a completely upside-down system.
So yes, peer review does a much better job of screening. In fact, Maltz and Targonski note that a peer reviewer actually identified one of the problems with the UCR data in Lott and Mustard's work. Now, unfortunately, that reviewer's concerns were never properly addressed--like I said, peer review is not an iron-clad guarantee of quality--but there is no way a law student could have caught that, and in some other situation that referee's comment could have made a big difference.
I actually wonder if enforcement could come from the other direction: we can't keep law professors from submitting empirical work to journals unqualified to review it, but we can choose to never cite those articles in our own work. To the extent that academics care about future citations, the freeze-out could force them to publish in peer-reviewed journals. And for those academics who don't care about citation counts, a freeze-out would at least contain the problem: if never cited, the potentially-bad conclusions can't spread.
Posted by: John Pfaff | Apr 30, 2013 1:45:23 PM