
Tuesday, November 24, 2009

Casual Empiricism and Data Quality

David Zaring has a short but interesting post over at the Conglomerate about different types of empirical researchers in the law. He describes the political science types, who study Supreme Court data; the quant types, who study finance and accounting data in business schools; and the cross-over types, who teach at law schools but use their interdisciplinary PhDs for empirical legal research.

Finally, he describes what I would call the "casual empiricists" (and what I suspect the other three groups would call the "wannabe" or "fake" empiricists). I prefer "casual," of course, because I fall into this group. These law professors are interested in quantitative data but may lack the skills or training for hard-core, PhD-level quantitative analysis. For example, though rusty, I have plenty of econometric skill. I took the same basic graduate-level courses given to economics PhD students and worked as a research assistant to an economist, defining, running, and evaluating complex regressions. However, I never took the advanced data gathering and analysis courses, nor did I work on a dissertation. I suspect many of my casual empiricist colleagues are in the same boat.

As a group, we clumsily gather data about legal questions we want to study, and then run basic regressions on the data. Even so, there is something to be said for casual empiricism, and for reasons discussed below, I reject arguments that all casual empiricism is unworthy as legal scholarship. Of course, this is an old debate, even on this blog.

Part of the genius of the great empiricist is taking a large, generalized data set and analyzing it in new and different ways to yield answers to questions we previously thought unanswerable. Even better is the slightly smaller, slightly more tailored data set that gives us insight into bigger societal issues. Petra Moser's work studying World's Fairs data comes to mind.

But this type of study has its limits. It is almost always an indirect look at the question that is being answered. Sometimes indirect information is the best information available, and maybe even the only information that can be gathered at a reasonable cost. However, indirect information is necessarily incomplete in at least two ways.

First, there may simply be data missing. This isn't a problem if the missing data is randomly distributed, such that the results are unbiased. Knowing when you have enough unbiased data to reach a conclusion is a valuable skill.
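
A toy simulation makes the point about missing data concrete. This is only a sketch under invented assumptions: the variable (normally distributed around 100), the drop rates, and the function name `simulate` are all hypothetical and come from no real data set. It compares the average of a full data set with the average after values go missing at random, and again after low values go missing far more often than high ones.

```python
import random
import statistics


def simulate(n=20000, seed=0):
    """Compare the full-sample mean with two kinds of missing data.

    All numbers here are invented for illustration only.
    """
    rng = random.Random(seed)

    # Hypothetical "complete" data: n draws centered on 100.
    values = [rng.gauss(100.0, 15.0) for _ in range(n)]
    true_mean = statistics.fmean(values)

    # Random missingness: every observation has the same 50% chance
    # of being dropped, regardless of its value.
    random_sample = [v for v in values if rng.random() < 0.5]

    # Selective missingness: observations below 100 are kept only
    # 20% of the time, while everything at or above 100 is kept.
    selective_sample = [
        v for v in values if v >= 100.0 or rng.random() < 0.2
    ]

    return (
        true_mean,
        statistics.fmean(random_sample),
        statistics.fmean(selective_sample),
    )


if __name__ == "__main__":
    full, at_random, selective = simulate()
    print(f"full data mean:          {full:.2f}")
    print(f"missing at random mean:  {at_random:.2f}")
    print(f"selectively missing mean: {selective:.2f}")
```

Under these assumptions, the randomly thinned sample should land very close to the full-data average even though half the data is gone, while the selectively thinned sample should drift noticeably upward. The amount of missing data matters less than the pattern of what is missing.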

Second, the data may not be sufficient to support the inferences being made. Sometimes this is simply due to logical errors, such as assuming that correlation means causation. These types of errors are rare in quality scholarship, though, as most trained empiricists will avoid them or at least look for statistical ways to show that correlation is probably tied to causation. More difficult to assess is the incompleteness that comes from limited data. For example, much empirical patent scholarship involves the study of patents held by publicly traded companies because, simply enough, data is available about public companies. The question, though, is whether the results can be applied to private companies. Recent private company survey results imply (to me at least) that the patent system may work very differently for private companies.

It is the second form of incompleteness that drives many casual empiricists. There are burning questions that large data sets just cannot answer. If they could, then chances are that some "real" empiricist would have looked at the question. Thus, the small, tailored data set leaves a niche for empirical work whose added value is the data gathering rather than the clever analysis of generalized information. Of course, trained empiricists do this data gathering as well, but they don't have the same large competitive advantage in this area.

I am working on just such a project, and it has taken an amazing amount of work (primarily by my dedicated research assistants and with grateful appreciation to my dean). I'm studying the ten most litigious non-practicing entities in patent litigation (you may know them as "patent trolls") and the underlying source of the patents that they are litigating. This has involved 1) identifying the NPEs, 2) identifying all the NPEs' subsidiaries that may be litigating, 3) identifying all of the lawsuits, 4) identifying all of the patents involved in the lawsuits, 5) gathering data about those patents, including the initial owners, 6) gathering venture capital information about private owners, 7) gathering valuation information about public owners, 8) gathering incorporation data about the owners, 9) gathering generalized information about the owners, and 10) tracing the assignments of each patent from the original owner to the NPE.

There is no one database that has all of the information in any of the above categories, let alone the combination of all of them. Thus, I expect this to be a really great data set when it is done, and I hope it will be a nice contribution to knowledge despite my casual status.

Of course, this is not intended to demean empirical work by PhDs, who quite often do this kind of data gathering and then do even more clever analysis of it than I could fathom. Indeed, others have suggested to me (and I agree) that for heavy-duty empirical work it makes sense for the casual empiricist to work with (or at least get input from) a trained empiricist. Nonetheless, there is some room for everyone at the table.

Posted by Michael Risch on November 24, 2009 at 08:06 AM in Life of Law Schools | Permalink



Sounds like an interesting project. And yes - we may be best at collecting useful data and taking a first cut ... others can follow up, perhaps (but, then, those others don't work with small data sets - and you may not find anything....).

Posted by: David | Nov 24, 2009 10:39:16 AM
