
Thursday, April 23, 2009

What is Quality Empirical Work, Part 2

In my previous post, I examined the difficulty of defining what quality means for the type of (observational) work that makes up much of empirical legal scholarship. Here I want to turn my attention to the problem of measuring quality even if there is an accepted definition.


This will be my last methodological post. Next week I'll turn my attention to the more legal side of the problem. That academics should use more rigorous methods for producing and interpreting empirical results should not be controversial. But the issue is more challenging in the legal setting, since court cases are not just about the truth. So next week I'll look at how to balance the empirical superiority of systematic reviews with the other normative goals of an adversarial legal system, and consider how (if at all) to incorporate such reviews in legal disputes.


But for now, I want to continue looking at how to define and measure quality. Again, the key challenge we face is the greater methodological complexity of observational studies. Randomization eliminates all sorts of potential confounders, so in the absence of randomization, observational papers have to develop clever ways to mitigate their effects.

I want to make two points here: 

1. First, that measuring the quality of observational studies is difficult, even if there is no disagreement over what "high quality" means. 

2. Second--and perhaps more important--that this difficulty does not reflect a weakness of systematic reviews but a strength. If we cannot agree on how to measure certain quality components, that is important information for us to have--information we must have to appreciate the epistemic limits of what our models can tell us. 


The basic problem we face is the following. Most empiricists can agree on the types of confounders that pose problems, but they will often disagree about (1) whether a given confounder is present in a particular study and/or (2) what the proper solution is when it is present.

Here's an example of the latter issue. (Those not interested in a technical example can jump to the next paragraph without losing anything.) All empiricists would agree that self-selection bias is often a problem. But what is the best solution? Should we require that studies use some sort of matching program like MatchIt or create a synthetic control and, absent either, concede that the question cannot be answered? Or is propensity score matching, despite its flaws, good enough? Or can we set the bar even lower and simply require that studies control for a large number of potential selection covariates?
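For readers who do want the technical detail, the following Python sketch shows what one of these solutions, propensity score matching, does in the simplest possible case. Everything in it is an assumption for illustration: the data are simulated, a single covariate `x` drives both self-selection into treatment and the outcome, the true effect is set to 1.0, and the function names (`simulate`, `fit_logit`, `att_matched`, `naive_diff`) are hypothetical. It is not MatchIt (an R package) or a synthetic-control method, just a hand-rolled logistic propensity model with one-to-one nearest-neighbor matching, with replacement.

```python
import math
import random


def simulate(n, effect, seed=0):
    """Hypothetical data: x raises both the chance of treatment and the outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p_treat = 1.0 / (1.0 + math.exp(-1.5 * x))  # self-selection: high-x units opt in
        t = 1 if rng.random() < p_treat else 0
        y = 2.0 * x + effect * t + rng.gauss(0.0, 0.5)
        rows.append((x, t, y))
    return rows


def fit_logit(xs, ts, steps=25):
    """Newton-Raphson fit of P(t = 1 | x) = 1 / (1 + exp(-(a + b * x)))."""
    a = b = 0.0
    for _ in range(steps):
        ga = gb = haa = hab = hbb = 0.0
        for x, t in zip(xs, ts):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            ga += t - p            # gradient of the log-likelihood
            gb += (t - p) * x
            haa += w               # information matrix (negative Hessian)
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        a += (hbb * ga - hab * gb) / det   # Newton step: solve the 2x2 system
        b += (haa * gb - hab * ga) / det
    return a, b


def att_matched(rows):
    """Match each treated unit to the control with the closest estimated
    propensity score (with replacement) and average the outcome gaps."""
    a, b = fit_logit([r[0] for r in rows], [r[1] for r in rows])

    def score(x):
        return 1.0 / (1.0 + math.exp(-(a + b * x)))

    controls = [(score(x), y) for x, t, y in rows if t == 0]
    gaps = []
    for x, t, y in rows:
        if t == 1:
            s = score(x)
            _, y_control = min(controls, key=lambda c: abs(c[0] - s))
            gaps.append(y - y_control)
    return sum(gaps) / len(gaps)


def naive_diff(rows):
    """Raw treated-minus-control difference in mean outcomes (ignores selection)."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


rows = simulate(2000, effect=1.0, seed=0)
print("naive difference:", round(naive_diff(rows), 2))
print("matched estimate:", round(att_matched(rows), 2))
```

The naive difference is inflated because high-`x` units both select into treatment and have higher outcomes; matching brings the estimate back near 1.0. It does so only because, by construction, `x` is the sole driver of selection and is observed. Whether that holds in a real study is exactly the judgment a quality guideline has to make.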

In other words, every problem has a host of solutions, some of which may be more appropriate in some settings than others, and each with its own costs and benefits. At one level, guidelines could simply say: "Is self-selection a problem? If so, does the study address it well?" At the very least, such a guideline introduces transparency by forcing the reviewer to explicitly state her views on the paper. But it nonetheless leaves a lot to the reviewer's discretion and judgment, the very thing systematic reviews were designed to limit. 

On the other hand, guidelines that prescribe specific solutions will be harder to write, less widely applicable, and possibly too "narrow-minded" (excluding "sufficiently" good alternative approaches). But they constrain judgment more tightly, at least when the guidelines are applied, though such terms demand more judgment when the guidelines are constructed.

And this is just one of the many challenges of measuring quality. I just want to briefly note some of the others:

1. How to score a particular item. Do we adopt a binary approach ("good/bad") or something more continuous (say, 1-5)? If something more continuous, how do we decide how many points to use, and what distinguishes a 2 from a 4? Binary approaches are blunter but more objective.

2. How to score a study. Some have advocated a single aggregate quality score that can be used to weight studies and results by quality. Others have pointed out several flaws with such an approach, arguing instead for component-by-component scoring. But if the number of components is close to the number of studies, this could prove intractable.

3. How to empirically verify the guideline terms. As I've discussed before, this requires meta-evidence about whether particular quality terms in fact matter. In observational settings, the number of relevant quality criteria may exceed the number of studies, rendering it impossible to isolate individual effects of quality terms on the results.

4. What studies to include. Study quality will vary. Should all the studies be included but somehow "weighted" to reflect quality, or should only studies above a certain threshold be used? And what would that threshold look like when using a multidimensional quality score? Like all decision rules, this is ultimately more a normative than an empirical question, and one that is tied to whatever definition of quality we are using.
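To see how much these choices can matter, here is a small Python sketch with four invented studies. The labels, estimates, standard errors, and checklist scores are all made up for illustration and are not drawn from any literature. The sketch scores each checklist item in binary form, pools estimates with a standard fixed-effect inverse-variance average, and compares four inclusion and weighting rules. The checklist names and the functions `aggregate_score` and `pooled` are hypothetical; multiplying weights by an aggregate score is one of the contested approaches mentioned in item 2, shown here only for comparison, not as a recommendation.

```python
# Hypothetical checklist; binary scoring (1 = addressed, 0 = not addressed).
COMPONENTS = ("selection", "measurement", "attrition", "specification")

# Invented studies for illustration only:
# (label, effect estimate, standard error, component scores in COMPONENTS order)
STUDIES = [
    ("A", 0.40, 0.10, (1, 1, 0, 0)),
    ("B", 0.10, 0.10, (0, 0, 1, 1)),
    ("C", 0.25, 0.05, (1, 1, 1, 1)),
    ("D", 0.60, 0.20, (0, 0, 0, 1)),
]


def aggregate_score(components):
    """Single summary score: the share of checklist items a study addresses."""
    return sum(components) / len(components)


def pooled(studies, keep=lambda comps: True, quality_weight=False):
    """Fixed-effect inverse-variance pooled estimate over the studies that pass
    `keep`, optionally multiplying each weight by the aggregate score."""
    num = den = 0.0
    for _, est, se, comps in studies:
        if not keep(comps):
            continue
        w = 1.0 / se ** 2
        if quality_weight:
            w *= aggregate_score(comps)
        num += w * est
        den += w
    return num / den


selection = COMPONENTS.index("selection")
rules = {
    "include all, no quality adjustment": pooled(STUDIES),
    "include all, weight by aggregate score": pooled(STUDIES, quality_weight=True),
    "exclude aggregate score below 0.5": pooled(
        STUDIES, keep=lambda c: aggregate_score(c) >= 0.5),
    "exclude studies that ignore selection": pooled(
        STUDIES, keep=lambda c: c[selection] == 1),
}
for rule, estimate in rules.items():
    print(f"{rule}: {estimate:.3f}")
```

Studies A and B have the same aggregate score, so any aggregate rule treats them identically even though only A addresses selection; the component rule separates them. The four rules produce four different pooled estimates from the same four studies, and nothing in the data says which rule is right. With four components and four studies, a meta-regression of the estimates on an intercept and the four indicators would also have five unknowns and only four observations, the identification problem raised in item 3.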

Thus my first point: these are not easy questions. But it is essential that we address them, even though almost no work has been done along these lines in the social sciences.

And this segues into my second point. It is possible that empiricists will find compelling solutions to all these problems. And if so, fantastic. But what if they cannot? Does this imply that systematic reviews are impractical?

No--emphatically no. Failures to answer these questions are answers: they point us to blind spots that we must acknowledge. If knowledge comes from the synthesis of a literature, as I believe it does, then failures to agree on how to synthesize reflect "known unknowns." To turn our backs on these questions because we cannot reach answers is to engage in willful blindness.

If we reach answers to these questions, our ability to produce meaningful knowledge will grow. And if we fail to reach answers, then we will have a better understanding of the limits of our knowledge, which is important knowledge in and of itself.

Posted by John Pfaff on April 23, 2009 at 07:04 PM
