
Wednesday, September 06, 2017

Corpus Linguistics and Criminal Law

In January of 2017, the Federalist Society hosted a panel on statutory interpretation at its annual faculty conference.  The panel promoted a new method for statutory interpretation: corpus linguistics.  Among the panelists was Thomas Lee, a former law professor at BYU who now sits on the Utah Supreme Court.  Justice Lee has used corpus linguistics in more than one opinion, and the BYU Law School has been promoting corpus linguistics through conferences.

It is easy to see why corpus linguistics is appealing.  It offers a new twist on textualism.  It promises to make the initial “plain” or “ordinary” meaning question of textualism a data-driven inquiry.  At present, textualist judges rely on their own linguistic intuitions about the plain/ordinary meaning of a statutory term.  And if a judge finds that a statutory term’s meaning is plain, then she will not look at other non-textual sources, such as legislative history or certain canons of statutory construction.  The problem is, judges often disagree over what the plain or ordinary meaning of a term is.  As a result, textualism sometimes looks unpredictable or subjective.

Corpus linguistics tells judges to answer the plain/ordinary meaning question with a linguistics database search.  The corpus linguistics databases allow judges and lawyers to search for words to see how often they are used in certain ways. And if the database says a term is more often used as X than Y, then corpus linguistics tells us that X is the “ordinary meaning.” In other words, corpus linguistics promises us predictable and objective answers to textualism’s most important question.
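To make the frequency idea concrete, here is a minimal sketch in Python of the kind of tally described above. It is only an illustration under stated assumptions: the sentences and sense labels are invented for this example rather than drawn from any real corpus, and the names `CODED_LINES`, `sense_frequencies`, and `most_frequent_sense` are hypothetical. Real corpus tools and coding protocols are considerably more involved.

```python
from collections import Counter

# Hypothetical, hand-labeled concordance lines: each pairs a sentence
# containing the search term with the sense a human coder assigned to it.
# These examples are invented for illustration, not taken from a real corpus.
CODED_LINES = [
    ("He would carry a pistol in his coat pocket.", "on_person"),
    ("Officers carry a firearm while on duty.", "on_person"),
    ("She did not carry a gun that night.", "on_person"),
    ("They carry the rifles in the truck bed.", "in_vehicle"),
]


def sense_frequencies(coded_lines):
    """Count how many coded lines were assigned to each sense."""
    return Counter(sense for _, sense in coded_lines)


def most_frequent_sense(coded_lines):
    """Return (sense, share) for the most common sense in the sample."""
    counts = sense_frequencies(coded_lines)
    sense, count = counts.most_common(1)[0]
    return sense, count / sum(counts.values())


if __name__ == "__main__":
    sense, share = most_frequent_sense(CODED_LINES)
    print(f"{sense}: {share:.0%} of coded lines")
```

Even in this toy version, the result depends on which sentences are collected and on how a coder labels each one, and the code settles neither question.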

I was fortunate enough to be invited to the 2017 corpus linguistics conference at BYU.  I wasn’t a natural person to invite to the conference—I’m not an expert in statutory interpretation, and my undergraduate degree in linguistics did not prepare me for the sorts of analyses that corpus linguistics requires.  Nonetheless, I was intrigued by the Fed Soc panel, and so I was eager to learn more at the BYU conference.  But as I read the papers for the conference, and as I prepared my remarks as a commenter, I found myself more and more concerned about corpus linguistics as a methodology.  In particular, I found myself concerned about it being used to interpret criminal laws.  Corpus linguistics raised some of the problems that I had confronted in my past research on the void-for-vagueness doctrine, and it touched on many of the issues that I was grappling with in a new project about criminal common law.  After quite a bit of writing and reflection, I have come to the conclusion that corpus linguistics is not an appropriate tool for the interpretation of criminal statutes.

I lay my concerns out more fully in this short essay.

As my essay explains, in relying on frequency data, corpus linguistics undermines notice and accountability.  Unless legislators and ordinary citizens were to conduct their own frequency analysis—something that appears far too complex for a non-lawyer, if not a non-linguist, to do—then the public will not know how courts will interpret statutory terms.  And if people do not have advance notice of the scope of criminal laws, then we may not have fulfilled the promise of due process.  Legislative accountability is also undermined by corpus linguistics.  Legislators could pass laws that will be interpreted differently than their constituents might understand them, and so constituents can’t hold their representatives responsible for their policy choices.

Corpus linguistics also doesn’t solve the problems it sets out to.  There does not appear to be a single, correct way to conduct a database search and analysis.  So corpus linguistics will engender litigation over methodology and dueling expert credentials.  This not only suggests that corpus linguistics cannot fulfill its promise of greater predictability and objectivity.  It also raises questions of judicial accountability.  Judges will be able to skirt responsibility for their interpretations of what is legal or illegal by reframing the question as a dispute over database searches rather than a decision about punishment.

Finally, I worry that corpus linguistics seems so attractive because modern legal thought has rejected the idea that statutory interpretation is anything other than a ministerial task.  Before the rise of textualism no one doubted that judges had a substantive role to play in statutory interpretation—especially the interpretation of criminal laws.  Indeed, the standard separation of powers story that is told about criminal prosecutions is that we have divided the punishment power between three branches in order to protect and maximize individual liberty.  A person will be punished only if the legislature decides to outlaw certain behavior, the executive decides to indict and prosecute a particular individual, and the judiciary agrees that the individual’s conduct falls within the clear legislative language.  When the Constitution was written, judges routinely acted as a normative gatekeeper for punishment, construing statutes narrowly to promote common law values even when the legislature seemed to prefer a broader interpretation.

Don’t get me wrong, I’m not saying that we ought to abandon textualism.  In the essay I offer some thoughts on how to improve statutory interpretation.   But I *am* saying that an interpretive methodology that assumes a judge’s professional judgment is an evil to be avoided in statutory interpretation has no place in the criminal law. Language can never be crystal clear. And I would prefer that the people entrusted with deciding the scope of that language saw their task for the important check on government power that it is, rather than as bean counting. 

If you have any thoughts on the essay, I welcome them, including off line.  My email is [email protected].

Posted by Carissa Byrne Hessick on September 6, 2017 at 09:45 AM in Carissa Byrne Hessick, Criminal Law | Permalink


Great article!!! Thank you very much for sharing this detailed post. It was very interesting and helpful.

Posted by: James Parker | Dec 19, 2017 1:06:41 AM

These are all really great comments. I've pulled together some (relatively lengthy) responses in this new post: http://prawfsblawg.blogs.com/prawfsblawg/2017/09/more-on-corpus-linguistics-and-the-criminal-law.html

Posted by: CBHessick | Sep 11, 2017 1:37:02 PM

I can understand the temptation to use data-driven analytics, but they merely offer platitudes of neutrality, not necessarily neutrality itself. Data is nothing more than a collection of observations. Bias cuts both ways - in what is presented and what you see. Frequency analytics adds another layer of bias - selection. On the face of it, frequency sounds neutral. Pick every word with a w or every seventh sound, for example - but what if you know that using a method will, in all likelihood, lead to an outcome or a set of outcomes you desire? Even if this is not the case, you're constrained by meanings selected by your antecedents who have their own histories & prejudices. Also, as mentioned above, I doubt judges, not to mention everyday people who may become defendants, will parse statutes this way. I won't even get into the many schools of corpus linguistics methodologies. This idea seems to want to take the messy prejudices of humanity out of judging and the law & I don't think that will happen. I think we'll just wind up camouflaging these prejudices with a garb of studied, scientific neutrality.

Posted by: ZB | Sep 8, 2017 7:39:50 PM

I lied. One last thing. You say "Unless legislators and ordinary citizens were to conduct their own frequency analysis—something that appears far too complex for a non-lawyer, if not a non-linguist, to do—then the public will not know how courts will interpret statutory terms."

To me this is like saying economists can't understand consumer behavior because consumers don't use regression analysis.

Corpus linguists use language data to study the way that language is used and understood in a given speech community. If the question is whether the language of a statute gave a criminal defendant sufficient notice that his/her conduct was subject to penalty, corpus linguistics can be used to try to determine how similar language is used and understood in the community that is subject to that statute. I think very often the evidence will show more than one reasonable, competing meaning of the operative terms were possible, in which case the court would have an objective basis for applying lenity.

The alternative, I think, is often opacity on the part of judges, or the dictionary-based errors in reasoning that you see in cases like Muscarello, which are not good justifications for putting people in jail.

Posted by: Stephen Mouritsen | Sep 8, 2017 1:49:08 PM

Professor Hessick,

I really appreciate your thoughtful comments here and I very much enjoyed reading your essay. I wanted to respond to a few of your comments. (I should note that my responses are my own, and don’t necessarily represent the views of my esteemed, occasional co-author.)

You note that “[t]here does not appear to be a single, correct way to conduct a database search and analysis.” I think that is right. While there is certainly a “family resemblance” in most corpus-based research, I think the differences in approach you mention are particularly apparent with respect to applications of corpus linguistics to questions of legal interpretation. This is, I think, because the application of corpus data to such questions is simply a very new phenomenon.

My view is that corpus linguistics is a tool of scientific observation that, like other such tools, gives you access to information about the natural world (in this case, language use) that you can’t access through ordinary means of human perception (in this case, introspection). It doesn’t tell you what to do with that information or whether that information is helpful for resolving certain types of questions. Shared standards, practices, and methods emerge when people in the relevant field start using the tool and start debating where it is useful and where it is not useful (or even harmful).

But even without shared standards and practices (which are likely necessary for corpus linguistics to fulfill its promise of predictability, as you note), one of the chief benefits of the corpus approach is transparency. When corpus linguists are wrong about ordinary meaning, they are transparently wrong, because their approach and their findings are replicable and falsifiable. This is in contrast to circumstances where judges merely state that “the ordinary meaning of X is Y” (or “the ordinary meaning of X is Y. See Dictionary Z.”) In their paper “Ambiguity about Ambiguity,” Farnsworth, Guzior, and Malani observe that “If one person says that both proposed readings of a statute seem plausible, and a colleague disagrees, finding one reading too strained, what is there to do about it but for each to stamp his foot?” Corpus linguistics in this case is an alternative to foot stamping.

One point of clarification, you mention frequency as a proxy for ordinariness (referring to the comparative frequency of competing senses of a word—e.g., the ‘carry in a car’ vs. ‘carry on your person’ senses of ‘carry’ from Muscarello). I don’t think anyone is advocating (I certainly don’t advocate) merely characterizing the most frequent sense of a word as the ordinary meaning. That would be arbitrary. But corpus linguistics can allow you to examine the way in which a word is used in a given syntactic, semantic, and (sometimes) pragmatic context, in the speech or writing from a given speech community or linguistic register, and from a given time period. To the extent that you find that a given sense of a word is overwhelmingly more common in a particular context similar to that of the statute, in a relevant speech community or register, and from a similar timeframe, I don’t think it is an extraordinary leap to conclude that the people subject to that statute would have understood the word in a way that is consistent with its most common meaning in those circumstances. This is a presumption that I would think should be rebuttable where there is compelling evidence that an alternative sense of the word or phrase was intended. And I don’t advocate (and I don’t think anyone in the pro-corpus camp advocates) foreclosing consideration of other evidence of meaning simply because the corpus data suggests a particular answer.

You state that you “would prefer that the people entrusted with deciding the scope of that language saw their task for the important check on government power that it is, rather than as bean counting.” I would point out that of the three criminal cases that my co-author and I examined in our most recent paper (i.e., McBoyle, Muscarello, and Costello), the corpus data that we examined favored the criminal defendant in all three cases. In McBoyle, there was no evidence supporting the notion that anyone would have ever (much less ordinarily) read the term ‘vehicle’ in the National Motor Vehicle Theft Act of 1919 to apply to ‘aircraft.’ In Muscarello, the verb ‘carry’ in the context of ‘firearm’ and its synonyms (both today and at the time the relevant statute was enacted) is much more commonly used in the ‘carry on your person’ sense. You mention notice. I would ask whether Muscarello had proper notice that he could go to jail for five extra years for ‘carrying’ a gun he had locked in his glove box. The corpus data at least suggest that he didn’t. With respect to Costello, we found that both senses of the verb ‘harbor’ (‘shelter’ or ‘conceal’) were roughly equally attested, but that in most cases it is impossible to tell which sense is implied. Which suggests that the court should have applied the rule of lenity. But we concluded that the Costello statute appeared to be ambiguous only after examining relevant usage data.

As you know, Muscarello did get those extra five years for ‘carrying’ a gun he had locked in his glove box. The Muscarello court justified this sentence based, among other things, on the notion that the ‘carry in a car’ sense of ‘carry’ was listed ‘first’ in the OED and Webster’s Third, while the ‘carry on your person’ sense was listed ‘second.’ But the OED and Webster’s Third list their senses historically. In the OED, the first sense of ‘carry’ is listed first because it appeared in the English language in 1320 and the second sense is listed second because it first appeared in 1340. The court also cited the etymology of ‘carry,’ noting that it came from the Latin ‘carrum,’ which means ‘car’ or ‘cart.’ By the court’s logic, October would be the ‘eighth month.’

When you talk about judges’ important check on government power in criminal cases, that is quite honestly one of the reasons that I think empirical approaches to interpretation are important. In Muscarello the highest court in the land sanctions the state’s use of one of its most awesome powers—the ability to take away the freedom of one of its citizens—and (at least some of) the reasons that it offers for doing so are flatly erroneous and, I would argue, arbitrary. I would also argue that the provision of arbitrary justifications for the exercise of the state’s power over criminal defendants has important implications for due process. If it is between the Muscarello court's reasoning (which I would argue is not anomalous in ordinary meaning cases, even criminal cases) and bean counting, I’d take the beans.

One final point (and please forgive me for the lengthy post): You state that “CL will, at a minimum, push more cases to be resolved at the ‘plain meaning’ stage of textualist interpretation. I think this is on balance negative because it makes the courts' use of statutory canons that promote notice and accountability less likely.” I am not convinced that this is the case. Commenting on the related field of contract interpretation, Arthur Corbin said: “It is true that when a judge reads the words of a contract he may jump to the instant and confident opinion that they have but one reasonable meaning and that he knows what it is. A greater familiarity with dictionaries and the usages of words, a better understanding of the uncertainties of language, and a comparative study of more cases in the field of interpretation, will make one beware of holding such an opinion so recklessly arrived at.” I think that greater familiarity with data from linguistic corpora makes you more skeptical about claims of ordinary meaning. You may start to see alternative interpretations as viable possibilities because you can see alternative uses reflected in the data. Which may, in the end, result in an increased reliance on canons like lenity.

Posted by: Stephen Mouritsen | Sep 8, 2017 2:31:45 AM

When Obama said "you can keep your doctor" was that a term of art?

Posted by: Single Player | Sep 7, 2017 8:21:20 PM

Orin asks, "how often does statutory interpretation involve the interpretation of terms of art?" I don't know the answer to that, but I'm inclined to think that it happens less often than Orin thinks. Just think of all the cases that have been generated by 18 U.S.C. 924(c), involving the phrase "use or carry a firearm." And I would guess that a fertile source for ordinary-language words in criminal statutes would be the phrases that denote the physical actions that are forbidden, and especially the verbs in those phrases. But that's just a guess.

A related point occurs to me. There are lots of statutes that use ordinary-language words that are so encrusted with judicial interpretations that it would be hard for any kind of new analysis to break through, corpus-based or otherwise. But in many of those instances, if you went back to the original appellate or SCOTUS precedents you'd probably find that the interpretative issue was framed as a matter of ordinary meaning.

In any event, as I've said in an earlier comment, I think the number of cases where corpus data would be useful isn't all that big.

Posted by: Neal Goldfarb | Sep 7, 2017 8:09:17 PM

Neal, thanks for the explanation. I think we agree that it doesn't work to apply the corpus linguistics approach to terms of art. The question then becomes, how often does statutory interpretation involve the interpretation of terms of art? Bringing this back to Carissa's original topic, criminal statutes, I would say it does quite often in that context. When drafting criminal statutes, it's really common to take preexisting components and assemble them in a mix and match fashion. There are only so many concepts to use to say what is a crime, and it's very common to take existing language and known phrases. I think that's why criminal law classes focus on the basic tools and concepts and usually only teach a few actual crimes: Once you know the tools and concepts, you can create and interpret any new criminal statute by just using off-the-rack components. There are some new words, but a lot of it is existing language repackaged. And if so, I would think that substantially limits the usefulness of corpus linguistics in the context of interpreting criminal statutes.

Posted by: Orin Kerr | Sep 7, 2017 7:09:28 PM

I think there are real concerns about the use of corpus linguistics in legal interpretation (although my concerns don’t match up with Carissa Hessick’s concerns). Stefan Gries (a linguist at UC Santa Barbara and an expert on corpus linguistics) and I have written two essays on the topic. One will be published in the BYU law review, and I'll post it on SSRN when I get permission from my co-author (the other paper will be published in the near future).

My concerns have to do with how to properly assess corpus linguistics findings and the ability of judges to do the work competently. I think the notice and accountability objections are unpersuasive, however. In the context of legal interpretation (and specifically statutory interpretation), corpus linguistics is designed to capture the ordinary meaning of the relevant language. In this sense, it's no different than using a dictionary (but provides a lot more information), or, for that matter, the judge’s own view of language. This is a separate concern from whether corpus linguistics can provide accurate information about ordinary meaning, which seems to be at the heart of Carissa Hessick's concerns. I'm not going to address accuracy in this post, but I will note that no one should argue that corpus linguistics by itself can resolve issues of statutory meaning. Corpus analysis can provide generalized information about language usage with some ability to account for sentential context (in contrast to standard dictionaries). Statutory interpretation is a much more nuanced process, though, that must account for various aspects of context specific to the provision at issue (e.g., whether a possible interpretation will create inconsistency between the provision at issue and surrounding provisions). I wouldn’t call using generalized information about meaning “cherry-picking,” although, like other sources of meaning, corpus analysis can be done poorly.

Posted by: Brian Slocum | Sep 7, 2017 5:33:22 PM

I detect a certain amount of irritation in Orin's response to me, which is understandable considering that I may have come off as a little obnoxious. Apologies for my tone. My intention wasn't so much to criticize Orin for his post as to encourage people who are interested in the issue to read the work that's out there (which at this point isn't very extensive).

In particular, I think it's more important for people to read actual examples of corpus analysis in the legal context than to read discussion *about* the use of corpus linguistics. (Most papers that undertake the former also include the latter, but not vice versa.) The discussion about using corpus linguistics in legal interpretation ought to be grounded in an understanding of what corpus analysis actually entails, rather than merely assumptions and beliefs.

As for my disagreement with the substance of what Orin said: I will deal with the issue of legal usage versus ordinary usage in the more complete writeup that I will be doing. For now, I’ll just say that corpus linguistics obviously isn’t an appropriate tool to use when dealing with legal terms of art (either terms that have only a legal meaning or ordinary words that have a specialized legal use).

In light of Orin’s response to Asher, what I’ve just said might well narrow the area of disagreement between Orin and me. In any event, putting aside the issue of specifically legal terminology, I see no reason to think that language in statutes would be less amenable to corpus analysis than language in the Constitution. That’s not to say that corpus linguistics will be useful in all cases. Quite the contrary; my sense is that the number of cases in which it would be useful is fairly small.

Moving on from Orin’s comments to Asher’s: I agree with much that Asher says, including the concerns that he raises about the way corpus linguistics has been used. I’d be interested in knowing what work he’s referring to, and what flaws he sees in it—even if it’s my work that he’s talking about.

However, I have a bone to pick with what Asher says in his final paragraph about “public intuitions on the meaning of language.” If the intuitions he’s referring to (and that Carissa has referred to) are the answers that people would give if they were asked about the meaning of a particular bit of text, I don’t think that’s what is relevant for legal interpretation. The aim of interpretation, as I see it, is to determine how the text is likely to be understood, and the process by which people arrive at their understanding of texts (and of oral statements) occurs almost entirely below the level of conscious awareness. When people engage in explicit interpretive analysis, or even just state their beliefs about an interpretive issue—it’s at *that* point that their intuitions can become unreliable. And the point of using corpus linguistics is not to try to determine public intuitions of that sort. Rather, it is to try to make a better finding as to the understanding that would be expected to emerge from the automatic process of comprehension. (For more on the difference between comprehension and explicit interpretation, see my post here: https://lawnlinguistics.com/2017/05/11/comprehension-ordinary-meaning-and-linguistics/)

Posted by: Neal Goldfarb | Sep 7, 2017 3:29:54 PM

I appreciate Asher's comments. As I note in the paper, there are reasons to think that the corpora will not be particularly good at capturing public intuitions about meaning, but rather that corpora will systemically over- or under-represent certain meanings.

Of course, his next argument is that, even if corpora searches are not that great at showing public intuitions about language, they will still be better than relying on the intuition of judges. That's obviously an empirical question that neither he nor I can settle on this blog.

But his arguments also seem to assume that the only acceptable meaning for a statutory term is the one that coincides with shared public intuitions. As I explain in the paper, I don't agree. I think that judges bear a constitutional responsibility in the construction of language--especially language in criminal statutes--and for centuries that responsibility included far more than simply giving words their shared public meaning. Corpus linguists and at least some textualists will disagree with me on that. But in so doing, they should recognize that their view of the appropriate judicial role is a relatively recent modern construct.

Posted by: CBHessick | Sep 7, 2017 2:01:23 PM

Asher, interesting point. For better or worse, when I wrote that above, I was thinking of the common use of terms of art in statutory drafting. A drafter will often take legal phrases and words with a known technical legal meaning, and it's understood that the meaning of the word or phrase has its technical legal meaning. For example, if a 1996 federal statute says that it's a crime to do X when it is "affecting interstate commerce," you wouldn't ask what the public in 1996 thought it meant to affect interstate commerce. Instead, you would recognize the use of a legal term of art and give it that known meaning. I think of that as a textualist thing to do, see, e.g., Dir., Office of Workers’ Comp. Programs v. Newport News Shipbuilding ., 514 U.S. 122, 126 (1995) (Scalia, J.) (“The phrase ‘person adversely affected or aggrieved’ is a term of art used in many statutes to designate those who have standing to challenge or appeal an agency decision, within the agency or before the courts.”), but maybe it's not best understood that way or is too quirky an example. Interesting point to think about, thanks.

Posted by: Orin Kerr | Sep 7, 2017 1:57:22 PM

First of all, I think Orin's just mistaken in believing that textualism isn't about reconstructing an enactment-date public meaning; I think that's just what it's about. The main difference between originalism and textualism is that one is a methodology of constitutional interpretation and the other's a methodology of statutory interpretation. There's a reason that especially rigorous textualist judges, or ones who fancy themselves rigorous anyway, increasingly rely on enactment-era dictionaries instead of new ones.

I have problems with corpus linguistics; I think it tends to be misapplied, that people misframe the questions they put to the corpora, as it were, that they misinterpret their findings, and that the fact that a usage is extremely rare doesn't necessarily tell us anything about whether it's the best reading of a word or phrase in a particular context. I tend to believe that corpus linguistics is useful mostly to decide whether a reading is even permissible, but that frequency rates as between usages aren't very instructive.

That said, in theory, I think your notice critique of CL is a misfire. CL aims to capture how ordinary people use language; if done right, no one should be surprised by the results CL generates because it will have merely captured how the criminal himself is accustomed to hearing phrases like "carry a firearm" used, more often than not. Of course, many criminals may not read the journals and newspapers captured in the corpora; many drug dealers probably don't use "carry a gun" often and are more likely to use more idiomatic phrases like "I have a gun on me" (which also, in spite of what it appears to literally mean, may include having a gun with you in the car). But there's not much we can do about the problem of statutory language that the regulated aren't familiar with.

Now, I know that you anticipate this response in the paper and say that if all corpus linguistics does is capture intuitions, why replace judicial intuition with corpus linguistics? I think this is also a misfire. Besides the logical possibility that there's something systemically awry about judicial intuitions, CL's basic claim, it seems to me, is that the intuitions of a few individuals, representative or not, are a poorer gauge of public intuitions on the meaning of language than a large survey of public use of that language. Public intuitions in the aggregate about what a phrase means are more likely to correspond to actual usage, broadly measured, than three people's sense of usage. Of course, the best way to measure public intuitions would be to survey the public on their intuitions, but in theory the next best thing is to look at a huge sample of usage that reflects and informs those intuitions. Asking one or three or nine randomly selected and possibly not very representative people what they think public intuitions are on the basis of their own, on the other hand, is an extraordinarily poor way of measuring public intuitions, like attempting to guess if more Americans think a tomato is a fruit or a vegetable by asking three people on the street. You could get a much more accurate sense of the matter - in fact, you could get a much more accurate sense of what any given criminal defendant is likely to think - by reviewing data on how tomato is used in conjunction with fruit and vegetable.

Posted by: Asher Steinberg | Sep 7, 2017 12:13:16 PM

Neal writes:

"I realize that Orin was just offering some tentative thoughts rather than firm conclusions, so perhaps I'm being unfair to him. However, if legal academics are going to start paying attention to corpus linguistics (as I think they should), there is some homework that they will need to do, so that they will know whereof they speak."

Neal, my primary exposure to the "corpus linguistics" concept was in reading and then commenting on Jennifer Mascott's article, Who are 'Officers of the United States'?, forthcoming in the SLR, which Jennifer presented last year at GW. It seemed to present a good overview of the field, and it led to (what I thought was) a useful discussion at the workshop of the strengths and weaknesses of the CL approach. Having put a few hours into thinking about the approach, I thought I might have some modestly helpful thoughts to add to the discussion.

More broadly, if you think I am wrong, can you say why? I realize you think it would be helpful for me to "do my homework" and read your scholarship before commenting on the field. But given that readers may be reading the thread for insight, and I am hoping to improve my own understanding, I think it would be helpful to hear the counterargument rather than be warned about commenting without sufficient familiarity with your work.

Posted by: Orin Kerr | Sep 7, 2017 11:27:52 AM

Carissa, thanks for the quick response to my comments. I've had a chance to read quickly through your paper, and I plan on writing up my thoughts later today. But for now, just a few quick points about what you say in your response to me.

First, although you stress that your objections relate only to using corpus linguistics in the criminal-law context, it seems to me that your linguistically-oriented objections (e.g., re what you see as the inherent problems with frequency data) would apply across the board.

Second, I tend to think that when you impute a radical agenda to the advocates of using corpus linguistics (and more specifically, to Lee and Mouritsen, who seem to be your primary targets), you are reading too much into their work. I don't think that they are trying to remake legal interpretation or that using corpus linguistics would have the effects you are afraid of.

Third, whether or not Lee & Mouritsen (or Slocum & Gries) would agree with your characterization of what they're trying to do, I don't think that your characterization can be generalized to apply to everyone who advocates using corpus linguistics. I certainly don't think that it reflects where I'm coming from in my own work. My goal is to get lawyers and judges to do a better job of analyzing issues of ordinary meaning, not to remake the theoretical framework of legal interpretation. And given that ordinary meaning is important in all approaches to interpretation, one doesn't have to be a textualist to think that corpus linguistics can be useful. Nor, IMO, is the use of corpus linguistics inconsistent with principles such as lenity and clear-statement requirements.

More later. In the meantime, while I disagree with a lot of what you've said, the issues you've raised are important for advocates of corpus linguistics to address.

Posted by: Neal Goldfarb | Sep 7, 2017 9:34:05 AM

I'm happy to see so many people commenting on this post.

A few words about my "pessimistic view" of corpus linguistics:

First, both my essay and my post are meant to raise concerns about the use of corpus linguistics to interpret *criminal* laws. I have not really thought about the use of CL to interpret tort laws. My concern about notice does not apply with equal force to non-criminal statutes. I suspect that my separation of powers concern does not either. And all of my concerns may not apply to interpretive inquiries conducted under Chevron, where the question is whether an interpretation is reasonable or unreasonable.

Second, even though the interpretation of criminal laws makes up only a fraction of the statutory interpretation cases on the federal and state dockets, criminal laws play an out-sized role in the corpus linguistics literature. Muscarello, which Neal Goldfarb mentions, is one example. Smith (what it means to "use" a firearm) and Costello (what it means to "harbor" an alien) also appear in multiple articles touting corpus linguistics. I infer from these articles that corpus linguists see no problem with employing their methodology in the interpretation of criminal laws--in fact, they advocate for it.

Third, I am sympathetic to Orin's point that perhaps we ought to view CL as a tool that could simply be of marginal assistance, rather than a tool that is supposed to provide definitive answers. That was my initial impression as well. But that impression has changed over the past several months.
For one thing, as I explain in the paper, CL will, at a minimum, push more cases to be resolved at the "plain meaning" stage of textualist interpretation. I think this is on balance negative because it makes the courts' use of statutory canons that promote notice and accountability less likely.
For another thing, folks like Neal who are pushing for CL, promote it because CL "can have a very significant impact on issues of statutory interpretation." In fact, they say that the "ordinary meaning" inquiry under textualism should be understood as an inquiry about frequency.
For a third thing, if CL is just one more data point, then that reintroduces the very flaws that CL purports to correct in the current system--that judges can cherry pick evidence to support their decisions.
And finally, I have heard more than one CL proponent say that this method is desirable because it cabins judicial discretion. In other words, they like CL not because it adds to a judge's data points in making these decisions, but rather because it minimizes the role of a judge's own judgment in these decisions.
So maybe it would not be a big deal if more practitioners started including corpus searches and analysis in their briefs and if judges felt as though they could take or leave that information. But I see the people who are advocating the adoption of corpus linguistics to be pushing for a significant change in how we approach statutory interpretation in this country. And I think that change would leave us considerably worse off.

Posted by: CBHessick | Sep 7, 2017 8:45:14 AM

As someone who attended the BYU conference that Carissa refers to, and presented a paper there, I look forward to reading her paper. My initial reaction, based on this post, is that she's taking an unduly pessimistic view of what corpus linguistics can offer. However, I'm going to wait until I've read her paper before commenting on the substantive issues.

But I do want to raise an issue that I think is important. I am concerned about people drawing conclusions about the use of corpus linguistics in law without really knowing much about the subject. That was certainly a problem in the comments on the Lee/Mouritsen guest posts at Volokh Conspiracy. And some of what Orin says -- for example his statement that corpus linguistics would probably be of only marginal usefulness in statutory interpretation -- makes me wonder whether he's familiar with the scholarship dealing with that issue. The scholarship I'm thinking of is primarily the corpus analyses of the "carrying a firearm" issue (Muscarello v. U.S.), first by Stephen Mouritsen and more recently in the paper I presented at BYU. Also, in the domain of litigation rather than academia, the brief that I filed in the Supreme Court in FCC v. AT&T is also relevant. (See links below.)

Although I'm obviously not completely objective about my own work, I think that it shows (as does Mouritsen's) that corpus linguistics can have a very significant impact on issues of statutory interpretation. And of the limited attention that corpus linguistics has gotten in the courts so far (Justice Lee in Utah and the Michigan Supreme Court), it has all involved statutory issues. So what Orin said strikes me as counterintuitive.

I realize that Orin was just offering some tentative thoughts rather than firm conclusions, so perhaps I'm being unfair to him. However, if legal academics are going to start paying attention to corpus linguistics (as I think they should), there is some homework that they will need to do, so that they will know whereof they speak.

Links to the materials I've mentioned are available here: lawnlinguistics.com/2017/05/08/corpus-linguistics-coming-to-the-sixth-circuit-bench-plus-lawncorpusling-roundup/

Posted by: Neal Goldfarb | Sep 7, 2017 2:50:58 AM

That was supposed to read "a more fluid area of law, like Tort law..."

Posted by: Adam Zimmerman | Sep 6, 2017 11:41:39 PM

Thanks for sharing this--looking forward to reading your essay. Reading your post, I wondered whether you would have a different perspective if the subject matter wasn't criminal law -- which often raises the very statutory and Due Process notice concerns you describe. In a more fluid area of law, like Tort law, some courts have comfortably adopted broad standards, and even changed course (shifting to comparative fault and retroactively imposing strict liability) without overly worrying about notice. Seems like linguistic analysis could help reveal new insights about where common law is going in areas historically driven by judge-made law.

In that regard, Cristina Tilley has an interesting linguistic analysis of the Restatement of Torts in the Yale Law Journal. She provocatively argues that this "inside out" approach to Torts yields new theoretical and practical insights. Among them: that the principal focus of the Restatement isn't morality or efficiency, but "community," and that this can inform the development of different areas of intentional and strict liability law. You can find it here: http://www.yalelawjournal.org/article/tort-law-inside-out

Posted by: Adam Zimmerman | Sep 6, 2017 11:39:38 PM

Very interesting discussion. Having read your essay, as well as some of the blog postings earlier by Justice Lee and Stephen Mouritsen on this topic at the Volokh Conspiracy, it seems to me that some of the issues being discussed are due, at least in part, to the nature of the corpora being used as opposed to the methods. If I understand correctly the corpora are large general purpose databases developed historically for linguistic research. But there doesn't appear to be any reason why tailored corpora couldn't be used relatively easily today.

Here in my state of Virginia, as in many places, I can download state supreme court decisions from the court's website. It is then fairly straightforward to turn those opinions into a corpus using fairly simple code on a desktop computer. Words can be tokenized, counted in multiple ways, and tagged using any number of recognized schemes. And now the frequency, meaning, or other analysis is firmly grounded in the precedent and usage of the court itself. While perhaps the average judge or lawyer may not have the skill to perform this analysis, it is relatively straightforward and I would think within the means of many courts to develop a standardized set of corpora, reducing the need for competing experts but still allowing Judges access to meaning beyond that gleaned from dictionaries and legislative records.
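As a rough illustration of the kind of desktop analysis described above, here is a minimal sketch in Python. It assumes the opinions have already been downloaded and converted to plain-text strings; the function names (`tokenize`, `build_counts`, `cooccurrences`) and the two sample sentences are hypothetical, made up for this example and not drawn from any actual court corpus or existing tool.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def build_counts(opinions):
    """Count how often each token appears across all opinions."""
    counts = Counter()
    for text in opinions:
        counts.update(tokenize(text))
    return counts


def cooccurrences(opinions, target, window=3):
    """Count the words appearing within `window` tokens of `target`."""
    neighbors = Counter()
    for text in opinions:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == target:
                start = max(0, i - window)
                # Words before and after the target, excluding the target itself.
                context = tokens[start:i] + tokens[i + 1:i + 1 + window]
                neighbors.update(context)
    return neighbors


# Hypothetical stand-in for text extracted from downloaded opinions.
sample_opinions = [
    "The defendant carried a firearm in his vehicle.",
    "She carried the firearm on her person.",
]

word_counts = build_counts(sample_opinions)
firearm_neighbors = cooccurrences(sample_opinions, "firearm", window=2)
```

Raw counts like these are only a starting point. Part-of-speech tagging and sense coding would need additional tooling and human judgment, and they do not settle the question raised in the post of whether frequency should be equated with ordinary meaning.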

This doesn't address all objections of course, but I think it is interesting that text analysis is moving towards a much more accessible model leveraging a lot of academic and open source work, and reducing reliance on large pre-coded databases (while obviously owing a great debt to that work). I would imagine that sooner or later this will make its way into courts in some form or another.

Posted by: Kyle | Sep 6, 2017 6:52:26 PM

There is definitely a difference between what the words mean (constitutional interpretation) and what the scope of the right is (statutory interpretation).

Whether something counts as an "imminent danger" (scope) will be a subjective matter of statutory interpretation much more than whether something counts as speech/expression to begin with.

Judges' job is to come to a conclusion about the scope of the right, but they should have a more objective way to come to the meaning of the words to decide if there is a case to begin with (to challenge a law).

Posted by: Scope or Listerine? | Sep 6, 2017 5:41:12 PM

I agree that usage frequency can be a problematic thing to measure. That's true for a bunch of reasons, including that a particularly common usage in public discussions may be different from what seems like a typical usage in the course of legal drafting. With that said, I take the corpus linguistics methodology to be a "maybe this might be of marginal assistance" tool rather than a "this will identify the correct answer" tool. If that's an understood limitation, then I think the corpus linguistics approach can be of some assistance in some cases. My sense, at least.

Posted by: Orin Kerr | Sep 6, 2017 3:10:04 PM

Thanks for the comment, Orin. I think that it is fair to treat questions of constitutional interpretation separately from questions of statutory interpretation--especially the interpretation of criminal statutes.
But as I mention in the paper, I am wary of assuming that frequency of usage gives us an accurate picture of linguistic prototypes. Maybe newspapers and other publications were more representative in the late 18th C, but nowadays frequency will be skewed by newsworthiness . . .

Posted by: CBHessick | Sep 6, 2017 2:32:52 PM

Interesting post, Carissa. I would think corpus linguistics has value to an originalist engaging in constitutional interpretation but less value to a textualist engaging in statutory interpretation. An originalist engaging in constitutional interpretation may want to know the original public meaning of a particular word or phrase, which is generally a matter of public usage in some time in the past. A study of public sources from that period might help illuminate that meaning to a modern reader, and a more rigorous study of those sources might give the modern reader greater confidence that a particular interpretation was the one the public would have taken at the time. On the other hand, I don't think of textualist approaches as generally involving the same act of reconstructing a public meaning. I think of textualism more as a matter of understanding what a careful law-trained reader of the statutory text -- one versed in precedents and court rulings about that phrase -- thinks that language means. Given that, I'm less sure that a corpus linguistics approach can shed light on the meaning of a statute.

Posted by: Orin Kerr | Sep 6, 2017 1:46:28 PM

The comments to this entry are closed.