
Wednesday, January 30, 2019

Corpus Linguistics Comes to the Fourth Circuit (and that’s not a good thing!)

An amicus brief was filed yesterday in the Fourth Circuit Emoluments Clause case against President Trump.  The brief was filed on behalf of Clark Cunningham, who is a law professor at Georgia State, and Jesse Egbert, who is a linguistics professor at Northern Arizona University.  The brief states that Professors Cunningham and Egbert have undertaken a “scientific investigation” to conclude that the word emolument did not have “a distinct, narrow meaning limited to ‘profit arising from an office or employ.’”  Some people on the platform-that-must-not-be-named are circulating this brief and its conclusion, presumably because it supports the challenge to President Trump’s business practices under the Emoluments Clause.*

So what is this “scientific investigation” that the professors undertook?  Did it involve a microscope?  Or double-blind clinical testing?  Nope.  It involved the use of corpus linguistics, a new approach to statutory interpretation that I’ve criticized before, both here on this blog and in this short essay.

I will not rehash my arguments about why I’m deeply distrustful of corpus linguistics as a tool for statutory interpretation—especially the interpretation of criminal laws.  But I do want to say a few words about characterizing corpus linguistics as “scientific investigation” and why I am concerned that a law professor would allow his corpus linguistics analysis to be characterized that way.  (Note:  The amicus brief was written by an attorney, but Professors Cunningham and Egbert were the clients.  So I am assuming that they had the ability to object to the use of the phrase “scientific investigation.”)

For one thing, using the phrase “scientific investigation” connotes that the professors conducted an experiment, that the results of that experiment were objectively observable (rather than mere subjective impressions), and that the findings can be replicated.  This is reminiscent of claims by others who advocate for the use of corpus linguistics in statutory interpretation because those “findings are replicable and falsifiable.”

But corpus linguistics does not allow you to type a word or a phrase into a computer which spits out an answer to the question of meaning.  At best, corpus linguistics allows other people to replicate your search of a corpus linguistics database,** but it does not allow them to replicate your findings.  That is because the findings of a corpus linguistics analysis require inference and interpretation.  I’ve made this argument before (using a case called Rasabout as my example).  But the subjective judgment required is on stark display in this brief.

Among other inferences, Professors Cunningham and Egbert conclude that, because the word “emolument” was often modified by the word “official,” that means the word “emolument” when it appeared without modification was generally understood to mean something broader than “profit arising from office.”  If everyone would have understood the term “emolument” to be limited to profits from holding office, so their argument goes, then “official emolument” would be an oddly redundant phrase.  It is for similar reasons, Professors Cunningham and Egbert explain, that we don’t often see the word “fork” modified by the word “metal”—we generally assume that if someone is referring to a fork, then he or she is referring to a metal fork.

This analysis by Professors Cunningham and Egbert may seem perfectly logical.  And you may even be convinced by it.  But the fact that something seems logical does not mean it is “scientific.”  To the contrary, many things that appear logically true end up being empirically false.  Once you have to rely on inferences to derive "findings" from your results, you have left the world of objective truth and moved into the realm of theory. 

There is a second reason why I am especially troubled by the use of the term “scientific investigation” to refer to a corpus linguistics analysis:  It allows Professors Cunningham and Egbert to dismiss opposing views as a failure of “scientific method.”  You see, Professors Cunningham and Egbert are not the first academics to undertake a corpus linguistics analysis of the word emolument.  There is a 2017 article by Phillips and White in the South Texas Law Review, which uses corpus linguistics to arrive at the exact opposite conclusion.***  The amicus brief cites the Phillips and White article, but rather than engaging with the article on the merits, it dismisses the article because it relies on an assumption that (according to the amicus brief) “has no scientific basis” and that is “disproved by the linguistic research reported in this brief.”  The amicus brief goes on to state: “Although Phillips & White subtitle their article ‘A Corpus Linguistic Analysis,’ none of their conclusions about the 18th century meaning of emolument are based on the scientific methods used for the research reported in this brief.”

I find these characterizations of the Phillips and White article to be at best misleading, and arguably false.  First, Cunningham and Egbert have “disproven” nothing.  They arrive at an opposite conclusion because they use different assumptions and make a series of inferences.  That’s called “disagreeing” with someone.  But, of course, to say that you “disagree” with someone has far less rhetorical force than to say you have proven them wrong.

Second, Cunningham and Egbert suggest that Phillips and White didn’t use the “scientific method” of corpus linguistics.  That suggestion is false.  There can be no doubt that Phillips and White use corpus linguistic analysis.  They don’t simply refer to the method in their subtitle; they provide a methodology section in the article, and they explain how they coded the results of their corpus search to arrive at their conclusions.  So I do not see how the statement that the Phillips and White conclusions are not “based on the scientific methods” used in the amicus brief can be true.  Perhaps Professors Cunningham and Egbert might say that their corpus linguistics methodology was superior.  If so, the brief should make that argument, rather than suggesting that their analysis was scientific and Phillips and White’s analysis was not.

Which brings me to my major objection to this brief:  It is claiming the mantle of objective truth, when in reality it is recounting a subjective analysis about which reasonable people can – and do – disagree.  To be clear, Professors Cunningham and Egbert are not the only people using corpus linguistics who suggest that the methodology promises to eliminate judicial discretion and make statutory interpretation an “objective” undertaking.  But this brief is a particularly crude example of that argument.  And given that the brief was filed by academic experts in order to aid judges in this case, I find the crude and misleading analysis to be especially troubling.  If law professors (and other professors) are going to file amicus briefs based on their academic expertise, they should be careful not to mischaracterize or mislead.

Let me close by saying that, although I am a corpus linguistics skeptic, there are thoughtful people on the other side of that issue.  People whom I respect (but disagree with) think that corpus linguistics can be a helpful and valuable tool for judges in the interpretation of statutes and constitutions.  I hope that those thoughtful scholars will affirmatively disclaim the use of the word “science” to describe the endeavor and criticize any briefs that present misleading accounts of what a corpus linguistics analysis can prove or disprove.


* To be clear, the brief takes no position on “the ultimate resolution” of the case.  But the arguments in the brief support those challenging the President.

** Interestingly, the search conducted by Professors Cunningham and Egbert for this brief is not necessarily replicable.  Footnote 21 of the brief tells us:

The researchers’ search can be approximately replicated by entering “emolument*” in the initial search box that appears after logging into COFEA. The use of the asterisk produces every word containing the string of letters that precede the asterisk. The researchers corrected the raw results of their COFEA search by looking for and adding texts that contained variant spellings or typographical errors that were missed by the initial search and also deleted identical texts, for example texts that appeared in two different source materials.

*** In an interesting twist, one of the two authors of this law review article, James C. Phillips, is a former co-author of Professor Egbert.  And Egbert touts his article with Phillips as one of his qualifications in FN 10 of the amicus brief.

Posted by Carissa Byrne Hessick on January 30, 2019 at 12:26 PM


The corpus-linguistic definition of certain rights and privileges, shall not be construed to deny or disparage others retained by the people.

Posted by: Nine | Feb 3, 2019 1:54:51 PM

I'm sympathetic to Carissa's concern. Over a hundred years ago, the pursuit of explanation, almost all of which fell into one kind of "philosophy" or another, began to separate into disciplines, often divided into those which were "nomological" (governed by laws) and those which were hermeneutical (the subject of narrative and interpretation). I understand corpus linguistics to be the attempt to take a field heretofore on the hermeneutic side of the divide and try to make it nomological.

The problem isn't the analysis itself; it's the attempt to claim a privileged status for the knowledge by tossing it into the science bucket rather than the narrative bucket. On that, the jury is still out. I wrote about this (gulp!) ten years ago in connection with a critique of Ron Gilson's iconic 1984 piece justifying the presence of lawyers in large transactions (which Ron took with his usual good humor and open-mindedness).

Here's what I said then: "Robert Ellickson described the issue charitably as 'creative tension between the yin of social-scientific universalizers and the yang of humanistic particularizers.' Gilson, I think it is fair to say, expects that his explanation is an instance of social-scientific universalizing. My suggestion is that it is at best on the borderline of science. To resolve that issue, we need to revisit the demarcation issue: what distinguishes science from pseudo-science? Put another way, I am arguing conceptually that there is simply no way of ever proving or disproving the theory, and as such it loses its privileged status as a way in which scientific knowledge has progressed. As a way of making sense or explaining, it is no better-and perhaps worse-than cultural studies or hermeneutics (at least to the extent that those disciplines have not claimed privileged status for themselves as against other disciplines). Again, I repeat, this is not a criticism of the creativity or the brilliance or the power of Gilson's explanation; it is merely a denial of its privileged status as scientific truth."

FWIW, my essay is here: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1303483

Posted by: Jeff Lipshaw | Feb 2, 2019 10:28:02 AM

Many thanks to those who have commented.

I wanted to take a quick moment to respond to Paul Gowder's point that I'm using too narrow a definition of the word "science" in my critique of this brief. I believe it is fair to summarize his argument as follows: A scientific inquiry need not involve a microscope or a clinical trial, and the term science, especially when used by philosophy of science folks, means something much broader. That broader meaning could include the corpus linguistics search and analysis in this brief. (Rob Anderson made a similar point on Twitter.)

First, let me be clear, I did not intend to take a position on what constitutes "science." I am not a philosopher of science, and I have no idea how a philosopher of science would define that term.

Instead, what I am trying to do is to explain why I think that calling what Cunningham and Egbert did a "scientific investigation" in an amicus brief is highly misleading. I'm basing my claim that it is misleading both on what I imagine the term "scientific investigation" is likely to connote to appellate judges, as well as the context of how that term was used in this amicus brief.

My intuition is that, if someone uses the term "scientific investigation" in ordinary conversation or in a brief about the legal meaning of a word, it is likely to connote a particular type of undertaking. In particular, it is likely to connote a search for empirical truth, and that the results of the investigation are both quantifiable and verifiable. Does that mean that non-science philosophers think that "scientific investigations" are limited to undertakings that involve microscopes or clinical trials? Of course not. But those are two examples of the types of experiments that are likely to be conjured in the minds of an ordinary listener (or an appellate judge) by the term "scientific investigation."

But, more importantly, what is the term "scientific investigation" being used to refer to in this brief? It's not simply the typing of the word "emoluments" into a database and presenting the raw data returned. It is *also* the inferences that are made about the *legal meaning* of the word.

Maybe a philosopher of science would say that inferences being drawn from results of a study are still part of the "scientific investigation." But I don't think that is how a lawyer (or a judge) would characterize those inferences. To use a different example, would a lawyer refer to his or her synthesis of cases found through a Westlaw search as a "scientific investigation"? Would a judge refer to her assessment of how a particular word is used elsewhere in a statute as a scientific investigation? I don't think so. I think lawyers and judges would call those undertakings "legal reasoning" or something similar. And while a philosopher of science might disagree about whether these analyses count as "science," my claim is that the phrase, as used in the brief, was misleading for the judges sitting on that case.

What is more, Cunningham and Egbert don't simply claim that what they did was a "scientific investigation," they also claim that their analysis "disproved" a corpus linguistics analysis that came to another conclusion. As I argue in the post, Cunningham and Egbert do nothing of the sort. They conducted a database search and drew inferences from those results that conflicted with the other corpus linguistics analysis. This "disprove" language again suggests that there is a "right answer" out there and that their experiment can "prove" someone else's study to be "wrong." That sort of language goes well beyond the idea of legal analysis (which is what Cunningham and Egbert's conclusion is based on). I would never, for example, say that my Westlaw search "disproved" your argument about statutory interpretation. But I would say that my flying to the moon, taking a soil sample, and bringing it back to a laboratory for testing "disproved" your theory that the moon was made of green cheese.

People are obviously free to disagree with my intuition about whether the 4th Circuit judges are likely to be misled by this brief. But my concern extends beyond this particular brief to claims by others who think we ought to use corpus linguistics for legal interpretation. Some who advocate for corpus linguistics do so because, according to them, the meaning of a statutory term is an empirical question and because corpus linguistics provides an objective answer to that question. I think that is demonstrably false. And my concern about referring to a corpus linguistics analysis as a "scientific investigation" is being expressed against the backdrop of this broader concern about corpus linguistics.

Posted by: carissa | Jan 31, 2019 11:17:58 AM

Concerning my first comment, here is a great illustration:

Justice Alito, in a classic piece of gap-filling, uses extra-textual sources when there is an impasse or ambiguity in the legal text. Highly recommended (a ruling dealing with service in accordance with the "Hague convention"). Here:



Posted by: El roam | Jan 31, 2019 3:21:17 AM

I'm with Carissa in thinking that this approach is nonsense and I'm frustrated that scholars gravitate toward interpretive methods that seem legitimate because they deal in quantification. Many things in life cannot be quantified. Sorry. In my opinion, this is one of them. It has a veneer of authenticity because of the number-crunching. But common sense matters too. And if common sense tells us that numbers just couldn't possibly give the right answer, we should give credence to that voice in our heads.

On the other hand, it's totally possible that this is "scientific." I've found that the deeper you dig in any field the more you realize that it's not all that rigorous or methodologically sound. There are a lot of wacky ideas underlying every field -- or every field I've studied. So I again resort to my common sense meter.

Posted by: anon, good nurse | Jan 30, 2019 11:52:48 PM

Opening caveat: I'm not terribly convinced by the argument in the brief. So, not trying to defend it here. And, in particular, I totally agree that it's unacceptable practice to casually characterize the other side in a narrow methodological dispute as unscientific and therefore Naughty.

That being said, I think you have a conception of "science" that is far too narrow, and misses how real-world scientists work as well as how philosophers of science use the term. The scientific enterprise, properly conducted, doesn't claim to have unvarnished access to objective reality or to be free from "inference and interpretation." (And it certainly doesn't have to involve a microscope or double-blind clinical testing! Observational rather than experimental research exists in any scientific discipline involving humans, and many others besides.) Inference and interpretation and subjective judgment are built into every step of the ordinary conduct of science, from the framing of hypotheses through the selection of statistical methods to use and the interpretation of results; as Quine would remind us, observation itself is theory-laden, and certainly the ultimate integration of research results into a body of theory requires subjective judgment about stuff like the weight of potentially conflicting evidence. Perhaps the ultimate integration of subjectivity into the scientific process is the entire discipline of Bayesian statistics, which posits that the researcher is beginning from a prior representing their own (theoretically informed, to be sure, but still at least partly subjective) probability distribution over states of affairs before looking at the data.

So criticize the amicus brief for being full of the equivalent of google searches sold as a careful research methodology, and I'm with you all the way. But let's not take unsustainable positions as to the demarcation criteria for the scientific enterprise.

Posted by: Paul Gowder | Jan 30, 2019 9:13:16 PM

It would have been nice if you knew or had read up on what "science" actually is, because then you would have realized that much of corpus linguistics research meets the definitions of "science" and the "Scientific method"; even checking those articles on Wikipedia (sections on 'branches of science' or 'scientific research' of the article on "science" and par. 1 of the article on "scientific method") would have shown you that as well as that the use of researchers' judgment and interpretation does not make something non-scientific. Sorry, but if you try to move outside your area of expertise - law - fine, but then you better know what you are talking about.

Posted by: Anonymous | Jan 30, 2019 5:19:49 PM

The rise of originalism and corpus linguistics appears to coincide perfectly with the switch from a Protestant court in the Warren era to a Catholic court in the Roberts era.

The Warren Court used the living document theory that Protestants use to interpret the gospels, whereas the Roberts Court uses the textualist theory that Catholics use to interpret the gospels.

Just like Protestants, the Warren Court didn't need to refer to anything other than the justices' own sense of justice and due process to justify their rulings; whereas just like Catholics, the Roberts Court needs to constantly refer to the text and the founders for any sense of justice and due process.

Posted by: Pope Hat and Gloves | Jan 30, 2019 4:55:50 PM

I think you are giving too much weight and implying too little creativity and interpretation to the scientific method. For instance, there is no doubt whatsoever that the analysis of ice cores and tree rings to reconstruct past climate records is science. Yet here too interpretation is key to reconstructing the correct climate records, and all sorts of additional assumptions and corrections are necessary. Here too the experiments are in some sense not repeatable. You can drill a new core like you can consult another corpus, but there is only so much old ice.

And of course you can always call those who disagree with you scientifically unjustified. But that’s just a fancy way of saying you think it’s a scientific matter and you think they are totally wrong.

The problem here seems to be more about not seeing science as continuous with other means of reaching justified conclusions than about calling this science.

Posted by: Peter Gerdes | Jan 30, 2019 4:27:15 PM

Corpus linguistics seems to be a response to Democrat judges' refusal to apply to the Second Amendment the same rules that they apply to the rest of the Bill of Rights.

Everybody agrees, for example, that to find the meaning of the Sixth Amendment, we can look to the English Bill of Rights or the colonial bills of rights; but if we want to find the meaning of the Second Amendment, well, that amendment is completely unique in all of Anglo-Saxon history and does not stem from the English analogue or colonial analogue, and therefore we need corpus linguistics.

When judges simply strike down gun-control laws the same way under the same reasoning that they do for speech-control laws, or voting-control laws, or abortion-control laws, corpus linguistics will vanish.

Notice that we don't use corpus linguistics to interpret Shakespeare or Chaucer even though they're 2-3x older than Madison?

Posted by: Common Core-pus | Jan 30, 2019 4:26:26 PM

In this regard, one may find great interest in this: "Act to amend Tennessee Code Annotated...", which reads as follows (Section 1(b)):

(b) As used in this code, undefined words shall be given their natural and ordinary meaning, without forced or subtle construction that would limit or extend the meaning of the language, except when a contrary intention is clearly manifest.

I shall illustrate some more, maybe later...

Here is the bill as it stood at the time:



Posted by: El roam | Jan 30, 2019 4:16:57 PM

Sure, Carissa. Interesting post, but I don't see a great deal here. A judge must use linguistic tools in order to reach a solution or decide a case. The starting point must always be the text, the language. When facing (as is frequently the case) an impasse or ambiguity, one needs to shift to the legislator's intent. So, in practice, the problem typically is this:

One can't always find a match or harmony between:

The language; the subjective or narrow intent of the legislator; and the objective intent of the legislator (adjusted in light of legal harmonization, constitutional principles, and the concrete case the judge faces).

But the ultimate goal is always the intent, for there is no law or provision without intent. Everything ultimately serves the purpose, the intent of the legislator (whether objective or subjective).

So, on the current issue of emoluments:

Clearly, the intent is to avoid conflicts of interest (between the official duty of the president to serve the public honestly, and third parties that may try to bribe him in one way or another to serve their narrow, specific interest rather than the public interest).

That intent must prevail over everything else.

P.S.: And of course, sometimes (or too many times) the linguistic or ordinary meaning is beside the point, because the statute itself defines the narrow legal meaning of a word.


Posted by: El roam | Jan 30, 2019 2:19:28 PM

Thanks, El roam. The link should be fixed now.

Posted by: carissa | Jan 30, 2019 1:23:47 PM

Carissa, it seems that the link to the amicus is broken. Check out this one, which looks like the right one, and fix it if you want:


Posted by: El roam | Jan 30, 2019 1:17:06 PM
