
Tuesday, December 08, 2015

Essays and Objectivity

It’s been a little bit hard to keep up with my posting obligations over the last week because we are in the midst of exams, which means test drafting and lots and lots (and lots) of harried, last-minute student visits. All of which has led me to this perhaps fluffy post with a question about exam essay grading.

I began my teaching career in the high school context, where I taught, among other things, American Government to seniors.   In that capacity, I graded a lot more student research paper projects than I generally do now. In the early going, I didn’t have much in the way of grading methodology—I would read the papers, make some comments in the margins, wrap up my thoughts in a final paragraph, and assign a number grade (usually between 65 and 100). I didn’t have to worry about curves or anything like that, and all was well. Then one day a funny thing happened.

This was before the age of email submissions, so students turned in hard copies of their papers to me, which ended up in a large (usually about 60-70) pile on my desk. On this occasion, one student inexplicably turned in TWO copies of his paper, and, equally inexplicably, the duplicates ended up at very different places in my stack. You can probably see where this is going. I graded both papers—and I didn’t realize it until I went to enter the grades in my book.   And yes, I gave the same paper substantially different grades. Not so good.  

I gave that student the higher of the two grades, but I was deeply troubled by the larger implications of what I had just discovered. I mean, I knew that grading papers is subjective, and that I probably varied a little bit depending on my mood, the quality of the papers that preceded a given paper, and other things like that.   But I never imagined that the difference could be as substantial as it evidently was. So I decided that I needed to develop some kind of coherent, or at least consistent, methodology.

My first effort at this was to begin creating grading rubrics, which is the method I carried into my law school teaching. I basically laid out in advance, in some specificity, the number of points a particular piece of substance would receive in a given essay answer. This went along pretty well for a couple years, until I recognized a disturbing trend. My rubric approach was fairly consistent, but it tended to undervalue persuasive writing, or the holistic quality of a student’s answer. Bullet points that hit all the right marks could score nearly as highly as a well-crafted and persuasive essay. So I tweaked the rubric a bit to add more points for an “overall impression” category. But, again I ran into a problematic case.

I had a student who had been great in class. She was thoughtful, well-spoken, creative and a leader in discussions. I knew she understood the material very well, and I am confident that she’s a great lawyer. On her (anonymous) exam, however, she simply forgot to cover an issue. She spotted it in her introduction, but then, inexplicably, forgot to write it up in the body of her answer. I really couldn’t justify awarding many points on that issue, even though the rest of the exam was very well written and thoughtful. If I could, I might even have awarded more points on other issues, but that didn’t square with the objectivity the rubric was supposed to impose. She scored very well in the “overall impression” category, but the near-total loss of one issue condemned her to a relatively low grade.

When she, predictably, came to see me to discuss the exam, all I could say was “you forgot this issue,” which she understood, but I felt compelled also to tell her that her exam was otherwise very good, and that I would be happy to write an explanatory recommendation letter. She was obviously disappointed, but had to accept the outcome. I again felt dissatisfied, though. I talked to colleagues who said that this was precisely the reason they took a flexible, holistic, overall impression approach and didn’t bind themselves to rubrics.   But I felt I still needed the rubric to insulate me against my subjective human frailties.

At this point, I’ve settled on a kind of hybrid (but very time-intensive) approach. I do two grades. I read once with the rubric and assign a score. I then print out an entirely new set of exam answers, with no marks on them, and read them through again, giving each just an overall impression score. I then average the two scores together and round up. So far it’s working, but I keep feeling there has to be a better, or at least easier (and not much worse), way. Last semester I had just about 100 Con I students, and it was a long process.
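
For anyone who keeps scores in a script rather than a gradebook, the arithmetic of the two-pass approach might be sketched roughly as below. This is only an illustration under stated assumptions, not the actual workflow: the function names, the sample exam IDs and scores, the assumption that both passes use the same 100-point scale, and the 10-point "re-read" threshold are all hypothetical.

```python
import math


def hybrid_score(rubric: float, holistic: float) -> int:
    """Average the rubric pass and the overall-impression pass, then round up."""
    return math.ceil((rubric + holistic) / 2)


def needs_second_look(rubric: float, holistic: float, threshold: float = 10.0) -> bool:
    """Flag an exam whose two scores disagree by more than `threshold` points.

    The threshold is a made-up value for illustration only.
    """
    return abs(rubric - holistic) > threshold


# Hypothetical (rubric, holistic) scores keyed by anonymous exam ID.
scores = {
    "exam_017": (82, 85),
    "exam_042": (64, 88),
    "exam_103": (91, 90),
}

for exam_id, (rubric, holistic) in scores.items():
    flag = "  <- re-read" if needs_second_look(rubric, holistic) else ""
    print(f"{exam_id}: {hybrid_score(rubric, holistic)}{flag}")
```

Rounding up with `math.ceil` mirrors the "average and round up" step; the discrepancy flag is an extra convenience for spotting exams where the two readings diverge sharply.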

Anyone have thoughts or ideas about the ideal method of essay grading?

Posted by Ian Bartrum on December 8, 2015 at 07:31 PM


So here's a somewhat tortured analogy... (I'm currently procrastinating on grading exams, telling myself it's only fair to my students to wait until after I've finished my coffee.)

I play X-Wing competitively. In the game, you win (generally) by destroying all of your opponent's ships. But, in addition to a win/loss record, you also have margin of victory, which is based on the difference in the number of points destroyed by each player. Points used to be all or nothing: either you destroyed a ship and got points, or you didn't destroy it and got nothing. But, a little while before the world championship this year, the rules changed to award half points for getting a large ship (such as the Millennium Falcon -- large relative to starfighters) down to half its hit points.

What used to happen was squads using large ships would dominate tournaments because they would always have high margins of victory. Han Solo could take 11 damage, be left at 1 hit point, and the player would still get credit for not losing any points, while a player with a TIE Fighter swarm taking the same damage would likely lose 3 ships, and a third or more of their points. This caused a lot of people to stop playing large ships (which also at the time became vulnerable to a new turret upgrade that had just been released).

Han Solo dropped from being one of the most popular pilots (in the world championship squad the year before) to being in only 4% of the top 50 squads in this year's world championship. He's now probably the most underrated ship in the game, all because people are no longer able to use him to hoard points. What they've forgotten is that while he routinely gives up half his points, Han Solo also wins games.

So on to the point... In X-Wing, if you want to win tournaments, you can hope that your one or two losses are close, and all your wins are big, and that you'll proceed based on a stellar margin of victory. Or, you can just win all your games. Winning is better.

As a student, if you want to get a good grade, you can go off topic, miss a major point or two, fail to follow some of the instructions, and hope to skate by based on your stellar writing style. Or, you can just get everything right. You can hope that a holistic approach to grading happens to swing your way, or you can just write an objectively good essay.

[For those preferring a sports metaphor: If you want to make the college football playoffs, you can either lose a game and argue the quality of your wins, strength of schedule, and strength of conference, or you can just win all of your games.]

When grading essays, it's very important to keep in mind the message we're sending to students about the values of not just our classes, but the academy and professional world at large. If a grading approach has the effect of telling students it's okay to miss some major issues because their stellar writing style and participation in class made up for it, I think we're doing them a pretty serious disservice.

Posted by: Derek Tokaz | Dec 10, 2015 9:21:26 AM

I read every exam twice and sometimes I find I have to make adjustments to earlier graded exams because I've changed how I look at a problem. It doesn't ensure perfect consistency but it reduces the variance substantially.

Posted by: Douglas Levene | Dec 10, 2015 3:03:20 AM

I wonder whether most attempts at grading are really attempts at creating distinctions where none exist, and this is the heart of the problem. We can all recognize really great [papers, exams, whev.], and we can all recognize really terrible X. But the vast muddle in between... maybe there's inconsistency in there not because we're inaccurately tracking some underlying distinction, but rather because there is no meaningful underlying distinction in the first place? (A rubric allows us to pretend there's a distinction, but only by the fundamentally arbitrary act of assigning weights to different items in it.)

Posted by: Paul Gowder | Dec 9, 2015 7:49:02 PM

Great post, and great responses. I use a rubric to ensure that my grading is consistent across exams, but noticed the same problem Ian mentioned re: issue-based rubrics undervaluing persuasive / high-quality writing. So now I include in my rubric a set number of points assigned to "overall quality," and I'm satisfied with how this plays out. Like Brad, I don't see the problem with the case Ian described of an otherwise high-quality exam answer receiving a lower grade due to a missed issue.

For what it's worth, I occasionally (in addition to using the rubric) jot down, while reading the exam, what grade I think it "should" get ... and when I go back and compare the curved grade based on the rubric points to my "predicted" grade, the results invariably align -- if not to the exact letter grade, then with a margin of error of a plus- or minus-.

Finally, as much as I strive for consistency and accuracy in grading -- for reasons of professional ethics, respect for students, and just plain wanting to do a good job -- I think, in the end, that stressing too much over exam grading methodology in anonymous curve-required classes is a mistake. For better or for worse, in my 6 years of teaching curved and anonymously-graded courses, I've never once felt that an exam ended up with a worse letter grade than it "deserved" as a result of applying the curve. In contrast, every year as a result of the curve, I give many students higher letter grades than I think their exams merit. If I offer a 100-point exam and the strongest student scores an 80, I wouldn't think to give that student's exam an A if the class weren't curved (assuming, of course, that my exam was reasonably written and I wasn't asking of the students more than they could reasonably handle).

Posted by: Nadia Sawicki | Dec 9, 2015 5:59:13 PM

Except that he didn't fail the relevant audit (which is running them once with a rubric and once with a subjective impression). "As Jeff pointed out, the correlation is generally very high, which I thought would be true, and which is comforting." And I disagree that subjective is the same thing as standard-less. It just means that compliance with the stated standards cannot be measured without an exercise of (informed and experienced) judgment. We do this all the time in law and in life.

I do both, rubrics and subjective, but I worry about over-reliance on the former. It is natural to prefer them, as a professor, because it is a lot easier to respond to an unhappy student if one can simply point to the absence of items A, C, and G. But I wonder if discomfort with confrontation is leading us to measure something that doesn't necessarily correlate with retention, absorption, analytical intuition, and likely good lawyering at the end of the day. Just a question, a musing. But it reminds me of people doing word-searches for legal research rather than reading and reasoning by analogy. I feel like it's too mechanistic and may miss things.

Posted by: anon | Dec 9, 2015 4:27:12 PM

Everyone thinks they have good judgment, but unless you do the sort of A/B testing that Ian did accidentally, you don't know whether or not it is true. My hunch is that most people would fail it in the same way he did. That's no failing on his part, either: standard-less subjective judgement that nonetheless has the desirable quality of consistency is really, really hard.


I don't really see the problem with the student that got a poor grade for leaving out an issue. Obviously it was unfortunate but the grading process did what it was designed to do. It is no more unfortunate than if the student had the flu and produced an all around bad exam.

Posted by: Brad | Dec 9, 2015 3:59:20 PM

I guess I am not sure I understand why a large subjective component to grading is unfair, if the exams are blinded. Ordinarily the problem with subjectivity is the potential for introduction of bias, right? If anonymous grading addresses the issue, then your assertion is that there is something inherently unfair about subjective assessment. To be sure, subjective assessment means the judgment of one person dictates the grade. But we are teaching in this area precisely because (I hope) our judgment is solid. If I cannot be trusted to make a subjective assessment about the quality of a response, I shouldn't be teaching the course. There is more to good answers - and good lawyering - than could be captured by a machine applying a rubric. Assessment of practicing lawyers (by senior lawyers, clients, and external audiences including courts) has a significant subjective component.

Posted by: anon | Dec 9, 2015 2:47:37 PM

Thanks for the thoughts. As Jeff pointed out, the correlation is generally very high, which I thought would be true, and which is comforting. If I run into a case where there's a big difference, I can go back and double check.

The duplicate papers happened in high school, so they weren't anonymous, so I knew there were two copies...

Posted by: Ian Bartrum | Dec 9, 2015 12:40:41 PM

I also taught at the high school level for many years, including AP World History, and I assigned a lot of writing - I graded about 900 essays per year.

Using a rubric is an absolute must - your colleagues are simply wrong. A rubric is necessary not just for consistency in grading, but also to tell students what you are looking for. It is unfair to them to have a large subjective component, or to grade "holistically."

As for persuasiveness, etc, you need to come up with a way to operationalize that on the rubric --eg, you can include entries for the degree to which evidence supports the argument that your students make, etc. In other words, think about what makes a paper persuasive, and add those to the rubric.

Posted by: gdanning | Dec 9, 2015 12:29:28 PM

Have you ever done a correlation of your holistic grades with the rubric grades? I would bet it's pretty high.

Not sure what classes you teach, but the expectation of a well-crafted and persuasive essay, somehow uncorrelated to the identification of rubric issues, on an in-class, timed law school exam strikes me as unrealistic. (As should be clear, I'm a rubric guy.) On, for example, my contracts exams, there may be fifteen issues embedded in a question. One of the failure modes I see (and I've confirmed this with students) is that the exam taker doesn't first consider and prioritize the issues, but instead leaps into writing, with the result that you may get a well-crafted and persuasive essay on the first obvious issue, which in the rubric only counted for ten percent of the points. It bothers me when that happens, and I certainly give the student the max for that issue, but I see the exams primarily as a test of translating narratives into legal logic, just as a lawyer does when a client describes a problem, and missing issues is a big deal!

Also, I don't think it occurs very often that the dilemma you are describing is the difference between a high grade and a low grade. It may be important at the margins between particular letter grades (if that is your system). I can't, however, remember an instance where a student wrote an insightful exam so far removed from identification of the issues in my rubric that the grade would have been something like a C, but for my conclusion that it was an A-level contribution to heretofore unconsidered issues in the law of contracts!

Posted by: Jeff Lipshaw | Dec 9, 2015 9:49:32 AM

Tangential but curious: how did you decide that the two copies were not an original and a lazily plagiarized version?

Posted by: WG | Dec 8, 2015 11:35:35 PM

The comments to this entry are closed.