Wednesday, January 16, 2013
Do We Grade Typing Speed?
Grading exams is the hardest part of being a law professor. Evaluating essay exams with any precision is a challenge. And I frequently revisit what I’ve done to ensure that I’m grading in the most accurate way.
Exams are often scored as a series of points. The more a student says about the essay prompt, the more points a student earns. The more points a student earns, the higher grade she earns.
Well, that’s not quite it. The first stage, “The more a student says about the essay prompt,” comes with a caveat: the student must say something relevant, something in response to the call of the question. But that hasn’t stopped law students from passing along the tale that one of the most important things to law school success is typing as many words as possible.
It’s hard to read the advice given to law students, usually from one another, discussing exam-taking techniques to this effect. One is the “attack outline,” a pre-written series of answers (mostly black-letter law) that the student can vomit upon the screen when there’s any essay prompt in the general vicinity of said pre-written answer for an open-book exam. For instance, if the question is one about, say, “personal jurisdiction,” a four-paragraph regurgitation of everything about personal jurisdiction, relevant or not, will appear on the page.
So, is there any truth to word counts as a proxy for better grades? Mostly no, in my experience.
Yes, of course, the student who types more usually has more to say because she usually “gets” more. She usually spots more issues, she usually grasps nuances, she usually has a superior analysis. So, more words would mean a higher score.
But, that’s not always the case. Longer answers invite discussion of irrelevant material. Well-crafted outlines threaten to go unused if none of the essays ask about certain topics, and students find an outlet for discussion of them. Students find themselves addressing tangential material as a prophylactic measure.
I look at my answers each year to see if I can find a trend. These are essay answers from a first-year course, the X-axis point values, the Y-axis word count. (Also, I’m deeply grateful to my colleagues Rob Anderson, who blogs at WITNESSETH, and Babette Boliek for their data-driven support.)
The answer lengths ranged from around 1200 to 3100 words, with a median of 2311. Scores ranged from the high 30s to over 140, with a median of 87. The red dot in the center represents both medians. (Points were later added to other graded components and converted to a grading scale.)
I ran a regression analysis, and, as you can see, the R² is only 0.31, which is fairly low, but not insignificant.
But let me slice the data one more way. The relationship is largely driven by outliers on the negative side. If I take out the five lowest scores (it’s unlikely those students performed poorly because of typing; it’s probably that they simply didn’t have as much analysis), there’s not much of a relationship at all, as the R² drops to 0.14. (For those not statistically inclined, that's pretty low.)
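For readers who want to run the same check on their own exams, here is a minimal sketch of the calculation described above. It assumes a simple one-variable least-squares fit (the post does not say which software or model was used), and the function names and the sample data are hypothetical, made up purely for illustration; they are not the exam data discussed here.

```python
from math import fsum


def r_squared(xs, ys):
    """R-squared of a simple least-squares line fitted to (xs, ys).

    For a one-variable fit this equals the squared Pearson correlation.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R-squared is undefined when one variable is constant")
    return (sxy * sxy) / (sxx * syy)


def drop_lowest_scores(pairs, k):
    """Return the (score, word_count) pairs with the k lowest scores removed."""
    return sorted(pairs, key=lambda pair: pair[0])[k:]


# Hypothetical (score, word_count) pairs, NOT the data from the post.
sample = [
    (40, 1250), (55, 1400), (70, 2100), (82, 3000), (87, 2300),
    (95, 2050), (104, 2600), (118, 2200), (130, 2900), (141, 2450),
]

scores = [score for score, _ in sample]
words = [count for _, count in sample]
print("R-squared, all answers:", round(r_squared(scores, words), 2))

# Same fit after removing the two lowest scores from the made-up sample.
trimmed = drop_lowest_scores(sample, 2)
print("R-squared, trimmed:", round(
    r_squared([s for s, _ in trimmed], [w for _, w in trimmed]), 2))
```

Because R² in a one-variable fit is just the squared correlation, it does not matter which of the two quantities sits on which axis. Trimming from one end of the score range only, as in the sketch, is a choice worth reporting alongside the result, since it can change the fit noticeably.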
I note a few items. First, it was almost impossible to exceed the median score using fewer than 2000 words. That suggests some minimum threshold of analysis necessary.
Second, high scores didn’t necessarily come with wordier answers. There were answers in the range of 3000 words below the median, and answers in the range of 2200 words among the very highest scores in the class.
Third, it doesn’t necessarily mean typing speed (as opposed to word count) is unimportant. Fast typists may well type the same number of words as their peers, but have more time to think and analyze.
This, I think, is pretty consistent with the “mostly no.” Longer answers tended to have better analysis; but length isn’t highly correlated with higher scores.
So, how about your experience?
All things considered, on a scale of 1 to 10, how much confidence do you have that the grades you issue accurately reflect a student's actual mastery of the subject matter?
Posted by: Mike | Jan 16, 2013 9:27:02 AM
This is why all my exams have word limits.
Posted by: James Grimmelmann | Jan 16, 2013 9:39:24 AM
I use word limits and take home exams with more time than words needed. When I ran the numbers (a few years ago), on a 4000 word limit exam, I found a linear correlation up to 3000 words, and then random scattering after that - no pattern at all.
Posted by: Michael Risch | Jan 16, 2013 9:58:19 AM
Mike, a good (and tough!) question. There are layers to the answer.
First, the grades are basically exclusively an evaluation from a final exam. To be honest, I'd prefer to reconsider how law schools do this model and include some kind of interim assessment, because I'm not wholly confident that a single final exam can most accurately reflect mastery. As a corollary to that, the single final exam may yield idiosyncratic results: a computer malfunction, an overlooked question, or a recent illness are a few things that can make the final exam less than accurate.
Second, the exam itself is, I think, all other factors being equal, a pretty good way of identifying mastery--at least, for me and, I think, for most of my colleagues. I have a plan when writing the exam about what kinds of issues need to be addressed. I offer a broad variety of points (both for depth of treatment and for breadth of issues spotted). I use a rubric with a number of points, and walk through it carefully. I grade each answer separately, so that I'm not biased by the previous answers the student has given; and I start through the stack of exams at different points so I'm not biased by when I score the question. I run regression analyses between questions to see whether there's a high level of correlation between different questions answered. And, some of my colleagues even run regression analyses between their grades and the students' overall GPAs.
But, third, grades are an imperfect measure. Not simply because they are a letter assigned to a student, but in part because of the nature of a curved distribution. So, suppose I look at the distribution, and I decide to assign anyone with 112 points to 89 points an A-, and anyone from 88 points to 78 points a B+. The difference between the 89 and the 88 is not great, at all. But it looks as stark as the difference between the 112 and the 78. It is, obviously, imperfect. But, that's why one has grades in several subjects.
Now, maybe that doesn't answer the more salient question. It may say, I have a high degree of confidence that my grading is sound, and that my grading is as good as it can be in a curved distribution. But, does it reflect mastery? For the most part, I think so. I think that's why one can find high correlation between first-year grades and bar passage rates; and why firms and judges care deeply about grades. If grading were not a good indicator of legal understanding, one would expect first-year grades to be less important to legal employers. In reality, they are highly valued, which suggests some bridge between the classroom and the practice.
Posted by: Derek Muller | Jan 16, 2013 11:02:52 AM
If you dismiss the extreme left side of the distribution because you assume students do not have enough analysis, then you should also dismiss the extreme right side of the distribution under the assumption students are hornbooking, showing off, or otherwise providing irrelevant material. If you were to do that, an eyeball analysis suggests a stronger R².
Posted by: Phil | Jan 16, 2013 11:33:01 AM
Psychometrics in testing can be complicated and there is often quite a bit of error. R-sq's of the magnitude you report do not make me too comfortable. The LSAT has a correlation with first year grades of about .36, which is an R-sq of .13. (See e.g., http://www.lsac.org/lsacresources/research/tr/pdf/tr-11-02.pdf). That suggests that essay length is as good a predictor of first year grades as (or better than) the high-stakes testing used in law school admissions.
Posted by: Erik Girvan | Jan 16, 2013 11:52:32 AM
I strongly agree with James Grimmelmann:
They're essential. They do not eliminate this problem, but they ameliorate it a lot, and they add a pedagogically useful element to the exam: prioritizing your points and not wasting huge amounts of space on less important, minor stuff.
Plus, as a little bonus, word limits make grading exams that much less onerous, since there's a bit less to read.
Posted by: Joey Fishkin | Jan 16, 2013 5:04:19 PM
just out of curiosity, how do you enforce your word limits? do you stop reading at the set limit? use a format that restricts how much the students can write?
Posted by: rebecca bratspies | Jan 16, 2013 5:55:42 PM
Like some others, I used word-limits which I intentionally set significantly below the maximum amount that people could reasonably say on a topic. (I usually use a few short essays rather than one big one, so have fairly short word-limits for each.) This encourages people to think about what is most important, but doesn't encourage (and even punishes) putting in every single thing one can think of. I'll admit to being surprised at one exam that managed to do quite well with many fewer words than were allowed, but most people who got top grades wrote quite close to the limit, as I'd intended. But, writing up to the limit was no guarantee at all of doing well.
Posted by: Matt | Jan 16, 2013 7:20:33 PM
"and why firms and judges care deeply about grades. If grading were not a good indicator of legal understanding, one would expect first-year grades to be less important to legal employers. In reality, they are highly valued, which suggests some bridge between the classroom and the practice."
Maybe it's just that, besides the student's law school (and perhaps college and major), grades are the only information they have. I could see this plus inertia (people who received good grades believe that this signifies merit and seek to hire others with good grades once they're in charge) as being the reasons grades are valued so highly.
I don't doubt that grades are a valuable data point, though I suspect they are overvalued, but really what's the feasible alternative from an employer's point of view?
Posted by: Doug | Jan 16, 2013 10:41:00 PM
Why throw out the bottom 5 scores instead of the bottom 7? Students 6 and 7 performed only very marginally better than student 5 but had much higher word counts, including one who appears to have one of the highest word counts in the class. I suspect the left side of the adjusted line is being lifted pretty substantially by students 6 and 7. I think it's fair to dismiss left-side outliers as not probative of anything you're trying to analyze, but I think you need to dismiss all of them and not just the ones with low word counts.
Posted by: Christine | Jan 17, 2013 11:15:50 AM
My students write out their exams in longhand on paper. That inherently limits how much they can write. Of course, it means I have to decipher some bad handwriting but that's OK, I don't have to deal with pre-written, irrelevant outlines. I haven't run any regressions, but my impression is that longer exams tend to be scored higher because the students have identified more relevant issues to analyze, but that's not always the case. Some exams are long and almost all irrelevant and some are very short and every sentence makes a good point.
It's impossible to grade exams perfectly no matter how careful you are. And different professors would grade the same exam in different ways. That's why the GPA is a much better measure of student performance than any single grade.
Posted by: Doug2 | Jan 17, 2013 7:11:45 PM
Two quick thoughts. When I have used word limits, which I do quite often, I stop reading at the word limit and tell them I will penalize if they repeatedly violate the limits. I draw a line on the test where I stopped reading. But I also do not find that the longer answers are necessarily better; I think they are better only to the extent the professor is using a checklist or rubric that gives points for saying various things. However, those rubrics do not take into account things that should not have been said. I grade more free-form and routinely penalize people for including weak or irrelevant arguments, which is often the case with the longest answers.
Posted by: MS | Jan 18, 2013 8:36:34 AM
Nothing can fix the ridiculousness of final exams--no matter how well they're formulated or graded, they merely assess and are not formative, are needlessly stressful, useless to demonstrate practical knowledge or ability to apply that knowledge in real world situations, and, for me, incredibly annoying to grade. That's why my final exams are substantially shorter than normal, and I've added a series of mid-semester writing assignments. These assignments are motions, discovery requests, and other real world exercises that still require the students to apply their classroom knowledge, including theoretical knowledge. And, yes, I do this in my larger classes (30-85 students). This grading method helps students, reflects reality a bit more, offers better opportunities for students to earn the higher grades that their knowledge deserves, and makes it better for me, because I'm helping them learn, rather than grading a reflection of students' performance on one day, on an exam that can't correlate virtually at all with reality.
Posted by: Steve-o | Jan 18, 2013 9:07:11 AM