
Wednesday, May 09, 2018

On Student (and Faculty) Evaluations: Some Good Reading and One Modest Proposal

The Chronicle of Higher Education has published an interesting lineup of pieces on end-of-semester student evaluations, a perennial subject of interest for academics. The "con" side is represented both well and more thoughtfully than usual by Michelle Falkoff, a clinical associate professor of law at Northwestern. The title of her piece (which she may or may not have chosen or approved)--Why We Must Stop Relying on Student Ratings of Teaching--is subtly indicative of that thoughtfulness. It is not a plea for abandoning them, but against relying on them solely or heavily. The main argument often brought out against them is made here, and in another piece: student evaluations tend to treat women and people of color differently and worse. Beyond that, however, they may also display "biases that fall outside traditional categories of discrimination," including "student negativity toward classes they perceive as overly challenging or taxing," that "harm an institution’s ability to use student evaluations to gauge instructors’ effectiveness." These trends have been added to by other negative features as universities move toward online evaluations, which have reduced the number of students filling out the forms and tend to adopt the snark of other online writing.

The "pro" side is also represented in the issue, refreshingly, in a "Defense (Sort Of)" of student evaluations by Kevin Gannon. Gannon writes that student evaluations are "a flawed instrument" at best and a "cudgel used against faculty members" at worst. But he argues that whatever students don't know about what they're evaluating, they are still "experts on what they experienced and learned in a course, and they ought to have a voice." And he too cites studies, which suggest that despite their flaws, student evaluations are still some of the best measures of faculty effectiveness.

My desire here is not to take a side between some reductive version of "pro" and "con," although some professors do have a fairly reductive negative view of student evaluations. One reason for that reluctance is my fairly blindered perspective. Like all professors, I have received nasty and unhelpful evaluations. (I have also, to my shame, had bad semesters in which the evaluations reflected the fact that I did not teach as well as I should have. I try to take those moments to heart, weeding out merely hostile rants but looking for common complaints that suggest areas of improvement and trying to implement them in the next class. What I ought to do every semester, but generally don't, is survey my students at least once early or in the middle of the semester, while there is still time for mid-stream improvements.) But students, so far as I can tell, don't judge me for what I wear (and I often dress unconventionally for class), don't apply irrelevant criteria for evaluation, and don't impose other unreal or uneven expectations or stereotypes on me. If I received such evaluations as a matter of course and knew that the data suggested they were likely to be more hostile because of irrelevant factors, I would not be keen on them either. Since I don't face such barrages, I am inclined to accord greater weight to the complaints of those who do. (I am not a fan, however, of those popular videos of professors reading hostile student evaluations, half in pointed humor and half in anger, just as I'm not a fan of the endless stream of "It's in the syllabus" complaints professors favor on Facebook and elsewhere. Students should be treated with respect, given that they are both a main part of our callings and the source of our livings. Everyone vents and jokes about their jobs, but more dismissive professorial treatments of students are all too common in the private and only semi-private spaces of social media.)

As Gannon argues, though, students still deserve a voice in their educations. If they are not simply "consumers," neither are they inconveniences or adversaries. And there is certainly such a thing as more or less effective instructors. What I admire about both his and Falkoff's pieces is their refusal to throw the baby out with the bathwater by, say, suggesting that we get rid of student evaluations while remaining vague and cursory about proposed "alternate methods of evaluating teaching effectiveness." Falkoff, in particular, rather than simply launching arrows at student evaluations, builds on her extensive experience to offer a host of reforms we might consider. Falkoff believes that "holding instructors to high standards is important, and student feedback is relevant." But she believes that we should treat them as only one piece in a "more holistic strategy in which multiple factors contribute to a more accurate, consistent, and well-rounded assessment." Similarly, Gannon argues that the "best faculty-evaluation systems are multilayered and employ a number of different measures," including "faculty narratives, peer observations, reflective dialogue, and sample teaching materials." 

Neither writer talks much about how we could improve student evaluations themselves. Doubtless there's a literature out there on that subject, and doubtless there are costs and benefits of moving to a better set of questions, including a drop in response rates (although clearly the approach of using online, "press a number between 1 and 5"-type evaluations has not resulted in a great response rate either). We could certainly aim to write better and more specific questions, and encourage detailed and specific responses rather than either numbers alone or general invitations for comments that allow students to rant at will. And rather than simply hand a set of evaluations (or a website address) and a brief and mechanical set of instructions to a student to read, we could do more in the evaluation-distribution process to explain their purpose and prompt students to offer more serious responses. (Maybe the job of distributing student evaluations or links to those evaluations, and explaining them, should be given to higher-level staff.)

On the "holistic" side, I do have one proposal to make. Many complaints about student evaluations note that students may not know as much about teaching and about the goals of a particular class as do seasoned instructors themselves (although, in law, few professors learn by anything other than experience and a marginal amount of mentoring by senior professors who may have little serious pedagogical knowledge themselves; we are not necessarily much more expert about teaching than our students are). I agree that faculty evaluations of faculty teaching should be a major part of the evaluation process. I would suggest the following:

1) Every tenured faculty member should be obliged to visit an equal, and substantial, number of their colleagues' classes each and every semester--say, ten classes per semester--and offer feedback to those instructors and to the administration about the classes they visit. The list of whom to visit and the dates of those visits should be randomly assigned. Every faculty member, including tenured faculty, should receive at least two or three visits by different faculty members every semester.  

2) Those evaluations should involve more than a cursory visit to the class, and sometimes an incomplete visit at that. Professors should be obliged to read the material for that lesson and the syllabus for the course, and stay for the entire class.

3) Evaluations should always be written and always be detailed. They should follow a set of rubrics designed in advance, including areas of effectiveness, areas of weakness, concrete suggestions for what should be improved or changed and what should be retained and enhanced, and so on.

4) As I noted, those evaluation visits should emphatically include visits to tenured as well as untenured professors. Length of tenure is no guarantee of good teaching, it is easy to become complacent, and everyone's teaching can be improved.  

5) The law school administration, either directly or through a faculty committee or both, should be obliged to read, collate, and evaluate all those evaluations--not primarily for purposes of evaluating individual teachers, but for purposes of evaluating how well the faculty as a whole teach, what common flaws (if any) they display, and what the best practices are on the faculty. They should be required to write an annual report for all faculty members setting out this evaluation and set of recommendations about what to do and not to do. They should follow this up with a mandatory, dean-and-faculty-led meeting for all faculty to discuss that report, and especially best and worst practices. 

6) Professors who fail without good cause to visit the requisite number of classes and take their evaluation duties seriously, say by failing to write a report or not making it a serious and detailed report, should face penalties, from public shaming to the withholding of one's paycheck or summer research grant until one has completed one's requirements.
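
For what it's worth, the random assignment described in items 1 and 4 could be sketched roughly as follows. This is only an illustration, in Python, and nearly everything in it is an assumption rather than part of the proposal: the function name `assign_visits`, the made-up roster, and the simplification that everyone on the roster both visits and is visited. It shuffles the roster into a circle and has each person visit the next few colleagues around it, so that loads are equal and no one is visited twice by the same colleague. Assigning dates is left out.

```python
import random


def assign_visits(roster, visits_each, seed=None):
    """Return {visitor: [colleagues to visit]} with equal loads for everyone.

    Hypothetical helper: assumes roster names are unique and that every
    person on the roster both makes and receives `visits_each` visits.
    """
    if not 0 < visits_each < len(roster):
        raise ValueError("visits_each must be between 1 and len(roster) - 1")
    rng = random.Random(seed)
    order = list(roster)
    rng.shuffle(order)  # random seating around a "circle"
    n = len(order)
    # Each person visits the next `visits_each` people around the circle,
    # so nobody visits themselves or the same colleague twice, and
    # everybody receives exactly `visits_each` visits from different people.
    return {
        order[i]: [order[(i + step) % n] for step in range(1, visits_each + 1)]
        for i in range(n)
    }


faculty = ["A", "B", "C", "D", "E", "F"]  # hypothetical roster
schedule = assign_visits(faculty, visits_each=3, seed=2018)
for visitor, hosts in schedule.items():
    print(visitor, "visits", hosts)
```

One limitation of this sketch is that a single shuffle pairs people with their "neighbors" in the circle, so a fresh shuffle each semester would be needed to vary who visits whom.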

What I like about this proposal is that it is burdensome and widely distributed. Tenured faculty members have a duty to their law school, to their colleagues, to their students, and especially to students and to junior colleagues. It should be taken seriously, not just paid lip service. A few professors who are more willing to engage in service than others, and who thus face a disproportionate burden of service as a result, should not be made to do all the work for their colleagues. This is a collective and indefeasible duty. There are good reasons to worry about student evaluations, especially poorly designed and hastily administered ones. But there is an obligation to provide serious alternatives, to make them good ones, and to treat them as a responsibility of the entire faculty, individually and collectively. Teaching is a or the central part of our job, and we should be obliged to take it seriously, both at an individual level and at a collective and institutional one. And, despite the serious reasons to dislike student evaluations, tenured professors who merely take pot shots at them from the side should be obliged (along with everyone else) to be heavily involved in making sure our students receive the best possible instruction.

I think faculty members who take seriously either their teaching responsibilities or their faculty governance responsibilities, or both, will welcome such a proposal. I should think that faculty members who worry most (and most understandably) about bias in the evaluation process should welcome a system that is more serious and systematic in providing a better means of evaluation alongside (and not simply replacing) student evaluations--which, to be sure, ought to be improved as well and should not be given undue weight, at least without culling them, looking for genuine patterns and problems, and so on. (Student evaluations should also ideally be offered more than once and not simply on the penultimate day of class.) They too should be happy to be a part of the solution, even if it is burdensome, as long as it is universally distributed.

I suspect that it is just possible that a few professors will stamp their feet and complain about having to do a great deal of extra work. (No doubt one or two will find a way to work the phrase "academic freedom" into their diatribes.) But I don't think that, say, 30-60 hours per semester spent on mandatory duties aimed at improving the teaching quality of our institutions is an unreasonable demand on us, given the importance of teaching in general, especially in a professional school, and the fact that teaching and service are both major components of our duties as professors. And it is frankly a good thing to smoke out those professors who enjoy complaining but are less than eager to do something about the things they complain about.

I should add that various universities and law schools may already do some of these things. Some schools, for instance, have post-tenure review, and others may simply take our teaching responsibilities more seriously. I'm happy to hear in the comments about more concrete examples of what schools are already doing. And I'm happy to hear about alternatives, both for improving (rather than eliminating altogether) student evaluations and for improving faculty evaluation of teaching--although I think it is valuable and important for the latter to involve serious and universal duties on the part of the tenured faculty.          


Posted by Paul Horwitz on May 9, 2018 at 07:45 AM in Paul Horwitz


I like Paul's suggestions. I would welcome feedback from a senior colleague. I've invited the dean to sit in any time he wants. But I never read student reviews. The biggest problem with student reviews is that they don't know what they don't know. "Gee, that professor really summed up nicely, explained all the day's cases." I would say that type of comment shows a complete failure by the professor, but once you give student comments some weight, you have to take them seriously. Which I don't.

Posted by: Douglas Levene | May 12, 2018 2:43:26 AM

For the big classes, one potential answer is external testing. If there are three torts sections and they all take the same exam, you can then run some statistics. It's harder to do with k-12 because the classes are smaller and schools sometimes play games with assignment.

If, that is, any school had a serious interest in finding out about teaching quality.

Posted by: john | May 10, 2018 11:07:09 PM

Apologies for the typos. The first margarita in the morning is always a mistake.

Posted by: Paul Horwitz | May 10, 2018 11:22:03 AM

One advantage of a serious faculty report on a colleague's teaching would be that it is better than raw and random comments from students (again, I am emphatically not against student evals as such and think it important that they get a voice). As noted above, if those evaluations are used as makeweight pluses or minuses in lateral and other hires, then random nasties or structural bias make them especially unsuitable for that use. Faculty discussions about laterals involving student evals, in my (limited) experience, are generally more cursory--or strategic--than illuminating, and it's perhaps more than a little bizarre that the academy on the whole expresses great skepticism about student evals and yet demands them from lateral candidates and uses them in hiring meetings. (I'm not against including raw scores and averages, BTW. But, again, they would be better if accompanied by a meaningful peer review.)

Posted by: Paul Horwitz | May 10, 2018 11:21:34 AM

Juniorprof, Anon, that's a good point: I do think teaching evaluations can be an issue in terms of lateral moves. How much depends on the school, I think, and I think juniorprof is right that they may matter more to bolster a view held on other grounds. But good point that they can matter.

Interesting, too, anon, that they can be a factor in promotion and merit raises. That reminds me: It would be interesting for someone to write an article (say, for the Journal of Legal Education) on how law schools determine faculty merit raises. At my old school, it was just something the Dean decided: You received a letter saying how much of a raise the Dean decided to give you, and it was entirely at the Dean's discretion based on whatever considerations the Dean decided to make. But I understand that at other schools there is more of a formula that cabins that discretion.

Posted by: Orin Kerr | May 9, 2018 11:54:04 PM

I agree with juniorprof. Most candidates on the market have some teaching experience now, and bad evaluations are a huge negative. This is probably more the case with schools down the food chain. Charitably, you might say they care more about teaching than scholarship. Less charitably, you might say they have bar pass issues.

They are also a small factor in promotion and merit raises at my lower-ranked school.

Posted by: anon | May 9, 2018 9:31:02 PM

To Orin's question, my experience is that schools consider teaching evaluations when considering lateral candidates (and, increasingly, entry-level candidates). I am not sure how much anyone actually cares about the scores, but poor evaluations can function at least as a sort of public reason that an opponent of a candidate can cite.

Posted by: juniorprof | May 9, 2018 4:31:30 PM

Interesting post.

One question I have is what use different schools make of student evaluations. Most of my experience is at a single school, GW, where they weren't used for much in a formal sense. True, professors cared about their own evaluations as a sort of informal "grade" from the students. And at GW, the evaluations are posted on the law school portal for anyone to read, which generated some peer pressure: Anyone could just go through your colleagues' evaluations at any time and see what their past students thought of them. But the law school didn't use or consider the evaluations, as far as I can tell, for anything beyond slight consideration during the tenure process. And the use during the tenure process was particularly modest given that the tenure rates at my old institution were very high and there were few junior professors. So the evaluations were there, and (I assume) students used them to some degree to pick classes, but as far as I can tell the law school itself generally didn't rely on them.

I don't know if my experience is at all representative, but it does raise the question, at least to me, about why this matters. Do other schools rely on evaluations in some important way? Is this more about the idea that once you introduce some kind of metric, there will be concerns about whether the metric measures accurately even if the metric isn't actually used for anything formally? Or is the concern that some professors won't get a fair shake from students because they are reading past evaluations and assuming their accuracy? I'm very open to improving how evaluations are made, and I'm concerned about the serious bias that seems to infect them, at least as they are usually handled today. But I'm also less sure about what the uses of the evaluations are and how much they matter in terms of considering future reforms.

Posted by: Orin Kerr | May 9, 2018 3:13:44 PM

On Professor Dodson's proposal, I think you would need reporting on students' civil procedure scoring on the bar, contract scoring, torts, etc., for that to work. I think civil procedure, for example, makes up 14% of the multiple choice on the bar, and as little as 7% of the total score depending on how much it shows up on essay questions. Civil procedure professors might have a significant influence on how their students perform on that 7 to 14% of the test, but it would be hard to pick the influence up if you're looking at pass rates or total scores.

That said, I don't believe that my truly excellent 1L civil procedure, contracts and torts professors had any influence on how I did on those sections of the bar years after taking their classes; I had forgotten most of the detail of what they taught me by then and had to reacquaint myself with the subject matter by reading bar review materials. So I can't believe that law-school teaching makes much of a difference. Maybe if my professors had screamed mnemonics at me in class or projected them on PowerPoints I might have remembered more of what they taught me -- I still do remember certain mnemonics from seventh grade math -- but I personally think the benefits of that kind of teaching are outweighed by the costs. For one thing, it tends to feel extremely condescending.

Posted by: Asher Steinberg | May 9, 2018 1:25:17 PM

1. Free dental checkup for you next time you're anywhere in Power or Bannock counties! It's lovely up here, you know.

2. Maybe. But if he wasn't joking -- and, again, I'm assuming he was -- then it was gloriously obtuse.

3. Me too!

Posted by: Marcus Neff | May 9, 2018 12:48:58 PM

Regarding anonymity: No, I meant "skeptic," who was the one speaking rudely toward me. (Incidentally, although I note that many might assume "Marcus Neff" is a pseudonym drawn from caselaw, I will add that if you are THE Marcus Neff, DDS, of Pocatello, I congratulate you on a fine dental practice and on your willingness to put your money where your mouth is, so to speak, by speaking under your own name online. Civic courage is wanting in many places, sir, but not in Pocatello.)

If you actually thought Scott was joking and wanted to congratulate him for it, more power to you. I imagine you could have said so more clearly if you wanted to. The "gloriously obtuse" part somehow seems to cast some shadow of doubt on that.

I think that's the end of my end of this exchange. I'll just err on the side of ruthlessly enforced good manners henceforth. I do hope professors whose schools actually use various evaluation methods will run across this post and share some of them.

Posted by: Paul Horwitz | May 9, 2018 12:43:17 PM

1. What do you mean "pointlessly anonymous"? My name is my name.

2. I thought I was being very civil indeed by giving Scott Dodson credit for taking a very subtle, very incisive jab about the bar exam and classroom education. If that's not what he was doing, then I guess I was being too charitable. Apologies to all!

3. Big bucks to write here? Paid by the word?

Posted by: Marcus Neff | May 9, 2018 12:31:28 PM

My general policy is that commenters are free to speak rudely toward me. Not that I like it, especially when it's pointlessly anonymous, or that there's usually much use in it; still, I get paid the big bucks to write here. But I prefer it when my commenters (even if they are kind to me!--which may or may not have been your intent with the first part of your comment, but I certainly prefer that interpretation) are civil to each other. It makes it easier for me to monitor the comments and leave them up, helps the conversation (if any) to move on an even keel rather than in a downward spiral, and so on. There's nothing at all uncivil about saying you think Scott is wrong to suggest linking bar passage to pedagogical effectiveness--even as one of multiple simultaneous evaluation formats--and if that's what you think I encourage you to just say it straightforwardly. Thank you!

Posted by: Paul Horwitz | May 9, 2018 12:25:54 PM

Question for skeptic: What's the "insane waste of time" here? Paul's idea? Paul's post? Both?

More general comment: Scott Dodson's suggestion of linking bar passage to pedagogical effectiveness is either (A) the slyest bit of comedy on this site in ages or (B) gloriously obtuse. I'll hope it's (A) and commend him on the joke!

Posted by: Marcus Neff | May 9, 2018 12:18:25 PM

Great post.

Peer assessment of teaching is a critical piece of evaluating and improving pedagogical efficacy. One failing of the proposal, though, is that it is nonblind and could encourage a scratch-your-back mentality that erodes the integrity of the entire enterprise. An alternative, which has its own failings to be sure, is to have reviews written anonymously by faculty members who watch random videotaped classes. Yes, something may be lost in translation through video as opposed to being in the classroom in person. But the preservation of anonymity is, I think, important. (Nothing prevents the creation of an assessment system that uses both in-person nonblind reviews and blind video reviews, of course.) A recording also gives the teacher who receives a negative peer review (something anonymity might unjustly encourage) the ability to challenge any factual inaccuracies or misrepresentations by the reviewer. It also gives all teachers the ability to re-watch their performances in light of the reviews to enhance the quality of the feedback.

In addition to peer reviews and student evaluations, both of which seem to have value, let me add a third vehicle for teaching assessment: student performance. Schools often can track student performance across courses, across time, and across careers. For those states that release to schools the names of their students that have passed the bar exam, for example, those schools can link those students to their professors and gain insights about which professors are more likely to improve bar passage. In addition, first-year professors can be compared against each other to determine which are more likely to have taught skills necessary to succeed in subsequent law-school classes. I've heard that some schools do this effectively. Data crunching also has its failings, but if the data are there, I would think they contain useful evaluative information.

A final source of teacher assessment might be alumni surveys. Students often don't know enough about what they've learned until after getting into their fields of practice. I'm less confident this would lead to useful information, but it does occur to me as a possibility.

Again, these all have their failings, but the main thrust of Paul's post, with which I agree, is that we ought to do a better job of thinking about how best to measure ourselves, to improve ourselves, and to better teach the students we serve.

Posted by: Scott Dodson | May 9, 2018 10:58:01 AM

This is an insane waste of time.

Posted by: skeptic | May 9, 2018 10:49:27 AM

My answer is "I think so," or more accurately "Definitely yes but I am happy to think about how we should go about this." I have certainly sat in on many clinical classes although I am not a clinician. On the one hand, I acknowledge that there is an expertise gap there. On the other hand, non-clinical or "doctrinal" faculty (or whatever the accepted term is these days), who are ultimately and formally responsible for the governance of the entire law school and who should be interested in what students spend a vast amount of their time doing in any case, ought to be interested in and aware of what is going on in their school's clinics, and vice versa. And given that I tell my students regularly that legal research and writing is (or should be, ideally) the most important class they take in law school, and believe it, I and we should be equally interested in how those programs are doing, although I'm happy to think about how the evaluation process should work. It may be as simple as requiring that a certain number of the faculty evaluations come from within those programs and a certain number from the non-clinical or non-LRW faculty.

Posted by: Paul Horwitz | May 9, 2018 8:17:25 AM

Do you include skills, experiential, and writing faculty (who are not tenure-earning at many schools) as both evaluators and evaluated?

Posted by: Howard Wasserman | May 9, 2018 8:10:46 AM
