By Kathy Lund Dean
In this column, I want to examine the thorny and multi-dimensional ethical issues around student evaluations of teaching (SET). Using Quinn’s Competing Values framework to guide the conversation, I look at who uses SETs, who wants to use them, and ethical issues of context, competing concerns, and most saliently, validity problems. I also consider how we use SET data—whether formatively or summatively—and what process assurances we may owe our colleagues to improve their teaching practice. I finish with a brief conversation with Gustavus’ Associate Provost and Dean Darrin Good, who is heading up our SET modification effort, and as usual, some discussion questions.
With every end of semester comes a variety of closure experiences, and among those hallowed rituals is the administration of student evaluations of teaching (SET). While doing research for another blog I write, it has become clear to me that very few aspects of academic life inspire the kind of emotion that SETs do. I lost an entire hour just reading the comments on this blog posting, which suggested the best way to go is simply not to read our SETs at all. Others say that using SET data for faculty evaluation, pay decisions, and as a retention criterion is folly. There's a lot of hostility out there!
My institution has recently been considering revisions to our SET instrument, a thankless task akin to running into a stiff headwind*. So the timing seemed right to take a look at SETs from an ethical perspective. Revising SETs is a tricky undertaking, because there’s more at stake now than there was when SETs were initially created. Using Quinn’s Competing Values framework helped me think about the original and now changing roles and uses for SETs. Briefly, the Competing Values model says that managerial work toggles among roles in a 2X2 matrix, with structure considerations on the vertical axis and focus considerations on the horizontal. The structural continuum ranges from flexibility to control, while the focus continuum ranges from internal to external. It’s on those continua that I want to base the conversation here.
Focus considerations: internal vs external
The creation of SETs is traced back to the 1920s and is generally attributed to Herman Remmers' work at Purdue University. Only a handful of universities collected SET data through the 1940s and 1950s, but student demands for more instructor information pushed SET usage into many institutions in the 1960s.
The original intent behind SETs was to help faculty develop their teaching practice, and to offer a feedback loop among those closest to the teaching and learning craft—students, instructors, and perhaps departmental chairs or an academic dean. It was a way to offer a window into an often opaque process between instructor and student, ideally as a continuous improvement tool (Galbraith, Merrill & Kline, 2012). Consistent with Quinn’s “internal process model” roles, SETs have also been integral parts of a professor’s performance evaluation, such as with promotion and tenure packets, or when being considered for instructional awards.
These internal roles have persisted, but SETs have taken on added weight in ways that I doubt were anticipated by their creators. The data that had cozily resided among a select few have now been pushed outside academic systems and sometimes outside academic control. Now, many people are interested in SET data, and very recent research shows the migration of SET data from internal stakeholders to those outside the institution. The goal of this migration has been to "prove" that there is some consistent mechanism by which those outside academic walls may be assured that faculty are providing quality instruction and overall value (see Spooren, Brockx, and Mortelmans' Herculean review, 2013). The push for evaluation information has led to new types of SETs housed outside of university walls, such as Rate My Professors (RMP) and community impact evaluations. It's a brave new world of transparency, where anyone with a computer can offer comments on a professor's performance. Administrators and faculty have little or no control over these sites, which presents inherent ethical problems. More on this later, since the Kansas Board of Regents situation seems to me to have direct implications here.
Externally, SET information has become one of a laundry list of statistics that some would use to cull out poorly performing professors, cut funding, or, conversely, to market specific professor skills. And it's these added "functions" of SETs that have made my colleagues, both here at Gustavus and at other institutions, very cautious about what to change and how to change it. The ethics of SETs has become complex and murky. I think the three biggest issues are lack of context, competing concerns, and, probably most importantly, validity issues.
Lack of context
Some of the people who would like access to SET information lack the context by which to interpret the data with integrity. A parent, for example, might see low SET ratings for a particular professor and conclude poor instruction, not realizing that the course may be a rigorous Gen Ed pre-requisite with traditionally low grades, a relationship that has been shown to exist in a variety of disciplines (see Clayson, Frost & Sheffet’s 2006 AMLE article for a particularly clever experimental design). Or, a student may rate an instructor poorly because he/she has little interest in the subject matter, or because the instructor has a teaching style the student doesn’t like. A state board of education member seeing those ratings may not take student characteristics into account. Interpretive context matters.
Competing concerns

There are sometimes competing concerns among stakeholders. For example, SET ratings are not generally released to students, in the same way that performance ratings are usually kept between employee and manager. Professors have a right to have performance information kept close at hand, for sure. However, students, fed up with the lack of information by which to choose effective over ineffective teachers and having to rely on sometimes sketchy word-of-mouth to avoid awful professors, have moved SETs outside academic walls to sites like RMP. Students tell me, too, that they post on RMP when it appears over time that nothing has changed with a professor's teaching despite repeated SET input about what could be improved. Some of their word-of-mouth information about instructors is inaccurate, for sure. However, conversations I hear each semester among students remind me of two lessons: first, that such informally shared information can indeed be spot-on, and second, how much people in any organization dislike what I call 'the illusion of participation,' or being asked for their considered opinion when that resulting opinion has no impact whatsoever. So while confidentiality of SET ratings is appropriate for some reasons, their release could also be seen as the right thing to do for students.
Consider this analogy, posted by “Bill” in response to a blog post about releasing SET data publicly:
I honestly fail to see why profs should be allowed some sacred space in which to be arbitrary, unfair, illogical or irrelevant with both the impunity which they ALREADY LARGELY HAVE in most institutions and also ANONYMITY to their patrons, that is, the people who will pay $5000 + for their classes. This is no different from saying that Consumer Reports shouldn’t be able to run customer satisfaction surveys.
Clearly, our Bill has had negative experiences with professors whose performance he saw as unethical. And it annoys me that someone thinks my performance evaluation information is no different from the information I get when I research computer tablets in Consumer Reports. But, although we've rejected the "paying customer" metaphor in academe, does Bill have a point about wanting information about an experience for which he will be paying a significant amount of money? I think he does. One of the compelling issues for professor evaluation information is balancing our right to privacy with others' (including Bill's) right to know.
Validity issues

This is, in my view, the Big One. The increasingly broad usage of SET information has placed a spotlight on validity concerns. While SETs are generally reliable, here's the empirical bottom line about validity: all of the literature I examined (op. cit.) as well as web-based writings tell the tale of an instrument that is widely used yet maddeningly resists efforts at validation. Measuring "teaching effectiveness" is conflated with "student learning" (Stehle, Spinath & Kadmon, 2012), a messy and often convoluted relationship. And anyone can post on RMP whether they actually had a course with an instructor or not. Even though there is research indicating that RMP can be valid for a couple of important evaluation constructs, RMP's "hotness" chili pepper rating gives me pause.
More broadly, the advent of assurance of learning (AOL) mandates, particularly by accrediting bodies, manifests external stakeholders' frustration with a roundly perceived lack of institutional and professorial accountability and responsiveness. I understand that, and am in many ways sympathetic to their gripes. State governments, communities, and parents are among those who have successfully pushed for more transparency, more action, and more tools with which to get institutions to pay attention to teaching quality. But, and this is a big but, using SETs punitively, holistically, and as the only data points by which to assess professors' classroom performance is quite simply wrong.
A special case…
The Kansas Board of Regents just voted to restrict professors' right to express views via social media, saying they want to protect the institution. In an age where institutions are beholden to donors and the fear of controversy looms large, I get it. What they are really worried about are the responses from, well, anyone who doesn't like a professor's pedagogical methods or course goals and raises a public stink. I see this as a special form of an über-invalid SET from people without context, and with agendas that compete with our role of encouraging difficult, unpopular, or power-contesting viewpoints. As Schuman asserts, "A bunch of know-nothing randos on the internet should not be able to get professors fired." This is a patently unethical new horizon, since I am against "know-nothings" having so much potential say in our performance evaluation as a whole.
Evaluation & support focus: flexibility vs control
The other axis in Quinn's model, for my purposes here, is how we want to use this information. I am a fan of developmental performance appraisals rather than simply evaluative ones (SET gurus use the terms "formative" and "summative" in lieu of developmental and evaluative). When I managed others in industry and was doing a performance appraisal, my focus was always on laying out expectations and supporting the employee in achieving them. The appraisal process was more important, to me as a manager, than the content of the appraisal itself, because I knew I did not have complete insight into any employee's performance. I think the same should ideally be true of SETs and how we use those data. Because there are such well-documented validity issues with SETs, and because there are so many nuances in a professor's 'effectiveness,' the idea that SET data should inflexibly be considered as-is, and as THE evaluation data on an instructor's classroom effectiveness, is unethical. Seeking more quality control over academic instruction is a good thing, for everyone, but it's a process of inclusion and ownership that must be integrated, and an ethical HR process is more flexible than controlling, in my experience.
The idea of "control" in academe deserves attention. I have never seen anyone linearly, directly, and measurably make a student learn something. Our control over what students learn is minimal, if in fact what we really want to effect by "teaching effectiveness" is student learning. Thus, those who view SET data as a control-systems tool overestimate the "teaching—learning" relationship. Part of the wonder of our jobs is watching that "a-ha!" moment that differs for each learner. I wish I could say I had more influence over when that happens, and for which students, but alas, in 16 years I have not found that formula. Being evaluated as "effective" is irreducible to a single instrument at one point in time.
OK—now, having said all that, I will also say I am in favor of using SET data, however flawed, to help us get better at our craft, but only if SETs are used in the developmental or formative way discussed above. The key seems to be, as they say in recovery groups, taking what we need and leaving the rest behind, and using that information in ways that help us learn how to be better teachers.
To get an idea of what that might look like, I talked with Dr. Darrin Good, Associate Provost and Dean of Sciences & Education here at Gustavus. Darrin is heading up our effort to revise our SET.
The conversation we're having now is not only about what questions should go on the SET (what we want to validly measure) but about how that information will be used. Although Darrin told me it's first things first (we need to work on getting a better instrument that faculty trust), we're asking interesting questions: Should SET data be used in ongoing performance evaluations, even post-tenure? Who should have access to SET data in addition to the professor him- or herself? What kind of support will be available for those instructors with negative or 'needs improvement' ratings?
Given the far-reaching impacts of the answers, it is no wonder to me that, while the committee (and all my colleagues that I have spoken with about the SET) agrees that the current instrument is simply awful, committee members are having a hard time converging on something different.
When I asked Darrin how to respond to SET validity issues, he said, "A SET can hold up a red flag, not necessarily exactly the right thing to pick out that needs attention, but a good committee can see a data point and spend time on it if needed with the faculty member." [By 'committee' he is talking about an informal coaching circle or triad of colleagues who are devoted to helping each other distill SET comments and classroom observations into helpful, practical teaching improvement.] He added, "It's important that we understand where we can and can't get good information from students, and that leads to those red flags. There are instances where students may not be able to define or articulate what's wrong with a professor's teaching, but they know there's something not working in a professor's course, and that's where a committee can see the need to dig deeper and work with the faculty member to see what's going on." I really liked his description of this supportive process, and it links back to respecting process over content in performance evaluation as a whole.
Here’s the last thing I will say for this post—which seems to sum up the gist of the thing nicely, also from Darrin. “We know a lot about good pedagogy. What we want to see is if that instructor is doing things that generally lead to good learning outcomes. Are there pedagogically sound behaviors and assignments? There are many ways to be successful in the classroom, just like there are many different leadership styles. We want to focus right now on the instrument itself and increasing validity and meaningfulness.”
I welcome your experiences and responses.
* This phrase comes from a response I received on the OBTSL-listserv after I sent out a query asking for examples of other people's SETs, as part of our effort to revise the one here at Gustavus.
Discussion questions for student evaluations of teaching (SET):
- What would it mean for your institution to have SETs "done right"? What would the content look like, and what about the process of using them?
- Should SETs be voluntary or mandatory? Why do you think so?
- What could we do in terms of administering SETs that would increase their usefulness for professors? What should we be asking students to do to increase their helpfulness?
- In what ways have you seen faculty manipulate SET ratings? Given that SETs generally serve as the major or only data source for teaching evaluation, could it be considered simply smart practice to “teach to the evaluation” criteria similar to the way we may encourage students to complete assignments with the grading rubric in mind?
- Do we owe special consideration or protections to PhD students and their [usually] poor ratings as they learn the craft of teaching? As job candidates, they will probably be asked to share SET data, which will probably reflect the lowest point of their professional careers. Is there anything we ethically owe students in this respect?
- Does your institution use a different instrument for pre-tenured, post-tenured and senior faculty? Why or why not? Is it fair, in your opinion?
Here is the reading list from the blog (I don’t want to call it references, because, well, it’s a BLOG):
Clayson, D.E., Frost, T.F., & Sheffet, M.J. (2006). Grades and the student evaluation of instruction: A test of the reciprocity effect. Academy of Management Learning & Education, 5(1), 52-65.
Galbraith, C.S., Merrill, G.B., & Kline, D. M. (2012). Are student evaluations of teaching effectiveness valid for measuring student learning outcomes in business related classes? A neural network and Bayesian analyses. Research in Higher Education, 53(3), 353-374.
Spooren, P., Brockx, B., & Mortelmans, D. (2013). On the validity of student evaluation of teaching: The state of the art. Review of Educational Research, 83(4), 598-642.
Stehle, S., Spinath, B., & Kadmon, M. (2012). Measuring teaching effectiveness: Correspondence between students' evaluations of teaching and different measures of student learning. Research in Higher Education, 53(5), 888-904.