What we train need not be the same as what we assess: AI damage limitation in higher education

It has always been clear that ChatGPT’s general availability means trouble for higher education. We knew that letting students use it for writing essays would make it difficult, if not impossible, to assess their effort and progress, and would invite cheating. Worse, we knew it was going to deprive them of learning the laborious art and skill of writing, which is good in itself as well as a necessary instrument of clear thinking. University years (and perhaps the last few years of high school, although, I worry, only for very few) are the chance to learn to write and think. When there is quick, costless access to the final product, there is little incentive for students to engage in the process of creating that product themselves; and going through that process is, generally, a lot more valuable than the product itself. Last March, philosopher Troy Jollimore published a lovely essay on this theme. So, we knew that unregulated developments in artificial intelligence are inimical to this main aim of higher education.

Even more concerning news is now starting to reach us: not only is the use of ChatGPT bad for students because the temptation to rely on it is too hard to withstand, but respectable studies, such as a recent one authored by scholars at MIT, suggest that AI has significant negative effects on users’ cognitive abilities. The study indicates that the vast majority of people who use Large Language Models (LLMs), such as ChatGPT, to write forget the AI-generated content within minutes. Neural connections in the group relying on natural intelligence alone were almost twice as strong as those in the group using LLMs. And regular users who were asked to write without the help of LLMs did worse than those who had never used ChatGPT at all. The authors of the study speak of a “cognitive debt”: the more one relies on AI, the more thinking ability one loses. All these findings hold for most users; a silver lining, perhaps, is that users with very strong cognitive capacities displayed higher neural connectivity when using LLMs.

In short, LLMs are here to stay, at least until proper regulation – which is not yet on the horizon – kicks in. If this study is right, they can give valuable support to the most accomplished scholars (perhaps at various stages of their careers) while harming everybody else. Part of the university’s job is to develop the latter group’s cognitive abilities; encouraging students to use LLMs appears, in light of these considerations, a kind of malpractice. And assigning at-home essays is, in practice, encouragement.

What, if anything, can those of us who teach do about this? We have positive reasons to keep trying to teach our students to write – and hence think! – better. It is, after all, the point of our job. And we have negative reasons, too, since other modes of evaluation seem deeply flawed. Oral exams, in which students’ identity is very salient, may amplify biases, implicit and explicit. Written exams might be counterproductive for, and very unfair to, some of the best students, who need time to pen an essay and craft its structure. Written exams that merely check students’ understanding of concepts, definitions and theories, but not their creativity and critical thinking, are mechanical and, frankly, boring for everybody – and few things are as anti-pedagogical as boredom.

I need a solution, as I start planning next academic year’s classes. I’m guided by the thought that what I have reason to train in my students need not be the basis on which to assess their performance. This insight has been aired before on Justice Everywhere, in Sanat Sogani’s post. Perhaps we should continue to ask students to turn in essays as a requirement for completing courses, but not mark these essays. Or, for those whose institutional rules prohibit this, let them count for only an insignificant proportion of the final grade.

Over the past three years I’ve been asking students to submit 4-5 position papers following a five-step structure – a format I learned about from blogs on how to teach philosophy. (Much as I try, I’m unable to track down any of these discussions – help!) They are 500 words long and take the following form: “Philosopher X says, on topic X, that…” “I say:…” “The reason I disagree with philosopher X etc. is that…” “Someone may think that I am wrong because…” “To this I reply that…” To produce a good piece in this format, students need to grasp the difference between the superficial and the deep logical structure of their writing, the importance of choosing the most compelling counter-argument to their view, and the difficulty of offering a rejoinder that doesn’t merely re-state their initial argument.

Judging from students’ progress from one essay to the next, and from their testimonies, the 500-word essay is a pedagogical success. I shall continue to assign them, and give comments, but not let them count towards the grade. Instead, I’ll opt for exams that require sustained reasoning, without, however, asking students to compose a proper, complete essay on the spot. The 500-word essays will be training for the in-class exam. Thus, at-home essay writing remains mandatory, but with relatively little incentive for students to use AI, since it’s not evaluated.
Some will turn to ChatGPT anyway, to save time and toil; but others will write the essays themselves, study the feedback, and learn. We were never going to reach everybody…

I can see another solution, more radical and with independent merit, one that – I confess – I prefer, but which is not actionable at the moment because it sits on top of a collective action problem. Faced with this new challenge, perhaps we could simply ditch grades altogether. Universities did not, after all, grade students until relatively recently. (Probably not before 1785, and it took a while to catch on.) We could return to a paideic ideal of higher education, in which we strive to help people develop without giving them quantitative feedback – and no rankings either. We would decide whether students have taken reasonable steps towards that ideal, and pass them or not. Maybe the advances in artificial intelligence, and our helplessness at integrating them into the current university model, will ultimately take us there.

I’m interested in your thoughts, and hopeful that they will come on time for my fall preparations.

Anca Gheaus

I work on various issues concerning justice. I am particularly interested in the relevance of personal relationships to moral and political philosophy. I have published papers on gender justice, parental rights and duties, the nature and value of childhood, the goods of work, and the ideal/non-ideal theory debate.


10 Responses

  1. David Hunter says:

    I think it’s a valiant effort; my concern, though, Anca, is that it might go the other way – if it’s not “worth” anything, then students might be more likely to use AI than not, because why bother doing the work…

    • Kaitlin Lucas says:

      I think there are a couple of ways this could be addressed. Anca discussed the “ranking” function of grades. In our current educational system, they also often serve as the primary means of feedback. By emphasizing other means of feedback (self, peer, instructor), I think it’s possible to make the assignments “worth it” without assigning grades. Over time, I also hope this would shift us away from the current grading system.

      In my experience, students would always welcome much more feedback from their instructors. This isn’t always realistic, so it could be balanced with peer feedback during class sessions (among many other methods).

      If students know that their peers will be reviewing their work, it adds an element of community and trust. For example, I think most people would be seriously frustrated with a colleague if they realized the essay they were thoughtfully critiquing was generated by AI. It would be a waste of their time, and they might prefer that the colleague not submit anything at all. So there’s some accountability not to waste each other’s (or the instructor’s!) time.

      Another idea might concern the exam. Students could complete the writing exercise Anca described, but there might also be a (graded) question reflecting on how they’ve improved the quality of their work over the semester based on their essays and the feedback they received. I would find it hard (though not impossible) to answer this question in a meaningful, high-quality way without having completed the essays.

      In full disclosure, I’m a big proponent of “ungrading” myself. I also agree that this is a challenging topic with no perfect solution, so I appreciate the discussion your article is generating, Anca!

      • Anca Gheaus says:

        Kaitlin, thanks so much for this answer; it is very helpful. I always use a peer-to-peer exercise on the 500-word essays, just before students submit them to me. (And I was a bit surprised that last term the essays which I think were AI-generated received serious engagement. Or maybe students were just trying to give the benefit of the doubt…?)

        The reflection question had not occurred to me – it is brilliant! I’m seriously considering it – once again, many thanks!

        • Kaitlin Lucas says:

          Thanks for your response, Anca! That’s a good point: we do tend to give each other the benefit of the doubt, or we might feel uncomfortable exposing or calling out a colleague if we think they’ve used AI. In that sense, forgoing grades could open up space for more honest conversations over time, since the stakes would be lower.

      • Kaitlin Lucas says:

        In graduate school, I also found the expectation to complete 4 out of 5 assignments useful: no questions asked if we needed to skip one assignment due to illness, caregiving responsibilities, other deadlines, etc. Given that a big motivation to use GenAI has to do with workload, time pressure, and competing deadlines, this might also help.

    • Anca Gheaus says:

      David, for this reason I’ll explain that what they train through the essays will bear (a lot!) on how they are likely to do in the exam. Also, I will continue the peer-to-peer exercise (see my answer to Kaitlin below). I know some will use LLMs anyway…

  2. Sanat Sogani says:

    Such an insightful post, Anca! I especially appreciate your references to empirical studies, which make your case for preserving writing while discouraging simplistic copy-paste use of GPT quite convincing.

    I’d like to share two personal experiences:

    1. In my seminar of around 20 students, I conducted an informal poll, asking them to choose between two hypothetical extremes: a zero-tolerance policy on GenAI use versus a fully permissive policy (framed this way for simplicity). Interestingly, over half of the students preferred the zero-tolerance policy.

    2. For my recent BA course with 10 students, I implemented a fully permissive GenAI policy for the final paper. Students selected their own topics based on the course readings and were asked to write an original paper (following a structure similar to the one you recommend for position papers). I observed that none of the students made significant use of GPT to generate content, despite being allowed to do so. I say this with some confidence because they presented outlines of their final papers in class before submission, and the tone, quality, and ideas in their final submissions were broadly consistent with their earlier presentations. Some students—especially non-native English speakers—did use GenAI tools for improving grammar and language, and they openly disclosed this in their papers.

    I am not sure what insights to draw from this very small sample set. Nevertheless, these experiences reinforced my sense that many (if not most) students themselves value the process of writing and are intrinsically motivated to develop original ideas, especially when the assignment allows them to explore topics that genuinely interest them.

    This perspective also came through during the recent AI workshops at CEU, which included students, faculty, and administrative staff. There was broad consensus that using GenAI for grammar or language polishing is acceptable, but that its role in generating substantive essay content should remain limited. Encouragingly, this view was shared by many students themselves.

    • Anca Gheaus says:

      Sanat, thanks – I am grateful for these thoughts. And cheered by your experience in the class of 10 students! 🙂 It’s tempting to apply the model you describe in classes small enough to allow it (i.e. to allow a proper essay-outline presentation at the end of the course). It’s not really applicable in the larger course I teach, where I expect many more students.

  3. Antonia O says:

    I am not an academic, but this topic is pertinent to all professions, and to those depending on the written word in particular. Use of AI is becoming widespread very fast, so I think it may be a case of “if you can’t beat them, join them.” It is very likely that more and more students will use AI — the most important thing, perhaps, is *how* they use it. Why not have, as the first assignment of the year, a split exercise: pick one topic for an essay and ask one half of the students to write it using exclusively AI, and the other half to write it exclusively the “long” way, without AI, but using traditional methods like online research. Then, ask them to compare the essays and discuss the pros and cons, as well as their respective experiences writing them and the methods they used (did they ask the AI to provide links to sources? Did they check all facts and figures independently? etc.). They could also compare styles (in my experience, AI-generated prose is generally verbose fluff, but who knows, maybe some models are better than others), and finally they could compare how much they remember and have understood about the topic. It could turn out to be a fun exercise, and at the very least it would open their eyes to the drawbacks of the technology, not just its ease of use.
