Student use of ChatGPT in higher education: focus on fairness
From the long-form essay to concise term definitions, ChatGPT can be an apt tool for students completing various assignments. Yet many educators balk at its use: they emphasize that ChatGPT makes errors, claim that its use is cheating, and insist that students who use it learn nothing.
What sorts of policies should educators adopt? Our options fall into three main categories:
1) Explicitly forbid the use of ChatGPT.
2) Have no explicit policy: ‘don’t ask, don’t tell’.
3) Explicitly allow the use of ChatGPT.
In this post, I look at these three options from the perspective of fairness. Since fairness thrives on transparency, option 3) seems to me the fairest of them all.
The lack of any explicit policy strikes me as the least equitable approach. It entails a lack of transparency about what is expected of students, what is accepted, and what practical consequences students’ choices concerning the use of technology will have. This opacity creates disparities that may track existing privileges and disadvantages.
The student who thinks ‘if it wasn’t explicitly forbidden, then no one can penalize me for it’, the student who assumes it is surely forbidden and will result in a penalty, and the student who does not know enough about language models to make informed choices about their use each lose or benefit based on their prior background (and other factors). A clear, unambiguous policy helps address unfairness stemming from these epistemic disparities; the lack of such a transparent policy undermines the fair treatment of students.
So, should we unambiguously allow the use of language models in assignments, or unambiguously forbid it?
When I began developing my classroom policy, my chief fairness concern with ChatGPT had to do with grading. While ChatGPT often makes errors, it sometimes produces a great response, particularly for shorter assignments. It did not seem fair to me that a student who gets lucky with a single ChatGPT prompt should receive top marks, while another student toils to grasp and express the substance of the class yet receives a lower grade. How ought we to address this concern? Pass/fail grading is one way to evade the issue, and a practice with multiple advantages: the pandemic highlighted that it can help address fairness concerns – concerns that have, ultimately, always been there. However, it is not something educators are always at liberty to adopt. What other approaches could mitigate this concern? Forbidding the use of ChatGPT is one: some students may still use ChatGPT to their advantage, but it is in principle possible to penalize them if found out. Explicitly allowing the use of ChatGPT and explaining to students how to use it effectively is another, as it lets all students experiment with ChatGPT and come to their own conclusions about whether, and how, it helps them improve their writing.
When I started this year’s teaching, my institution did not yet have guidelines on the use of ChatGPT that I would have been obligated to follow. I resolved to sort things out with my students, an interdisciplinary pool of undergraduates, most of whom were taking their first philosophy class. While I had some ideas about what might work, I reckoned the students would likely have deep insight on the matter. And they did. They strongly believed that prohibition only works if it can be monitored – and that the lack of feasible ways to monitor student use of ChatGPT ruled out prohibition as an approach. “In the absence of reliable monitoring mechanisms, wouldn’t forbidding ChatGPT result in false accusations and arbitrary policing?”, they asked.
When I posed the question as “Should we allow the use of ChatGPT in assignments or not?”, my students flat-out refused to think in binary terms, instead emphasising the myriad ways in which one may interact with a language model in the process of producing an essay. The rich variety of ways to interact with language models has since been highlighted by many, including in this helpful piece by Smart & Botha. Students weren’t just annoyed by the yes-or-no framing; they also brought up their desire to learn how to get the best results out of the language model. They suggested that it would be good to document language model use in detail, so that those who use language models in producing their essays can swap tips on how to use them productively, avoiding errors while improving the quality of their written output. This led us to adopt a recommendation: students are encouraged, but not required, to document exactly how they interacted with ChatGPT in producing their work.
This multiplicity of possible uses raises the question of how those who wish to forbid ChatGPT are to delineate what constitutes forbidden use. For example, a student using ChatGPT to generate possible objections to the view they’re defending may have no actual ChatGPT-written text in their assignment, even though some of the ideas therein originated with ChatGPT. Then again, we typically don’t think that incorporating comments from others undermines authorship. Conversely, students may use ChatGPT to improve the style and flow of a draft they’ve written; in that case, the ideas are the student’s own, and ChatGPT rewrites the text rather than producing the content. For those who want to forbid the use of ChatGPT, creating a transparent, unambiguous policy may not be easy. By allowing the use of ChatGPT, I avoid creating unnecessarily convoluted and hard-to-understand guidelines for students.
As my students highlighted, forbidding the use of ChatGPT raises concerns about the fair treatment of students with regard to penalties and blame. For use of ChatGPT to carry a penalty, we need to be able to distinguish ChatGPT-generated text from student-generated text. This is not always easy, as both language models and students produce text of highly variable quality. While OpenAI offers a classifier designed to detect the use of ChatGPT, they state flat out that the classifier is not reliable – and they are correct. I tested it myself by feeding in short writing assignments written by undergraduates before ChatGPT launched, and the classifier flagged over half of them as AI-generated.
In practice, for use of ChatGPT to carry any penalty (even if it’s just docking a few points from the grade), fair treatment of students requires that we know whether a text is AI-generated or not. However, educators currently cannot know this without a confession. Asking certain students (and not others) to verify whether they have used ChatGPT in their assignments runs a very real risk of (unintentional) student profiling, a risk we need to take measures to mitigate. I also doubt that a classroom where students know educators may attempt to elicit such a confession is conducive to students’ motivation, learning, and academic performance in the long run.
Socially or financially well-placed students have always been able to commission others to write their assignments, so allowing ChatGPT does not remove the need to stay mindful of academic integrity. Explicitly allowing the use of ChatGPT does, however, remove any need to play technology police in the classroom. When it comes to integrity, prevention, to me, is better than playing detective; and integrity issues are best prevented by building rapport. An encouraging, constructive classroom with mutual trust and respect is not just intrinsically valuable; it also fosters intrinsic motivation toward academic integrity.
Because policing technology use risks undermining such a classroom, we decided to make disclosing the use of ChatGPT a recommendation rather than a strict requirement, even though both my students and I would prefer that students always disclose it.
Finally, allowing ChatGPT can itself promote fairness. As Smart & Botha point out, “ChatGPT can serve as a free and accessible copy editor to help disadvantaged students”. That is, ChatGPT can help address a source of unfair grading in undergraduate classes: writing prowess stemming from students’ family, language, and educational backgrounds. If we grade students on whether they can produce a well-written, grammatically correct text with a nice flow to it – unless we are teaching a writing class – I worry that we are grading them not on what the class in fact teaches, but on what they have learned before. Of two students who have internalized the substance of the class equally well, the one who is the weaker writer may be unable to highlight all that they’ve learned. Poor writing is also sometimes the result of a disability, raising further equity concerns if grading is based on writing skill. Unless the class is one where writing skill is a reasonable prerequisite, essays ought to be assessed as opportunities for students to express what they have learned – using any tools that help them do just that.
In light of these fairness concerns, my current policy is to explicitly allow the use of ChatGPT. ChatGPT-generated work is assessed on a par with work that doesn’t utilise ChatGPT. Students are encouraged to experiment with the language model, to document how they interacted with it, and to stay aware that the text it produces is sometimes misinformed. Explicitly allowing the use of ChatGPT lets me create a classroom where students are well placed to make informed decisions about whether they want to make use of this technology in their writing. It lets me focus on productive engagement with students over playing detective, and foster an environment where student integrity is promoted by building mutual respect and rapport.