LLMs can be harmful, even when not making stuff up

This is a guest post by Joe Slater (University of Glasgow).

A screenshot of a phone showing an AI-generated summary in response to the question "How many rocks shall I eat?" (Image provided by author.)

It is well known that chatbots powered by LLMs – ChatGPT, Claude, Grok, etc. – sometimes make things up. These outputs are often called "AI hallucinations". With my co-authors, I have argued that we should instead describe chatbots as bullshitting, in the sense described by Harry Frankfurt, i.e., producing content with an indifference to its truth. Because of this tendency to make things up, developing chatbots that no longer generate novel false utterances (or that at least output a lower proportion of them) has been a high priority for big tech companies. We can see this in public statements by, e.g., OpenAI boasting of reduced hallucination rates.

One factor that is sometimes overlooked in this discourse is that generative AI can also be detrimental – it may stifle intellectual development – even when it accurately reflects the information it has been trained on.

Recall the instance of the Google AI Overview, which is powered by Google's Gemini LLM, claiming that "According to UC Berkeley geologists, you should eat at least one small rock per day". This claim originated on the satirical news website The Onion. While an obviously false claim like this is unlikely to deceive anyone, the case illustrates a broader problem: false claims present in the training data may be repeated, and some of these could be claims that most people, or even most experts, accept. This poses serious problems.

In this short piece, I want to highlight three worries that might escape our notice if we focus only on chatbots making stuff up:

  1. Harmful utterances (true or otherwise),
  2. Homogeneity and diminished challenges to orthodox views (true or otherwise), and
  3. Entrenched false beliefs.