Xenophobic bias in Large Language Models
In this post Annick Backelandt argues that xenophobia should be understood as a distinct bias in Large Language Models, rather than being subsumed under racial bias. She shows how LLMs reproduce narratives of “foreignness” that particularly affect migrants and refugees, even without explicit racial references.

By now, it is well known that Large Language Models (LLMs) suffer from biases. One bias in particular that has commanded a lot of attention is racial bias, and rightfully so. Here, I want to address xenophobia, a related bias that often gets mixed up in the discourse on racial bias. LLMs have repeatedly been found to generate biased output about migrants and refugees. Consider, for instance, how the popular chatbot ChatGPT was observed to recommend lower-paying jobs to non-Western migrants than to Western migrants with the same work experience, or how LLMs generate harmful stereotypes, including metaphors such as ‘immigrants are like animals’ and ‘refugees are a plague’.
Such problematic AI-generated content is often condemned on the grounds of being racist. However, xenophobic output in LLMs is not always the result of racial bias. Instead, I propose that we look at xenophobia as a distinct bias with acute ethical concerns. This focus allows us to identify when exactly xenophobia arises and how to fight it more effectively.
You are what you eat
Let’s start by taking a closer look at why these biases occur in LLMs in the first place. LLMs are deep learning models trained on large amounts of textual data. In this training process, LLMs learn linguistic patterns that allow them to perform language processing tasks. For instance, LLMs are particularly good at stringing together information to produce coherent texts. Unfortunately, the same training process also leads LLMs to learn biases. Popular datasets (e.g. Common Crawl) consist mainly of internet-based data, in which English and Western-centric sources dominate. In turn, LLMs adopt biases present in these datasets and perpetuate them in their output.
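To make this mechanism a little more concrete: one way to glimpse the associations a model has absorbed from its training data is to probe it with simple fill-in-the-blank templates. The sketch below is only a rough illustration, not a validated bias audit; the model choice and templates are my own assumptions, and it merely shows how learned patterns surface in a model’s most probable completions.

```python
# A rough illustration, not a validated bias audit: probing a masked language
# model for associations it has absorbed from its training data.
# The model and templates are illustrative assumptions.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

templates = [
    "Immigrants are [MASK].",
    "Citizens are [MASK].",
]

for template in templates:
    # The pipeline returns the most probable fill-in tokens with their scores.
    predictions = fill(template, top_k=5)
    tokens = [p["token_str"] for p in predictions]
    print(f"{template} -> {tokens}")
```

Comparing the completions across such templates gives a crude sense of how differently a model talks about different groups, though any serious audit would need far more careful prompts and metrics.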
Why are xenophobia and racism different?
Now, among the biases that LLMs pick up are xenophobia and racism. Let’s zoom in on each and, importantly, on how they differ. While xenophobia and racism often go together, there are important differences at the core of these concepts. Racism assumes that racial differences between groups amount to some races being better or worse than others. This drives discrimination based on racialised categories that exist in the racist’s imagination.
While xenophobia may also target racialised categories, this is not necessarily the case. At the core of the concept lies a claim to civic belonging. Xenophobia is a type of us-versus-them thinking, which differentiates a civic in-group from a perceived out-group. This drives discrimination based on the idea of foreignness, targeting groups like migrants, refugees and citizens with a migration background, not because they belong to some other race but because they ‘do not belong’.
The two may coincide, but they can also appear separately. After all, facing racial discrimination does not necessarily mean having one’s civic belonging questioned. Conversely, anti-immigrant prejudice may occur even when those who face it are not perceived as racially different from the civic in-group.
Applied to LLMs, this means that output may be clearly xenophobic without featuring any references to racial differences or hierarchies. Consider the abovementioned metaphor comparing immigrants to animals. This type of dehumanising language relies on the popular imagery of immigration as a threat. Similar rhetoric is often employed by the political far right to signal who belongs and who does not. This shapes the public perception of migrants, thereby driving discrimination based on perceptions of foreignness.
Why does this matter?
Xenophobia in LLMs raises acute ethical concerns. As LLMs become driving forces in our digital infrastructures, failing to see xenophobia as a distinct bias risks overlooking its role in creating and sustaining patterns of disadvantage for structurally marginalised groups. This becomes particularly visible when LLMs are used in contexts of migration. Let’s unpack this through the example of machine translation.
Arguably, LLMs’ translation services can help overcome language barriers. This is especially promising in a migration context, where such barriers pose real hurdles to economic and social participation. Even before that, a language barrier may complicate bureaucratic migration procedures.
LLMs are increasingly used for translation in these procedures. Faulty or flawed translations, however, can cause delays or, worse, provide grounds for rejecting migration and asylum applications.
Enter xenophobic bias, which may insert ideological interpretations into translations. Imagine an LLM is used to translate an asylum application from an Iranian refugee. In her testimony, she comments on the political situation in her home country while also referencing severe economic inequality and corruption in governmental institutions. In translation, an LLM might overemphasise these economic references in line with xenophobic tropes that link migration to economic opportunism. While such insertions might be subtle, they can nonetheless affect the nuance of the testimony as a whole by introducing inconsistencies or misrepresenting key details. Overemphasising the economic context, for instance, may sow doubt about the credibility of the threat she faces or about her motive for seeking asylum. This is a classic case of what Miranda Fricker calls testimonial injustice, which occurs when a listener assigns a speaker a lower level of credibility than is warranted due to negative identity prejudice.
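What might a safeguard against such drift look like in practice? Any serious answer requires human review, but even a crude automated check can surface candidate insertions for a reviewer to inspect. The sketch below is a minimal, hypothetical illustration: it assumes an independent reference translation is available and simply flags economically loaded terms that appear in the machine output but not in the reference. The word list, function name and example sentences are my own assumptions, not an established auditing method.

```python
# A minimal sketch of one possible safeguard: flagging loaded terms that appear
# in a machine translation but have no counterpart in an independent reference
# translation. The word list and example sentences are hypothetical.

LOADED_TERMS = {"opportunity", "jobs", "benefits", "economic", "money"}

def flag_insertions(machine_translation: str, reference_translation: str) -> set[str]:
    """Return loaded terms present in the machine output but absent from the reference."""
    machine_words = set(machine_translation.lower().split())
    reference_words = set(reference_translation.lower().split())
    return (machine_words - reference_words) & LOADED_TERMS

# Hypothetical example: the machine output adds an economic framing that the
# reference translation of the testimony does not contain.
machine = "I left because of political persecution and the economic opportunity abroad"
reference = "I left because of political persecution"
print(flag_insertions(machine, reference))  # {'economic', 'opportunity'}
```

A check like this can only point a human reviewer towards passages worth re-examining; it cannot decide whether an emphasis is faithful to the original testimony.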
Consider also how the training of LLMs may increase the risk of such misinterpretations for low-resource languages. LLMs are data-hungry models, requiring large amounts of data to learn reliable patterns, and a lack of sufficiently large datasets results in poorer translations. At the same time, the overrepresentation of Anglophone sources can prompt hegemonic interpretations of linguistic subtleties that conform to Western stereotypes about non-Western cultures.
As a result, xenophobic bias can add obstacles for individuals navigating already imperfect migration systems. Xenophobic bias, then, raises acute ethical concerns that risk going unnoticed when we treat xenophobia as a mere footnote to racism.
Annick Backelandt has recently completed a Master’s degree in Philosophy at Tilburg University (The Netherlands). Her research interests include linguistic justice, epistemic injustice and ethical issues related to big data technology.


