Author: Vincent Thouvenot
In the rapidly advancing world of Artificial Intelligence, Large Language Models (LLMs) have become a cornerstone of modern technology. These systems are designed to understand and generate human language in remarkably sophisticated ways. However, with great power comes great responsibility, and one of the most crucial yet complex challenges facing developers and society alike is ensuring fairness in these models.
What is Fairness in LLMs?
Fairness in LLMs refers to the equitable and unbiased treatment of all users and scenarios that the AI encounters. This means the AI should provide consistent, non-discriminatory outcomes regardless of factors such as race, gender, socioeconomic status, or other potentially sensitive attributes. The goal is to prevent any form of systemic bias in the model's outputs that could lead to unfair disadvantages or benefits for particular groups of people.
Why does Fairness matter?
The importance of fairness in LLMs cannot be overstated.
- Societal impact: AI has the potential to shape opinions, decisions, and the overall information ecosystem. Biased AI could inadvertently propagate stereotypes or inaccuracies that harm marginalized communities.
- Trust and adoption: For AI technology to be widely accepted and trusted, users must feel confident that the system treats everyone fairly. Trust in AI is crucial for its integration into everyday applications like customer service, healthcare, and education.
- Ethical alignment: Ensuring fairness aligns with broader ethical standards in technology. It’s about building systems that reflect our values of equality and justice.
What are the challenges in achieving Fairness?
- Biased training data: LLMs are trained on vast datasets sourced from the internet and other repositories. If these datasets contain biases, the models can learn and reproduce them. For example, if the training data includes biased views towards certain demographics, the AI may generate content reflecting similar biases.
- Complexity of language: Language itself is intricate and context-dependent, making it challenging to detect and mitigate all forms of bias. Biases can be subtle, hidden in nuanced expressions or jargon.
- Algorithmic choices: Even with unbiased data, the algorithms used to train LLMs can introduce biases, for instance through the weighting of different types of data or methodological choices in the machine learning process.
How can we detect bias?
To detect bias in LLMs, we can use automatic bias detectors, which are based on two complementary families of metrics:
- Intrinsic Bias Evaluation Metrics: assess biases inherent in the representations produced by LLMs during the pre-training phase, independently of any downstream task. Examples include similarity-based metrics, which use semantically bleached sentence templates to compute similarities between different demographic groups, and probability-based metrics, which formalize intrinsic bias in terms of the probabilities the pre-trained LLM assigns to different candidates (see the first sketch below).
- Extrinsic Bias Evaluation Metrics: assess biases that manifest in the outputs of language models on specific downstream tasks such as classification, generation, or translation. Examples include Natural Language Understanding metrics, which train a task-specific classifier on the evaluation dataset and use its output as the metric, and Natural Language Generation metrics, which fine-tune the evaluated model on an evaluation dataset of prompts covering different conditions and then score the resulting generations (see the second sketch below).
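As a first, minimal sketch of a probability-based intrinsic metric, the snippet below compares the probabilities a pre-trained masked language model assigns to "he" and "she" in a simple template. It assumes the Hugging Face transformers library and the bert-base-uncased checkpoint; the template and the word pair are illustrative choices, not a standard benchmark.

```python
# Minimal sketch of a probability-based intrinsic bias check.
# Assumptions: the Hugging Face `transformers` library and the
# `bert-base-uncased` checkpoint; the template and word pair are illustrative.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

template = "[MASK] is an excellent programmer."
# Restrict the fill-mask scores to the two demographic candidates.
results = fill_mask(template, targets=["he", "she"])

scores = {r["token_str"]: r["score"] for r in results}
print(scores)

# A large gap between the two probabilities hints at a gender bias
# encoded in the pre-trained representations.
gap = abs(scores.get("he", 0.0) - scores.get("she", 0.0))
print(f"probability gap: {gap:.4f}")
```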
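As a second sketch, this time on the extrinsic side, one can prompt the model under evaluation with inputs that differ only in the demographic group mentioned and score its outputs with a task-specific classifier. The example below uses gpt2 as a stand-in generator and an off-the-shelf sentiment classifier as the scorer; both models and the two hand-written prompts are placeholder choices rather than an established evaluation protocol.

```python
# Minimal sketch of an extrinsic (output-level) bias check.
# Placeholder choices: gpt2 as the generator, a default sentiment
# classifier as the scorer, and two hand-written prompts.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
sentiment = pipeline("sentiment-analysis")

prompts = {
    "men": "The men who work as programmers are",
    "women": "The women who work as programmers are",
}

for group, prompt in prompts.items():
    generation = generator(prompt, max_new_tokens=20, do_sample=False)[0]["generated_text"]
    score = sentiment(generation)[0]
    # Systematic differences in sentiment across groups suggest
    # bias in the downstream generations.
    print(group, score["label"], round(score["score"], 3))
```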
How can we mitigate bias?
- Data curation: One practical approach is to carefully curate and preprocess training data to minimize biases. This includes diversifying data sources and implementing rigorous standards for content inclusion.
- Continuous testing and feedback: Implementing continuous testing and gathering feedback from a diverse group of users can help improve fairness. By constantly revising models based on real-world interactions and critiques, developers can better address fairness issues.
- Diverse development teams: Involving diverse teams in the development and training of LLMs ensures that various perspectives are considered. This helps prevent narrow viewpoints from dominating the model’s understanding.
Beyond these practices, fairness-aware algorithms exist to mitigate bias at different stages of the pipeline:
- Pre-processing approaches remedy bias directly in the training data. For example, Counterfactual Data Augmentation (CDA) aims to balance datasets by exchanging protected-attribute terms. If a dataset contains more instances like “Men are excellent programmers” than “Women are excellent programmers,” this imbalance may lead LLMs to favour male candidates during the screening of programmer resumes. One way CDA achieves balance and mitigates this bias is by replacing a certain number of instances of “Men are excellent programmers” with “Women are excellent programmers” in the training data (see the CDA sketch below).
- In-processing approaches change the training process, e.g. by modifying the loss function, by adding a module that optimizes for fairness, or by using disentanglement or contrastive learning (see the in-processing sketch below).
- Post-processing approaches modify the LLMs’ outputs, e.g. with projection-based methods, which project the latent representation of an observation into a space where sensitive and neutral information can be separated; the sensitive information is then removed (see the projection sketch below).
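To make the pre-processing family concrete, here is a deliberately simplified sketch of Counterfactual Data Augmentation: every training sentence gets a counterfactual copy in which gendered words are swapped, so both versions end up in the augmented corpus. The swap dictionary is a tiny illustrative sample; real CDA pipelines use much larger lexicons and handle grammar and named entities more carefully.

```python
# Simplified Counterfactual Data Augmentation (CDA) sketch.
# The swap dictionary is a tiny illustrative sample, not a full lexicon.
import re

GENDER_SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",
    "his": "her",
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
}

def counterfactual(sentence: str) -> str:
    """Return a copy of the sentence with gendered words swapped."""
    def swap(match):
        word = match.group(0)
        replacement = GENDER_SWAPS.get(word.lower(), word)
        return replacement.capitalize() if word[0].isupper() else replacement
    return re.sub(r"\b\w+\b", swap, sentence)

corpus = ["Men are excellent programmers.", "She hired him as an engineer."]
# Augment the corpus with the counterfactual copies.
augmented = corpus + [counterfactual(s) for s in corpus]
print(augmented)
# ['Men are excellent programmers.', 'She hired him as an engineer.',
#  'Women are excellent programmers.', 'He hired her as an engineer.']
```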
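For the in-processing family, one possible instantiation of "modifying the loss function" is to add a penalty that discourages the model from changing its prediction between an example and its counterfactual. The PyTorch sketch below adds such a counterfactual-consistency term to a standard task loss; the classifier architecture, the lambda weight, and the random batch are assumptions made purely for illustration.

```python
# Sketch of an in-processing approach: a counterfactual-consistency
# penalty added to the task loss. The classifier, the lambda weight and
# the data are illustrative placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
task_loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
lambda_fair = 0.5  # weight of the fairness penalty (assumption)

# Dummy batch: original inputs, their counterfactual versions
# (e.g. with the protected attribute flipped), and task labels.
x = torch.randn(8, 16)
x_counterfactual = torch.randn(8, 16)
y = torch.randint(0, 2, (8,))

logits = model(x)
logits_cf = model(x_counterfactual)

task_loss = task_loss_fn(logits, y)
# Penalize divergence between predictions on the original and
# counterfactual inputs: the classifier should not change its mind
# when only the protected attribute changes.
fairness_loss = nn.functional.mse_loss(
    logits.softmax(dim=-1), logits_cf.softmax(dim=-1)
)
loss = task_loss + lambda_fair * fairness_loss

optimizer.zero_grad()
loss.backward()
optimizer.step()
```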
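Finally, for the post-processing family, the projection idea can be sketched as follows: estimate a sensitive direction from pairs of representations that differ only in the protected attribute, then project that direction out of any representation before it is used downstream. The embeddings below are random placeholders standing in for real LLM representations.

```python
# Sketch of a projection-based post-processing step: estimate a
# sensitive direction and remove it from a representation.
# The embeddings are random placeholders for real LLM representations.
import numpy as np

rng = np.random.default_rng(0)
dim = 64

# Embeddings of attribute word pairs, e.g. ("he", "she"), ("man", "woman").
pairs = [(rng.normal(size=dim), rng.normal(size=dim)) for _ in range(4)]

# Estimate the sensitive (e.g. gender) direction as the mean difference
# between the paired embeddings, then normalize it.
direction = np.mean([a - b for a, b in pairs], axis=0)
direction /= np.linalg.norm(direction)

def remove_sensitive(embedding: np.ndarray) -> np.ndarray:
    """Project out the component of the embedding along the sensitive direction."""
    return embedding - np.dot(embedding, direction) * direction

embedding = rng.normal(size=dim)
debiased = remove_sensitive(embedding)
# After projection, the representation is orthogonal to the sensitive direction.
print(np.dot(debiased, direction))  # ~0.0
```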
The Road Ahead
The quest for fairness in large language models is an ongoing journey. As technology and societal norms evolve, so must the frameworks and strategies we use to ensure fairness. Collaboration across multiple disciplines, including computer science, ethics, sociology, and law, is essential to navigate the complexities of this issue. In AI4CYBER, the work on the TRUST4AI.Fairness component started by studying fairness in AI models, and in recent months we have researched the challenges of fairness in LLMs.
In conclusion, fairness in LLMs is a critical frontier in AI development that demands our attention and effort. By prioritizing fairness, we not only enhance the functionality and reliability of these systems but also uphold our commitment to creating technology that serves all humanity equitably.