{"id":1868,"date":"2025-08-25T14:24:21","date_gmt":"2025-08-25T12:24:21","guid":{"rendered":"https:\/\/ai4cyber.eu\/?p=1868"},"modified":"2025-08-25T14:24:21","modified_gmt":"2025-08-25T12:24:21","slug":"ai4cyber-blogpost-ensuring-fairness-in-large-language-models-an-emerging-critical-dialogue","status":"publish","type":"post","link":"https:\/\/ai4cyber.eu\/?p=1868","title":{"rendered":"AI4CYBER blogpost: Ensuring Fairness in Large Language Models: An Emerging Critical Dialogue"},"content":{"rendered":"<p><strong>Author: Vincent Thouvenot\u00a0 <\/strong><\/p>\n<p><span data-contrast=\"auto\">In the rapidly advancing world of Artificial Intelligence, Large Language Models (LLMs) have become a cornerstone of modern technology. These systems are designed to understand and generate human language in remarkably sophisticated ways. However, with great power comes great responsibility, and one of the most crucial yet complex challenges developers and society alike are facing is ensuring fairness in these models.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"auto\">What is Fairness in LLMs?<\/span><\/b><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Fairness in LLMs refers to the equitable and unbiased treatment of all users and scenarios that the AI encounters. This means the AI should provide consistent, non-discriminatory outcomes regardless of factors such as race, gender, socioeconomic status, or other potentially sensitive attributes. 
The goal is to prevent any form of systemic bias in the model outcomes that could lead to unfair disadvantages or benefits for particular groups of people.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"auto\">Why does Fairness matter?<\/span><\/b><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The importance of fairness in LLMs cannot be overstated.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">AI has the potential to shape opinions, decisions, and the overall information ecosystem. Biased AI could inadvertently propagate stereotypes or inaccuracies that harm marginalized communities.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">For AI technology to be widely accepted and trusted, users must feel confident that the system treats everyone fairly. Trust in AI is crucial for its integration into everyday applications like customer service, healthcare, and education.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Ensuring fairness aligns with broader ethical standards in technology. 
It&#8217;s about building systems that reflect our values of equality and justice.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"auto\">What are the challenges in achieving Fairness?<\/span><\/b><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">LLMs are trained on vast datasets sourced from the internet and other repositories. If these datasets contain biases, the models can learn and reproduce them. For example, if training data includes biased views towards certain demographics, the AI may generate content reflecting similar biases.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Language itself is intricate and context-dependent, making it challenging to detect and mitigate all forms of bias. Sometimes, biases can be subtle and hidden in nuanced expressions or jargon.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Even with unbiased data, the algorithms used in training LLMs can introduce biases. 
This can occur due to various factors such as the weighting of different types of data or methodological choices in the machine learning process.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"auto\">How can we detect bias?<\/span><\/b><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">In order to detect bias in LLMs, we can use automatic bias detectors. They are based on: <\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"21\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"1\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Intrinsic Bias Evaluation Metrics<\/span><\/b><span data-contrast=\"auto\">: assess biases inherent in the representations produced by LLMs during the pre-training phase, independent of any downstream task, with e.g. 
<\/span><i><span data-contrast=\"auto\">similarity-based metrics<\/span><\/i><span data-contrast=\"auto\"> that utilize semantically bleached sentence templates to compute similarities between different demographic groups, or <\/span><i><span data-contrast=\"auto\">probability-based metrics<\/span><\/i><span data-contrast=\"auto\"> that quantify intrinsic bias through the probabilities a pre-trained LLM assigns to candidate words or sentences.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"21\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"2\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Extrinsic Bias Evaluation Metrics<\/span><\/b><span data-contrast=\"auto\">: assess biases that manifest in the outputs of language models during specific downstream tasks, such as classification, generation, or translation, with e.g. 
<\/span><i><span data-contrast=\"auto\">Natural Language Understanding metrics<\/span><\/i><span data-contrast=\"auto\"> (train a task-specific classifier on the evaluation dataset and use its outputs as the metric) or <\/span><i><span data-contrast=\"auto\">Natural Language Generation metrics<\/span><\/i><span data-contrast=\"auto\"> (fine-tune the model under evaluation on a dataset of prompts covering different conditions and then assess the generated text).<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/li>\n<\/ul>\n<p><b><span data-contrast=\"auto\">How can we mitigate bias?<\/span><\/b><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">One practical approach is to carefully curate and preprocess training data to minimize biases. This includes diversifying data sources and implementing rigorous standards for content inclusion.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Implementing continuous testing and receiving feedback from a diverse group of users can help improve fairness. 
By constantly revising models based on real-world interactions and critiques, developers can better address fairness issues.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Involving diverse teams in the development and training of LLMs ensures various perspectives are considered. This approach helps prevent narrow viewpoints from dominating the model\u2019s understanding.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Fairness-aware algorithms exist to mitigate bias:\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"21\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"3\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Pre-processing approaches<\/span><\/b><span data-contrast=\"auto\"> remedy bias directly in the training data. For example, Counterfactual Data Augmentation (CDA) aims to balance datasets by exchanging protected attribute data. 
For instance, if a dataset contains more instances like \u201cMen are excellent programmers&#8221; than \u201cWomen are excellent programmers,&#8221; this bias may lead LLMs to favour male candidates during the screening of programmer resumes. One way CDA achieves data balance and mitigates bias is by replacing a certain number of instances of \u201cMen are excellent programmers&#8221; with \u201cWomen are excellent programmers&#8221; in the training data.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"21\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"4\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">In-processing approaches<\/span><\/b><span data-contrast=\"auto\"> change the training process, e.g. 
by modifying the loss function, by adding a module that optimizes for fairness, or by applying disentanglement or contrastive learning.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"21\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"5\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Post-processing approaches<\/span><\/b><span data-contrast=\"auto\"> modify the LLM\u2019s outputs, e.g. with projection-based methods, which project the latent representation of an observation into a space where sensitive and neutral information can be separated. The sensitive information is then removed.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/li>\n<\/ul>\n<p><b><span data-contrast=\"auto\">The Road Ahead<\/span><\/b><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The quest for fairness in large language models is an ongoing journey. As technology and societal norms evolve, so must the frameworks and strategies we use to ensure fairness. 
Collaboration across multiple disciplines\u2014including computer science, ethics, sociology, and law\u2014is essential to navigate the complexities of this issue. In AI4CYBER, work on the TRUST4AI.Fairness component began with the study of fairness in AI models, and in recent months we have investigated the challenges of fairness in LLMs.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">In conclusion, fairness in LLMs is a critical frontier in AI development that demands our attention and effort. By prioritizing fairness, we not only enhance the functionality and reliability of these systems but also uphold our commitment to creating technology that serves all humanity equitably.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:120,&quot;335559739&quot;:120,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><img fetchpriority=\"high\" decoding=\"async\" class=\" wp-image-1869 aligncenter\" src=\"https:\/\/ai4cyber.eu\/wp-content\/uploads\/2025\/08\/download.webp\" alt=\"\" width=\"366\" height=\"280\" \/><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author: Vincent Thouvenot\u00a0 In the rapidly advancing world of Artificial Intelligence, Large Language Models (LLMs) have become a cornerstone of modern technology. These systems are designed to understand and generate human language in remarkably sophisticated ways. 
However, with great power comes great responsibility, and one of the most crucial yet complex challenges developers and society [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[11],"tags":[],"class_list":["post-1868","post","type-post","status-publish","format-standard","hentry","category-blog"],"_links":{"self":[{"href":"https:\/\/ai4cyber.eu\/index.php?rest_route=\/wp\/v2\/posts\/1868","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ai4cyber.eu\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ai4cyber.eu\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ai4cyber.eu\/index.php?rest_route=\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/ai4cyber.eu\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1868"}],"version-history":[{"count":1,"href":"https:\/\/ai4cyber.eu\/index.php?rest_route=\/wp\/v2\/posts\/1868\/revisions"}],"predecessor-version":[{"id":1870,"href":"https:\/\/ai4cyber.eu\/index.php?rest_route=\/wp\/v2\/posts\/1868\/revisions\/1870"}],"wp:attachment":[{"href":"https:\/\/ai4cyber.eu\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1868"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ai4cyber.eu\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1868"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ai4cyber.eu\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1868"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
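The Counterfactual Data Augmentation (CDA) step described in the post's pre-processing bullet can be sketched in a few lines. This is a minimal illustration, not AI4CYBER or TRUST4AI.Fairness code: the `SWAPS` lexicon and the function names are assumptions made for the example, and a real system would use a curated lexicon of protected-attribute terms.

```python
# Illustrative CDA sketch: balance a corpus by adding, for each sentence,
# a copy in which protected-attribute terms are swapped with their pair.
import re

# Hypothetical swap lexicon; a real deployment would use a curated one.
SWAPS = {"men": "women", "women": "men", "he": "she", "she": "he",
         "his": "her", "her": "his"}

def counterfactual(sentence: str) -> str:
    """Return the sentence with each protected term replaced by its pair."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        repl = SWAPS[word.lower()]
        # Preserve the capitalization of the original token.
        return repl.capitalize() if word[0].isupper() else repl
    pattern = r"\b(" + "|".join(SWAPS) + r")\b"
    return re.sub(pattern, swap, sentence, flags=re.IGNORECASE)

def augment(corpus: list[str]) -> list[str]:
    """CDA: keep every original sentence and append its counterfactual copy."""
    return corpus + [counterfactual(s) for s in corpus]

print(augment(["Men are excellent programmers."]))
# -> ['Men are excellent programmers.', 'Women are excellent programmers.']
```

After augmentation, the post's example imbalance disappears: "Men are excellent programmers" and "Women are excellent programmers" now occur equally often in the training data.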