Author: Vincent Thouvenot (THALES Group)

Artificial Intelligence (AI) and Machine Learning (ML) models are increasingly deployed in various (potentially critical) systems. However, many of these systems have been found to be vulnerable to attacks, to be biased against some groups, or to leak personal or sensitive information about individuals. Moreover, to build end users’ confidence in AI and ML model predictions and to make the models more robust, we need tools to explain, or at least interpret, ML and AI outputs. All of this is important to increase people’s trust in AI- and ML-based systems.

In AI4CYBER, we develop the TRUST4AI component, which addresses three Trustworthy AI objectives: the subcomponent TRUST4AI.XAI is dedicated to the interpretability of ML and AI systems, TRUST4AI.Fairness addresses the fairness of ML and AI systems, and TRUST4AI.Security strengthens the security of the other AI4CYBER components.

With the TRUST4AI.XAI subcomponent, we will offer the other AI4CYBER components tools based on AI and ML to inspect and interpret the models they produce. We will provide both global interpretations, where the general behaviour of an ML or AI model is explained, e.g. by ranking features according to their importance for the model’s outputs or errors, or by plotting the effect of each feature on those outputs or errors, and local interpretations, where tools explain a single prediction through the effect of each feature on that prediction, using feature attribution methods (e.g. Shapley values) or counterfactual examples, which provide inputs with minimal changes that alter the model’s prediction. Based on such approaches, TRUST4AI.XAI will provide a dashboard with a large variety of xAI approaches that eases the use of xAI methods by end users, and a set of Python modules dedicated to interpretability for data scientists and model engineers.
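To make the distinction concrete, the sketch below illustrates one global method (permutation feature importance) and one local method (Shapley-value attribution for a single prediction) on a toy scikit-learn classifier. It is only an illustrative assumption of the kind of analysis TRUST4AI.XAI builds on, not its actual implementation; the dataset, the model and the use of the shap package are choices made for this example.

```python
# Illustrative sketch only: not the TRUST4AI.XAI implementation.
# Assumes scikit-learn and shap are available; dataset and model are toy choices.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Global interpretation: rank features by how much shuffling each one
# degrades the model's score on held-out data.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranking = result.importances_mean.argsort()[::-1]
for idx in ranking[:5]:
    print(f"{X.columns[idx]}: {result.importances_mean[idx]:.3f}")

# Local interpretation: Shapley-value attributions for a single prediction
# (per-feature contributions; the exact output format depends on the shap version).
import shap
explainer = shap.TreeExplainer(model)
local_attribution = explainer.shap_values(X_test.iloc[[0]])
```

Counterfactual examples would follow the same local logic, searching for the smallest change to the input that flips the model’s prediction.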

TRUST4AI.Fairness will provide the other AI4CYBER components with tools dedicated to fairness in ML and AI. Fairness is often understood through two distinct notions: disparate treatment and disparate impact. For example, we may want a machine learning model’s outputs and errors to be similar across two sub-populations. Likewise, two similar individuals should receive close model decisions. Machine learning models tend to reproduce and amplify biases. These biases can come from the data: there are well-known biases such as selection bias, when sampling is poor, or historical bias, when a population has been disadvantaged, and so on. But biases can also come from the algorithms themselves: some recommendation algorithms lock people into bubbles instead of offering them new possibilities, and unbalanced data can reinforce this effect. If a data set already exhibits biases, there is less opportunity for future observations to contradict former predictions; indeed, new data collection can be guided by the past decisions of the machine learning model. Moreover, labels are often created by humans, and a model will tend to reproduce their biases to increase its performance. At the group level, a minority group can be penalized because of a small sample size or because the features are less informative about that group’s characteristics. The consequence is a disparate result between this group and the majority group. TRUST4AI.Fairness will provide tools both to detect and to mitigate bias in the AI4CYBER components based on ML and AI.
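As a concrete illustration of the group-level notion of disparate impact, the hypothetical snippet below compares positive-prediction rates between two sub-populations. The variable names and the 80% rule of thumb are assumptions made for the example, not part of the TRUST4AI.Fairness API.

```python
# Illustrative sketch only: variable names (y_pred, group) are assumptions,
# not part of the TRUST4AI.Fairness API.
import numpy as np

def disparate_impact(y_pred, group):
    """Compare positive-prediction rates between two sub-populations.

    y_pred : binary model predictions (0/1)
    group  : binary protected attribute (0 = majority, 1 = minority)
    """
    rate_minority = y_pred[group == 1].mean()
    rate_majority = y_pred[group == 0].mean()
    return {
        # Demographic parity difference: 0 means equal positive rates.
        "parity_difference": rate_minority - rate_majority,
        # Disparate impact ratio: values below ~0.8 are often flagged ("80% rule").
        "impact_ratio": rate_minority / rate_majority,
    }

# Toy usage with random predictions and group membership.
rng = np.random.default_rng(0)
y_pred = rng.integers(0, 2, size=1000)
group = rng.integers(0, 2, size=1000)
print(disparate_impact(y_pred, group))
```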

TRUST4AI.Security is responsible for researching and delivering mechanisms for countering attacks on the AI-based threat and anomaly detection models. Even though the objective of an ML model is to generalize information from some individuals to a whole population, it has been shown that such models can leak (potentially private and sensitive) information about their training set. For instance, a membership inference attack aims to predict whether a given instance was part of the training set. A model inversion attack aims to reconstruct sensitive attributes from the model outputs. A poisoning attack aims to modify the AI system’s behaviour by introducing corrupted data during the training phase. To counter these attacks, TRUST4AI.Security will research adversarial machine learning mitigation approaches such as differential privacy, noise addition during training, outlier detection and post-hoc model inspection.
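To make the membership inference threat concrete, the sketch below implements a simple loss-threshold attack on a toy over-fitted classifier: samples on which the model’s loss is low are guessed to be training members. It is an illustrative assumption of how such an attack can work, not a description of TRUST4AI.Security’s mechanisms; the model, data and threshold choice are all example choices.

```python
# Illustrative sketch of a loss-threshold membership inference attack.
# Not a TRUST4AI.Security mechanism; model and data are toy assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_out, y_train, y_out = train_test_split(X, y, test_size=0.5, random_state=0)

# An over-fitted model leaks more: it is more confident on training members.
model = GradientBoostingClassifier(n_estimators=200, max_depth=5).fit(X_train, y_train)

def per_sample_loss(model, X, y):
    """Cross-entropy loss of each individual sample."""
    proba = model.predict_proba(X)
    return np.array([log_loss([yi], [pi], labels=[0, 1]) for yi, pi in zip(y, proba)])

loss_members = per_sample_loss(model, X_train, y_train)
loss_non_members = per_sample_loss(model, X_out, y_out)

# Attack: guess "member" when the loss is below a threshold, here simply
# the median loss over all samples (a common, simple heuristic).
threshold = np.median(np.concatenate([loss_members, loss_non_members]))
attack_accuracy = 0.5 * ((loss_members < threshold).mean()
                         + (loss_non_members >= threshold).mean())
print(f"membership inference accuracy: {attack_accuracy:.2f}  (0.5 = random guess)")
```

Mitigations such as differential privacy or noise addition during training typically narrow the gap between member and non-member losses, pushing the accuracy of this kind of attack back towards random guessing.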