Authors: Pavlos Bouzinis, Dimitris Asimopoulos (Metamind Innovations) 

Date: 05/03/2025 

The digital landscape faces constant threats: cybersecurity challenges keep evolving, and cyber-attacks are growing more sophisticated and widespread. Intrusion detection systems (IDS) are a key line of defense against these advanced threats and a crucial tool for identifying potential intrusions during system operation. In recent years, the emergence of artificial intelligence (AI) and machine learning (ML) has led to the development of AI-driven intrusion detection systems, which offer enhanced threat detection capabilities. In particular, a decentralized learning approach known as federated learning (FL) is a strong candidate for training IDS in a distributed manner. This method enables an IDS to learn from traffic data without transferring that data to third parties, thereby preserving privacy and security. This post explains the challenges in FL training and how the AI4CYBER project is working to overcome them in AI4FIDS, the federated deep learning IDS component of the AI4CYBER framework.

One of the primary challenges in FL training is the inherent data heterogeneity across the datasets of the clients in the federation. In a decentralized environment, each client collects and processes data independently, leading to variations in data distributions, feature spaces, and label distributions. This non-IID (non-independent and identically distributed) nature of the data can significantly impact model convergence, as traditional optimization techniques struggle to generalize across diverse datasets. As a result, FL models may exhibit biased learning, favoring clients with dominant data distributions while underperforming on minority patterns. This issue can degrade the performance of FL-based IDS, rendering them unreliable for real-world deployment. Addressing this challenge requires robust aggregation methods, adaptive learning techniques, and strategies to balance the contributions of different clients. The goal is a fair, accurate, and efficient global IDS that detects intrusions reliably across clients with different dataset sizes and degrees of heterogeneity.
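To make the bias toward dominant clients concrete, here is a minimal sketch of sample-size-weighted parameter averaging, the aggregation rule used by plain federated averaging. The function name `weighted_average` and the toy parameter vectors and client sizes are illustrative assumptions, not taken from AI4FIDS.

```python
from typing import List, Sequence


def weighted_average(
    client_params: Sequence[Sequence[float]],
    num_samples: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighting each by its sample count."""
    total = sum(num_samples)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, num_samples)) / total
        for i in range(dim)
    ]


# Hypothetical toy federation: one large client and two small ones whose
# locally trained parameters point in a different direction.
params = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
sizes = [9000, 500, 500]

global_params = weighted_average(params, sizes)
print(global_params)
```

In this toy setting the large client holds 90% of the samples, so the averaged parameters sit close to its local solution even though two of the three clients disagree with it. This is one way the underperformance on minority patterns described above can arise.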

The AI4CYBER project recognizes the critical importance of reliable FL-based IDS. To this end, AI4CYBER has put considerable effort into designing algorithms that enhance the detection accuracy of FL-based IDS, making them suitable and reliable for practical use. As part of AI4CYBER’s initiatives, the StatAvg FL aggregation technique was developed to address feature shift in client data distributions. StatAvg introduces a global data scaling mechanism to improve model consistency across decentralized datasets. Our approach preserves privacy while significantly enhancing the detection accuracy of FL-based IDS, outperforming various state-of-the-art FL aggregation techniques. An experiment comparing StatAvg with other methods and demonstrating its superiority is presented in Figure 1. A preprint version of the technical specifications of StatAvg can be found here.
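As a rough illustration of what a global data scaling mechanism can look like, the sketch below has each client share only its per-feature sample count, mean, and variance, then pools those into global statistics that every client uses to standardize its features. It assumes mean/variance pooling followed by standard scaling; this is not the StatAvg implementation, and the names `local_stats`, `aggregate_stats`, and `scale` are hypothetical. The Flower baseline mentioned in this post is the authoritative reference for the actual method.

```python
import math
from typing import List, Sequence, Tuple

# (sample count, per-feature mean, per-feature population variance)
Stats = Tuple[int, List[float], List[float]]


def local_stats(rows: Sequence[Sequence[float]]) -> Stats:
    """Computed on the client: only these summaries leave the device."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    var = [sum((r[j] - mean[j]) ** 2 for r in rows) / n for j in range(dim)]
    return n, mean, var


def aggregate_stats(stats: Sequence[Stats]) -> Tuple[List[float], List[float]]:
    """Computed on the server: pool client summaries into global mean/variance."""
    total = sum(n for n, _, _ in stats)
    dim = len(stats[0][1])
    g_mean = [sum(n * m[j] for n, m, _ in stats) / total for j in range(dim)]
    # Law of total variance: within-client variance plus between-client spread.
    g_var = [
        sum(n * (v[j] + (m[j] - g_mean[j]) ** 2) for n, m, v in stats) / total
        for j in range(dim)
    ]
    return g_mean, g_var


def scale(rows, mean, var, eps: float = 1e-12):
    """Applied on every client with the same global statistics."""
    return [
        [(r[j] - mean[j]) / math.sqrt(var[j] + eps) for j in range(len(mean))]
        for r in rows
    ]


# Two hypothetical clients whose single feature lives on different ranges.
client_a = [[1.0], [3.0]]
client_b = [[10.0], [12.0], [14.0]]

g_mean, g_var = aggregate_stats([local_stats(client_a), local_stats(client_b)])
scaled_a = scale(client_a, g_mean, g_var)
print(g_mean, g_var)
```

The pooled statistics match what would be obtained by computing the mean and variance over the union of both datasets, yet no raw traffic records are exchanged. Scaling each client with its own local statistics instead would map the two clients' very different feature ranges onto the same interval and hide the shift from the shared model.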

Figure 1: StatAvg’s performance evaluation 

In line with AI4CYBER’s commitment to reproducible and open-source research, we are excited to announce that StatAvg has been integrated as a baseline technique in the repository of the Flower Framework. Flower is one of the most widely recognized frameworks for Federated Learning, adopted by both academia and industry, with a strong and active community continuously enhancing its capabilities. The open-source code for StatAvg is available at this link. Interested readers and researchers can easily reproduce our experiments or incorporate StatAvg into their own work, as it adheres to Flower’s baseline templates. 

This work reflects AI4CYBER’s commitment to advancing the field of cybersecurity through innovative, privacy-preserving solutions. By developing and integrating cutting-edge techniques, we aim to address the challenges in AI-based IDS and improve their effectiveness. With a focus on reproducibility and open-source collaboration, AI4CYBER continues to push the boundaries of what’s possible in decentralized cybersecurity, ensuring that our solutions are not only effective but also accessible to researchers and practitioners worldwide.