Hunting down vulnerabilities with AI-enhanced Symbolic Execution

Authors: Edit Pengő, István Siket

In the ever-evolving world of software development, ensuring that your code is robust, secure, and free from bugs is paramount. Traditional testing methods, like unit tests and integration tests, are essential but have limitations. Enter symbolic execution – a powerful technique that can take your software testing to the next level. In the following post, we’ll break down symbolic execution into easy-to-understand concepts, explaining what it is and how it can be enhanced with AI technologies.

What is Symbolic Execution?

Imagine you’re testing a piece of software. Normally, you’d run it with specific inputs to see how it behaves. Symbolic execution, however, takes a different approach. Instead of using actual inputs, it uses symbols to represent a wide range of possible inputs. This allows it to explore multiple execution paths of the program in one go, rather than one path at a time.

How Does Symbolic Execution Work?

The symbolic execution engine runs the program, substituting symbolic inputs wherever concrete inputs would normally be used. As the program executes, the engine explores the different paths that the conditions it encounters allow; this is called path exploration. For each path, the engine records the conditions (constraints) that must hold for that path to be taken. For instance, at an if statement it records the condition under which the if branch is taken and, on the other path, the negation of that condition for the else branch. The conjunction of the constraints collected along a path forms its path condition.

Symbolic execution engines use a constraint solver to determine whether there are concrete inputs that satisfy the path condition of each path. This reveals which paths are feasible and which are not, and every solution the solver finds is a concrete input that drives the program down that path.
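
To make path exploration and feasibility checking concrete, here is a toy sketch in Java (17 or later). It assumes a program with a single integer input x whose branches we model by hand as a small tree. The names Constraint, Branch, Return, explore and solve are hypothetical, and the brute-force search over x in [-100, 100] merely stands in for a real constraint solver (production engines typically use an SMT solver). It is an illustration of the idea, not how SourceMeter is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.IntPredicate;

public class ToySymbolicExecutor {

    // A constraint over the single symbolic input x, kept with a readable label.
    record Constraint(String text, IntPredicate predicate) {
        Constraint negate() {
            return new Constraint("!(" + text + ")", predicate.negate());
        }
    }

    // A tiny program representation: a branch on a constraint, or a return.
    interface Node {}
    record Branch(Constraint cond, Node then, Node otherwise) implements Node {}
    record Return(int value) implements Node {}

    // One explored path: its path condition and the value it returns.
    record Path(List<Constraint> condition, int result) {}

    // Path exploration: fork at every branch and extend the path condition.
    static void explore(Node node, List<Constraint> pc, List<Path> out) {
        if (node instanceof Return r) {
            out.add(new Path(List.copyOf(pc), r.value()));
        } else if (node instanceof Branch b) {
            List<Constraint> taken = new ArrayList<>(pc);
            taken.add(b.cond());
            explore(b.then(), taken, out);
            List<Constraint> notTaken = new ArrayList<>(pc);
            notTaken.add(b.cond().negate());
            explore(b.otherwise(), notTaken, out);
        }
    }

    // Stand-in for a constraint solver: brute-force search over lo..hi.
    static Optional<Integer> solve(List<Constraint> pc, int lo, int hi) {
        for (int x = lo; x <= hi; x++) {
            final int candidate = x;
            if (pc.stream().allMatch(c -> c.predicate().test(candidate))) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    // Models: if (x > 10) { if (x < 5) return -1; return 1; } return 0;
    static Node exampleProgram() {
        return new Branch(new Constraint("x > 10", x -> x > 10),
                new Branch(new Constraint("x < 5", x -> x < 5), new Return(-1), new Return(1)),
                new Return(0));
    }

    public static void main(String[] args) {
        List<Path> paths = new ArrayList<>();
        explore(exampleProgram(), new ArrayList<>(), paths);
        for (Path p : paths) {
            Optional<Integer> witness = solve(p.condition(), -100, 100);
            System.out.println(p.condition().stream().map(Constraint::text).toList()
                    + " -> return " + p.result()
                    + witness.map(x -> ", feasible, e.g. x = " + x).orElse(", infeasible"));
        }
    }
}
```

Running it reports three paths: the inner x < 5 branch is infeasible once x > 10 holds, while the other two paths get concrete witnesses (11 and -100). Those witnesses are exactly the kind of inputs symbolic execution can turn into generated test cases.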

Key benefits of Symbolic Execution

Comprehensive Coverage: Traditional testing often misses edge cases and rarely exercised paths in the code. Symbolic execution systematically explores these paths, making testing more thorough.

Bug Discovery: By exploring a wide range of inputs and paths, symbolic execution can uncover bugs that might be missed by other testing methods. This includes finding security vulnerabilities, such as buffer overflows and integer overflows.

Automated Test Generation: Generating test cases manually can be time-consuming and error-prone. Symbolic execution automates this process, producing test cases that cover a wide range of scenarios.

SourceMeter

SourceMeter is a static analyzer toolchain developed by FrontEndART Software Ltd. It analyzes C/C++, Java, C#, Python, and JavaScript projects. It calculates source code metrics, detects code clones, and finds coding rule violations in the source code. One of its components is a Symbolic Execution Engine for Java. This executor is designed to detect runtime exceptions in Java source code without running the application in a real environment. Currently, it can detect four kinds of common failures:

NullPointerException

ArrayIndexOutOfBoundsException

NegativeArraySizeException

ArithmeticException (division by zero)
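
For illustration, each of the following hypothetical Java methods contains one statement that can raise one of these four failures. A symbolic executor treats the parameters as symbols and asks the solver whether the failing condition noted in the comment is satisfiable; the main method simply confirms each failure with a concrete input. Note that in Java an integer division by zero surfaces as an ArithmeticException.

```java
public class RuntimeExceptionExamples {

    // NullPointerException when s == null.
    static int length(String s) {
        return s.length();
    }

    // ArrayIndexOutOfBoundsException when i < 0 || i >= a.length.
    static int element(int[] a, int i) {
        return a[i];
    }

    // NegativeArraySizeException when n < 0.
    static int[] buffer(int n) {
        return new int[n];
    }

    // ArithmeticException ("/ by zero") when d == 0.
    static int average(int total, int d) {
        return total / d;
    }

    public static void main(String[] args) {
        // Concrete inputs that satisfy each failure condition.
        Runnable[] failing = {
            () -> length(null),
            () -> element(new int[3], 3),
            () -> buffer(-1),
            () -> average(10, 0)
        };
        for (Runnable r : failing) {
            try {
                r.run();
            } catch (RuntimeException e) {
                // Prints NullPointerException, ArrayIndexOutOfBoundsException,
                // NegativeArraySizeException, ArithmeticException (one per line).
                System.out.println(e.getClass().getSimpleName());
            }
        }
    }
}
```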

Symbolic execution is started from each method in the program separately. For large systems, this approach usually works better than starting the execution only from the main() method.
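
A small hypothetical example of what per-method analysis means in practice: when share is analyzed on its own, its parameters are unconstrained symbols, so divisor == 0 is satisfiable and the division can be reported. When exploration starts from split (or from main()), the guard adds count != 0 to the path condition before share is reached. As a general property of the approach (not a measured SourceMeter result), per-method analysis gives every method its own exploration, at the price of possibly reporting inputs that no caller actually produces.

```java
public class PerMethodAnalysis {

    // Analyzed on its own, 'divisor' is an unconstrained symbol, so the
    // path condition "divisor == 0" is satisfiable: a potential
    // ArithmeticException can be reported for this division.
    static int share(int amount, int divisor) {
        return amount / divisor;
    }

    // The guard adds "count != 0" to the path condition before share()
    // is called, so exploration that starts here (or from main) sees
    // no division by zero on this call path.
    static int split(int amount, int count) {
        if (count == 0) {
            return 0;
        }
        return share(amount, count);
    }

    public static void main(String[] args) {
        System.out.println(split(100, 4)); // prints 25
    }
}
```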

Challenges and Limitations

While symbolic execution is powerful, it’s not without challenges. The number of possible paths grows exponentially with the number of branches (path explosion), which makes it difficult to explore all of them. Moreover, solving complex constraints can be computationally expensive. It is therefore not surprising that symbolic execution suffers from scalability issues on larger systems, and numerous heuristics and methods exist to improve its scalability. In the following section, we discuss the possibilities of combining AI and symbolic execution.
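
A quick back-of-the-envelope calculation shows how fast this grows, assuming a method with n independent two-way branches in sequence and no loops: every branch doubles the number of paths.

```latex
N_{\text{paths}}(n) = 2^{n}, \qquad
N_{\text{paths}}(10) = 1\,024, \qquad
N_{\text{paths}}(30) = 1\,073\,741\,824
```

Loops whose iteration count depends on a symbolic input can make the number of paths unbounded, which is why engines typically bound loop unrolling or the overall exploration budget.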

AI-enhanced Symbolic Execution

By leveraging AI’s predictive capabilities, learning algorithms, and optimization techniques, we can mitigate the challenges described above. Potential uses include:

Reinforcement Learning (RL): Reinforcement learning can be used to guide the path exploration process. By learning which paths are more likely to lead to bugs or critical code sections, an RL agent can prioritize these paths, making the symbolic execution process more efficient.
Optimizing Constraint Solving: Machine learning models can predict the complexity of constraints and choose the most appropriate solving strategies. For instance, a model can decide whether to use a lightweight heuristic-based solver or a more powerful but resource-intensive solver based on the characteristics of the constraints.
Neural Networks: Neural networks can learn from previous symbolic execution runs to predict which execution paths are likely to reveal new bugs. These predictions can then be used to rank the pending paths during exploration.
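
As a minimal sketch of the reinforcement-learning idea, assume the engine can report a reward after each exploration time slice (for example, the number of newly covered branches). An epsilon-greedy multi-armed bandit, the simplest reinforcement-learning setting, can then learn which search strategy to favor. This is a simplification of a path-level RL agent, since it chooses among strategies rather than individual paths. The strategy names and the simulated reward values are made up for illustration; a real integration would take its rewards from the engine.

```java
import java.util.Random;

public class StrategyBandit {
    private final double epsilon;
    final double[] estimate;      // running mean reward per strategy
    private final int[] pulls;    // how often each strategy has been chosen
    private final Random rng;

    StrategyBandit(int strategies, double epsilon, long seed) {
        this.epsilon = epsilon;
        this.estimate = new double[strategies];
        this.pulls = new int[strategies];
        this.rng = new Random(seed);
    }

    // Epsilon-greedy: usually exploit the best-looking strategy, sometimes explore.
    int choose() {
        return rng.nextDouble() < epsilon ? rng.nextInt(estimate.length) : best();
    }

    // Incremental mean update after observing a reward for a strategy.
    void update(int strategy, double reward) {
        pulls[strategy]++;
        estimate[strategy] += (reward - estimate[strategy]) / pulls[strategy];
    }

    // Strategy with the highest estimated reward (lowest index on ties).
    int best() {
        int b = 0;
        for (int i = 1; i < estimate.length; i++) {
            if (estimate[i] > estimate[b]) b = i;
        }
        return b;
    }

    // Hypothetical strategies and made-up mean rewards, used only for simulation.
    static final String[] STRATEGIES = {"depth-first", "breadth-first", "random-path", "null-check-first"};
    static final double[] SIMULATED_MEAN = {0.2, 0.3, 0.5, 0.8};

    // Simulates 'steps' time slices; each reward is the strategy's mean plus uniform noise in [-0.1, 0.1).
    static int simulate(int steps, long seed) {
        StrategyBandit bandit = new StrategyBandit(STRATEGIES.length, 0.1, seed);
        Random noise = new Random(seed + 1);
        for (int i = 0; i < steps; i++) {
            int s = bandit.choose();
            bandit.update(s, SIMULATED_MEAN[s] + (noise.nextDouble() - 0.5) * 0.2);
        }
        return bandit.best();
    }

    public static void main(String[] args) {
        System.out.println("preferred strategy: " + STRATEGIES[simulate(2000, 42L)]);
    }
}
```

Because the simulated reward ranges do not overlap, the bandit ends up preferring the strategy with the highest simulated reward once it has tried it; epsilon controls how much effort keeps going to the other strategies.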

Conclusion

In the past, we experimented with improving the path selection in SourceMeter’s symbolic executor. We preferred paths that contained more null value checks, assuming that these paths would also contain more operations that could result in a NullPointerException. As a next step, we want to improve the path selection algorithm further using AI.
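
The null-check preference can be expressed as a priority order on the worklist of pending execution states. The sketch below is hypothetical: the State record, its fields and the tie-breaking rule by depth are assumptions made for illustration, not SourceMeter’s actual code. States whose path conditions contain more null checks are explored first; an AI model could later replace the hand-written comparator with a learned score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class NullCheckFirstSearch {

    // A pending execution state: its location and how many null checks
    // (such as "p == null") its path condition already contains.
    record State(String location, int nullChecks, int depth) {}

    // More null checks first; among equals, prefer shallower states.
    static final Comparator<State> PRIORITY =
            Comparator.comparingInt(State::nullChecks).reversed()
                    .thenComparingInt(State::depth);

    // Drains the worklist and returns locations in exploration order.
    // A real engine would also push the successors of each polled state.
    static List<String> explorationOrder(List<State> pending) {
        PriorityQueue<State> worklist = new PriorityQueue<>(PRIORITY);
        worklist.addAll(pending);
        List<String> order = new ArrayList<>();
        while (!worklist.isEmpty()) {
            order.add(worklist.poll().location());
        }
        return order;
    }

    public static void main(String[] args) {
        List<State> pending = List.of(
                new State("A", 0, 3),
                new State("B", 2, 5),
                new State("C", 2, 1),
                new State("D", 1, 2));
        System.out.println(explorationOrder(pending)); // prints [C, B, D, A]
    }
}
```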

Coordinator: Fundación Tecnalia Research & Innovation (TECNALIA)