
Voulkopoulou A, Kalouptsoglou I, Kehagias D, Tzovaras D. 2026. Context Matters: Vulnerability Categorization with Dual-Input Code Representations. LIFESEC 2026.

May 10, 2026 | May 25, 2026 | Publications

Conference:
2nd Workshop on Whole-Lifecycle Security for Smart Systems: Methods and Tools (LIFESEC 2026), collocated with the 12th IEEE International Conference on Smart Computing (SmartComp 2026)
22 June 2026, Messina, Italy

Authors:
Voulkopoulou A, Kalouptsoglou I, Kehagias D, Tzovaras D.

Abstract:
Nowadays, software security is considered an important aspect of a project's software quality and, therefore, security testing is a critical part of the software development life-cycle. Various techniques have been proposed for identifying vulnerabilities in source code, ranging from traditional static analysis to emerging machine learning techniques. However, important challenges persist, such as the categorization of detected vulnerabilities. Categorizing software vulnerabilities into specific Common Weakness Enumeration (CWE) categories is crucial for interpreting the scope and potential impact of detected weaknesses and for effective remediation. In this paper, we study multi-class CWE-ID classification of vulnerable C/C++ functions in the Big-Vul dataset, focusing on how code context affects performance. Specifically, we compare function-level input, vulnerable-lines input, and their combination. Results show that providing the vulnerable lines improves predictive performance, while combining the vulnerable lines with the surrounding function code yields even more accurate categorization. Moreover, we propose a dual-encoder architecture that processes function bodies and vulnerable lines separately before joint learning, achieving further improvement.
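To make the three input configurations and the dual-encoder idea concrete, here is a rough Python sketch. It builds the function-level, vulnerable-lines, and combined views of a single sample, then encodes the two views with separate encoders and fuses them by concatenation. Everything here is an illustrative assumption, not the authors' implementation. The Sample record and its field names are hypothetical, and Big-Vul's real schema may differ. HashedEncoder is a toy hashed bag-of-tokens stand-in for a learned code encoder. The " <SEP> " separator is arbitrary. Concatenation is only one plausible fusion choice, because the abstract specifies neither the encoders nor the fusion step. The classification head and the training loop are omitted.

```python
import re
import zlib
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sample:
    """Hypothetical record for one vulnerable function (not Big-Vul's schema)."""
    func: str                   # full C/C++ function body
    vuln_lines: Sequence[int]   # 1-based indices of lines flagged as vulnerable
    cwe: str                    # target label, e.g. "CWE-119"


def function_view(s: Sample) -> str:
    """Function-level input: the whole function body."""
    return s.func


def lines_view(s: Sample) -> str:
    """Vulnerable-lines input: only the flagged lines, in source order."""
    lines = s.func.splitlines()
    picked = [lines[i - 1] for i in sorted(set(s.vuln_lines)) if 1 <= i <= len(lines)]
    return "\n".join(picked)


def combined_view(s: Sample, sep: str = " <SEP> ") -> str:
    """Single-sequence combination: vulnerable lines, a separator, then the function."""
    return lines_view(s) + sep + function_view(s)


# Rough C-like tokenizer: identifiers, integer literals, or any other single symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


class HashedEncoder:
    """Toy encoder: L2-normalized hashed bag of tokens (stand-in for a learned model)."""

    def __init__(self, dim: int, salt: str):
        self.dim = dim
        # A different salt gives each encoder its own token-to-bucket mapping,
        # loosely mimicking separate (unshared) encoder weights.
        self.salt = salt.encode()

    def __call__(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for tok in TOKEN.findall(text):
            vec[zlib.crc32(self.salt + tok.encode()) % self.dim] += 1.0
        norm = sum(v * v for v in vec) ** 0.5 or 1.0  # avoid dividing by zero
        return [v / norm for v in vec]


class DualEncoder:
    """Encode each view with its own encoder, then fuse by concatenation."""

    def __init__(self, dim: int = 16):
        self.func_enc = HashedEncoder(dim, salt="func")
        self.line_enc = HashedEncoder(dim, salt="lines")

    def features(self, s: Sample) -> List[float]:
        # The joint vector would feed a CWE-ID classifier (not shown here).
        return self.func_enc(function_view(s)) + self.line_enc(lines_view(s))


if __name__ == "__main__":
    code = "int copy(char *dst, const char *src) {\n  strcpy(dst, src);\n  return 0;\n}"
    s = Sample(func=code, vuln_lines=[2], cwe="CWE-119")  # illustrative label
    print(lines_view(s))                          # prints "  strcpy(dst, src);"
    print(len(DualEncoder(dim=16).features(s)))   # prints 32
```

In this toy, each view is encoded independently before the two representations are joined. This reflects the separation the abstract describes. The single-sequence combined_view shows the alternative, in which both pieces of context share one input. A real system would replace HashedEncoder with a trained code model and learn the fusion and classification layers jointly.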
