Unmasking Model Behavior: How LLMs Reason on Vulnerability Detection / Fontana, A., Simoni, M. - LNCS, volume 15997 (2025), pp. 316-333. (ARES, Ghent, Belgium) [10.1007/978-3-032-00639-4_18].
Unmasking Model Behavior: How LLMs Reason on Vulnerability Detection
Simoni, Marco (second author)
2025
Abstract
Understanding and controlling the behavior of Large Language Models (LLMs) is crucial for their reliable use in software vulnerability detection. While LLMs show promising zero-shot capabilities, our analysis shows that they often behave inconsistently, over-predicting vulnerabilities or overlooking real ones under domain shift. In this paper, we approach vulnerability detection as a behavior shaping problem. We apply Group Relative Policy Optimization (GRPO) to guide model behavior through structured rule-based rewards. Our reward verifiers target both the accuracy of predictions and the coherence of explanations, encouraging the model to develop stable and trustworthy decision patterns. Through experiments on the BigVul, DiverseVul, and CleanVul benchmarks, we show that behavior shaping with GRPO improves the model’s ability to generalize across projects, programming languages, and data quality levels. Furthermore, we show that tuning the strength of the Kullback–Leibler (KL) divergence regularization enables a balance between risk-seeking and risk-averse behavior, reducing false negatives without overwhelming users with false positives.

| File | Size | Format | |
|---|---|---|---|
| Fontana_Unmasking_postprint_2025.pdf; open access; Note: https://doi.org/10.1007/978-3-032-00639-4_18; Type: Post-print document (version following peer review and accepted for publication); License: All rights reserved | 9.37 MB | Adobe PDF | |
| Fontana_Unmasking_2025.pdf; archive managers only; Type: Publisher's version (published version with the publisher's layout); License: All rights reserved | 5.63 MB | Adobe PDF | Contact the author |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


