Fontana, Aleksandar; Simoni, Marco (2025). Unmasking Model Behavior: How LLMs Reason on Vulnerability Detection. Lecture Notes in Computer Science, pp. 316-333. DOI: 10.1007/978-3-032-00639-4_18
Unmasking Model Behavior: How LLMs Reason on Vulnerability Detection
Simoni, Marco
2025
Abstract
Understanding and controlling the behavior of Large Language Models (LLMs) is crucial for their reliable use in software vulnerability detection. While LLMs show promising zero-shot capabilities, our analysis shows that they often behave inconsistently, over-predicting vulnerabilities in some settings and overlooking real ones under domain shift. In this paper, we approach vulnerability detection as a behavior shaping problem. We apply Group Relative Policy Optimization (GRPO) to guide model behavior through structured, rule-based rewards. Our reward verifiers target both the accuracy of predictions and the coherence of explanations, encouraging the model to develop stable and trustworthy decision patterns. Through experiments on the BigVul, DiverseVul, and CleanVul benchmarks, we show that behavior shaping with GRPO improves the model's ability to generalize across projects, programming languages, and data quality levels. Furthermore, we show that tuning the strength of the Kullback-Leibler (KL) divergence regularization enables a balance between risk-seeking and risk-averse behavior, reducing false negatives without overwhelming users with false positives.
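
The sketch below illustrates, in Python, the kind of rule-based reward verifiers and group-relative advantage computation the abstract refers to: an accuracy verifier on the final verdict, a coherence verifier on the explanation structure, and standardized within-group advantages as in GRPO. It is a minimal illustration, not the paper's implementation; the <reasoning>/<answer> tag format, the reward weights, and the KL coefficient value are assumptions.

```python
# Hedged sketch (not the authors' implementation): rule-based reward verifiers
# plus the group-relative advantage computation used in GRPO-style training.
# Tag names, reward weights, and the KL coefficient are illustrative assumptions.
import re
from statistics import mean, pstdev


def verify_prediction(completion: str, label: str) -> float:
    """Accuracy verifier: 1.0 if the model's verdict matches the ground-truth label."""
    m = re.search(r"<answer>\s*(vulnerable|safe)\s*</answer>", completion, re.I)
    if not m:
        return 0.0  # unparsable output earns no accuracy reward
    return 1.0 if m.group(1).lower() == label else 0.0


def verify_explanation(completion: str) -> float:
    """Coherence verifier: reward a structured explanation preceding the verdict."""
    has_reasoning = re.search(r"<reasoning>.+?</reasoning>", completion, re.S) is not None
    has_answer = "<answer>" in completion
    return 0.5 if (has_reasoning and has_answer) else 0.0


def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: standardize rewards within one sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]


# Example: a group of 4 completions sampled for one snippet labeled "vulnerable".
group = [
    "<reasoning>The length check is missing before memcpy.</reasoning><answer>vulnerable</answer>",
    "<answer>safe</answer>",
    "<reasoning>Input is bounds-checked before use.</reasoning><answer>safe</answer>",
    "<reasoning>Unchecked user-controlled size reaches memcpy.</reasoning><answer>vulnerable</answer>",
]
label = "vulnerable"
rewards = [verify_prediction(c, label) + verify_explanation(c) for c in group]
advantages = grpo_advantages(rewards)

# In the GRPO objective these advantages weight the policy-gradient term, while a
# beta-scaled KL penalty toward the reference model constrains the update; the
# "regularization strength" tuned in the abstract corresponds to this beta.
beta = 0.04  # illustrative value, not taken from the paper
```

A larger beta keeps the policy close to the reference model (more conservative, risk-averse behavior), while a smaller beta lets the reward verifiers pull the policy further away (more risk-seeking), which is the trade-off the abstract describes for balancing false negatives against false positives.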


