The evolving nature of malware poses significant challenges for machine learning-based detectors, demanding frequent updates to handle new threats. As keeping all historical data is impractical due to storage constraints, Continual Learning (CL) algorithms come to help by incrementally updating the detectors without retraining over all previously collected data. Unfortunately, updating the model might cause inconsistencies: the new model can have false positives for goodware that was previously correctly classified, and malware that was detected by the previous model can become undetected by the new one. This issue, referred to as security regression, is often overlooked in concurrent work but can undermine user trust despite overall detection performance improvements. In this work, we address this issue by proposing a learning strategy that combines a replay-based CL method with a regression-aware penalty to preserve the correct decisions of earlier models. Specifically, we adapt the Positive Congruent Training (PCT) strategy to a CL setting, presenting the first regression-aware CL algorithm. Experiments conducted on the ELSA Android dataset demonstrate how this approach significantly reduces security regression while keeping up with the data drift, maintaining high detection performances over time.

Understanding Regression in Continual Learning for Malware Detection / Ghiani, D.; Angioni, D.; Sotgiu, A.; Pintor, M.; Biggio, B.. - 3962:(2025). ( ITASEC 25 Bologna ).

Understanding Regression in Continual Learning for Malware Detection

Ghiani D.
Primo
Formal Analysis
;
2025

Abstract

The evolving nature of malware poses significant challenges for machine learning-based detectors, demanding frequent updates to handle new threats. As keeping all historical data is impractical due to storage constraints, Continual Learning (CL) algorithms come to help by incrementally updating the detectors without retraining over all previously collected data. Unfortunately, updating the model might cause inconsistencies: the new model can have false positives for goodware that was previously correctly classified, and malware that was detected by the previous model can become undetected by the new one. This issue, referred to as security regression, is often overlooked in concurrent work but can undermine user trust despite overall detection performance improvements. In this work, we address this issue by proposing a learning strategy that combines a replay-based CL method with a regression-aware penalty to preserve the correct decisions of earlier models. Specifically, we adapt the Positive Congruent Training (PCT) strategy to a CL setting, presenting the first regression-aware CL algorithm. Experiments conducted on the ELSA Android dataset demonstrate how this approach significantly reduces security regression while keeping up with the data drift, maintaining high detection performances over time.
2025
ITASEC 25
Android Malware, Continual Learning, Negative Flips, Regression Testing
04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
Understanding Regression in Continual Learning for Malware Detection / Ghiani, D.; Angioni, D.; Sotgiu, A.; Pintor, M.; Biggio, B.. - 3962:(2025). ( ITASEC 25 Bologna ).
File allegati a questo prodotto
File Dimensione Formato  
Ghiani_Understanding_2025.pdf

accesso aperto

Note: https://ceur-ws.org/Vol-3962/paper59.pdf
Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza: Creative commons
Dimensione 1.12 MB
Formato Adobe PDF
1.12 MB Adobe PDF

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1754996
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact