MalFamAware: Automatic Family Identification and Malware Classification Through Online Clustering / Pitolli, Gregorio; Laurenza, Giuseppe; Aniello, Leonardo; Querzoni, Leonardo; Baldoni, Roberto. - In: INTERNATIONAL JOURNAL OF INFORMATION SECURITY. - ISSN 1615-5270. - 20:3(2021), pp. 371-386. [10.1007/s10207-020-00509-4]

MalFamAware: Automatic Family Identification and Malware Classification Through Online Clustering

Giuseppe Laurenza; Leonardo Querzoni; Roberto Baldoni
2021

Abstract

The rapidly growing rate of new malware poses novel challenges for protecting computers and networks. Discerning truly novel malware from variants of known samples is one way to keep pace with this trend. This can be done by grouping known malware into families by similarity and classifying new samples into those families. Because malware and their families evolve over time, approaches based on classifiers trained on a fixed ground truth are not suitable. Other techniques use clustering to identify families, but they need to periodically re-cluster the whole set of samples, which does not scale well. A promising approach is based on incremental clustering, where only as-yet-unknown samples are periodically clustered to identify new families, and classifiers are re-trained accordingly. However, such solutions are usually unable to react immediately and identify new malware families. In this paper we propose MalFamAware, a novel approach to malware family identification based on an online clustering algorithm, namely BIRCH, which efficiently updates clusters as new samples arrive without re-scanning the entire dataset. MalFamAware is able both to classify new malware into existing families and to identify new families at runtime. We present experimental evaluations in which MalFamAware outperforms both total re-clustering and incremental clustering solutions in terms of accuracy and time. We also compare our solution with classifiers re-trained over time, obtaining better accuracy, in particular when samples belong to as-yet-unknown families.
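
The online behaviour described in the abstract (assign each incoming sample to an existing family, or open a new family on the fly, without re-scanning earlier samples) can be illustrated with a minimal sketch. This is not BIRCH and not the authors' implementation: it swaps BIRCH's CF-tree for a much simpler threshold-based nearest-centroid rule, and the class name `OnlineClusterer`, the `threshold` value, and the two-dimensional feature vectors are all hypothetical choices made for illustration only.

```python
import math


class OnlineClusterer:
    """Toy threshold-based online clusterer (a simplification, NOT BIRCH's CF-tree)."""

    def __init__(self, threshold):
        self.threshold = threshold  # hypothetical distance cut-off for joining a cluster
        self.centroids = []         # one running mean per cluster ("family")
        self.counts = []            # number of samples absorbed by each cluster

    def add(self, sample):
        """Assign the sample to the nearest cluster or open a new one.

        Returns (cluster_id, is_new_cluster). Earlier samples are never revisited.
        """
        best_id, best_dist = None, math.inf
        for i, centroid in enumerate(self.centroids):
            d = math.dist(sample, centroid)
            if d < best_dist:
                best_id, best_dist = i, d

        if best_id is not None and best_dist <= self.threshold:
            # Known family: update the running mean incrementally.
            n = self.counts[best_id] + 1
            old = self.centroids[best_id]
            self.centroids[best_id] = [c + (x - c) / n for c, x in zip(old, sample)]
            self.counts[best_id] = n
            return best_id, False

        # No cluster is close enough: treat the sample as the seed of a new family.
        self.centroids.append(list(sample))
        self.counts.append(1)
        return len(self.centroids) - 1, True


# Made-up feature vectors standing in for malware samples arriving over time.
clusterer = OnlineClusterer(threshold=1.0)
stream = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.0)]
results = [clusterer.add(s) for s in stream]
```

In this sketch the third sample is far from the only existing centroid, so it opens a second cluster immediately rather than waiting for a periodic re-clustering pass. BIRCH itself keeps compact cluster summaries in a tree, which is what lets it scale this idea to large datasets.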
malware analysis; malware family identification; incremental clustering
01 Journal publication::01a Journal article
Files attached to this item

Pitolli_preprint_MalFamAware_2020.pdf
Access: open access
Note: https://link.springer.com/article/10.1007/s10207-020-00509-4
Type: Pre-print (manuscript submitted to the publisher, prior to peer review)
License: All rights reserved
Size: 18.32 MB
Format: Adobe PDF

Pitolli_MaIFamAware_2021.pdf
Access: archive managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 1.69 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1413969
Citations
  • PubMed Central: not available
  • Scopus: 15
  • Web of Science: 12