A universal malicious documents static detection framework based on feature generalization / Lu, X.; Wang, F.; Jiang, C.; Lio, P.. - In: APPLIED SCIENCES. - ISSN 2076-3417. - 11:24(2021). [10.3390/app112412134]

A universal malicious documents static detection framework based on feature generalization

Lio P.
2021

Abstract

This study takes Portable Document Format (PDF), Word, Excel, Rich Text Format (RTF), and image documents as research objects and investigates a fast, static method for detecting malicious documents. Features of malicious PDF and Word documents are abstracted and extended so that they can also be used to detect other document types. On this basis, a universal static detection framework for malicious documents based on feature generalization is proposed. The generalized features include specification check errors, the structure path, code keywords, and the number of objects. The proposed method is verified on two datasets and compared with the Kaspersky, NOD32, and McAfee antivirus software. The experimental results show that the method performs well in terms of detection accuracy, runtime, and scalability. The average F1-score across all document types is 0.99, and the average detection time per document is 0.5926 s, which is on par with the compared antivirus software.
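As a rough illustration of what static, generalized features can look like, the sketch below scans the raw bytes of a PDF without opening or executing it. It is not the authors' implementation: the keyword list, the feature names, and the single header check standing in for "specification check errors" are all assumptions made for this example, and the structure-path feature is omitted. The `f1_score` helper uses the standard definition of the metric reported in the abstract.

```python
import re

# Hypothetical keyword list, chosen only for illustration;
# the paper's actual keyword set is not given in this record.
CODE_KEYWORDS = (b"/JavaScript", b"/JS", b"/OpenAction", b"/Launch", b"/EmbeddedFile")

# Matches indirect object headers such as "12 0 obj" in raw PDF bytes.
OBJECT_HEADER = re.compile(rb"\b\d+\s+\d+\s+obj\b")


def extract_features(data: bytes) -> dict[str, int]:
    """Build a small static feature vector from raw document bytes."""
    features = {}
    # Code keywords: how often each suspicious name appears.
    for keyword in CODE_KEYWORDS:
        features["kw_" + keyword.decode("ascii")] = data.count(keyword)
    # Number of objects: count of indirect object headers.
    features["num_objects"] = len(OBJECT_HEADER.findall(data))
    # Toy stand-in for a specification check: does the file start with "%PDF-"?
    features["missing_header"] = int(not data.startswith(b"%PDF-"))
    return features


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj\n<< /OpenAction 2 0 R >>\nendobj\n"
        b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
    )
    print(extract_features(sample))
    print(f1_score(tp=99, fp=1, fn=1))
```

A feature vector of this kind would then be passed to a classifier; which classifier the framework uses is not stated in this record.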
Feature generalization; Machine learning; Malicious document detection; Static detection
01 Journal publication::01a Journal article
Files attached to this item
File: Lu_A-universal_2021.pdf
Access: open access
Note: https://doi.org/10.3390/app112412134
Type: Publisher's version (published version with the publisher's layout)
License: Creative Commons
Size: 1.52 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1719862
Citations
  • PubMed Central n/a
  • Scopus 12
  • ISI Web of Science 5