A recent trend in binary code analysis promotes the use of neural solutions based on instruction embedding models. An instruction embedding model is a neural network that transforms assembly instructions into embedding vectors. If the embedding network can process sequences of assembly instructions, transforming them into a sequence of embedding vectors, then the network effectively represents an assembly code model. In this paper we present BinBert, a novel assembly code model. BinBert is built on a transformer pre-trained on a large dataset of both assembly instruction sequences and symbolic execution information. BinBert can be applied to assembly instruction sequences and it is fine-tunable, i.e. it can be re-trained as part of a neural architecture on task-specific data. Through fine-tuning, BinBert learns how to apply the general knowledge acquired during pre-training to the specific task. We evaluated BinBert on a multi-task benchmark that we specifically designed to test the understanding of assembly code. The benchmark comprises several tasks, some taken from the literature and a few novel ones that we designed, with a mix of intrinsic and downstream tasks. Our results show that BinBert outperforms state-of-the-art models for binary instruction embedding, raising the bar for binary code understanding.
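As a rough illustration of what an instruction embedding model computes (not the paper's implementation, which uses a pre-trained transformer), the sketch below maps each token of an assembly instruction to a vector from a toy, randomly initialized table and mean-pools the token vectors into one embedding per instruction; applying it instruction by instruction yields the sequence-of-embeddings view described above. All names here are hypothetical.

```python
import random

EMB_DIM = 8  # toy embedding size; a transformer would use its hidden size


def tokenize(instruction):
    # Split an assembly instruction into opcode/operand tokens,
    # e.g. "mov eax, ebx" -> ["mov", "eax", "ebx"]
    return instruction.replace(",", " ").split()


class ToyInstructionEmbedder:
    """Maps each token to a (random) vector and mean-pools them.

    A stand-in for a trained encoder such as BinBert: same input/output
    shape, none of the learned semantics.
    """

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.table = {}  # token -> embedding vector

    def _vec(self, token):
        if token not in self.table:
            self.table[token] = [self.rng.uniform(-1, 1) for _ in range(EMB_DIM)]
        return self.table[token]

    def embed(self, instruction):
        # One embedding vector per instruction (mean pooling over tokens).
        vecs = [self._vec(t) for t in tokenize(instruction)]
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def embed_sequence(self, instructions):
        # An assembly *code* model maps a sequence of instructions
        # to a sequence of embedding vectors.
        return [self.embed(i) for i in instructions]


embedder = ToyInstructionEmbedder()
seq = embedder.embed_sequence(["mov eax, ebx", "add eax, 1"])
```

In a fine-tunable model, the embedding table above would be replaced by transformer weights updated jointly with a task-specific head during re-training on task data.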

BinBert: Binary Code Understanding with a Fine-tunable and Execution-aware Transformer / Artuso, Fiorella; Mormando, Marco; DI LUNA, GIUSEPPE ANTONIO; Querzoni, Leonardo. - In: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING. - ISSN 1545-5971. - (2024). [10.1109/TDSC.2024.3397660]

BinBert: Binary Code Understanding with a Fine-tunable and Execution-aware Transformer

Fiorella Artuso (first author); Marco Mormando (second author); Giuseppe Antonio Di Luna (second-to-last author); Leonardo Querzoni (last author)
2024

Assembly; Task analysis; Vectors; Transformers; Semantics; Binary codes; Benchmark testing
01 Pubblicazione su rivista::01a Articolo in rivista
Files attached to this product

File: Artuso_BinBert_2022.pdf (open access)
Note: 10.1109/TDSC.2024.3397660
Type: Pre-print (manuscript submitted to the publisher, prior to peer review)
License: All rights reserved
Size: 876.61 kB, Adobe PDF

File: Artuso_BinBert_2024.pdf (archive administrators only)
Note: Early Access
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 2.41 MB, Adobe PDF (contact the author)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1713407
Citations
  • PMC: n/a
  • Scopus: 0
  • Web of Science: n/a