Attacking binary function similarity systems / Capozzi, Gianluca. - (2025 May 30).

Attacking binary function similarity systems

CAPOZZI, GIANLUCA
30/05/2025

Abstract

The widespread diffusion of IoT devices and the growing availability of open-source software are amplifying the demand for autonomous solutions in the field of binary code analysis. As the number of devices running software continues to grow, they generate vast amounts of data that must be analyzed for purposes such as security assessment, performance optimization, and code validation. This increasing volume of data, combined with the complexity of modern software and firmware, calls for efficient tools and methodologies to assist researchers and analysts in binary code analysis. To address these challenges, the scientific community is moving towards Deep Learning-based solutions for binary analysis. These solutions typically provide end-to-end capabilities for handling complex tasks and can alleviate the workload of human analysts. Among these tasks, Binary Function Similarity (BFS) detection, which assesses whether two binary functions were compiled from the same source code, is gaining increasing importance. Despite the effort devoted to proposing and systematizing DNN-based solutions for BFS, their resilience against adversarial attacks remains unclear; indeed, a major drawback of DNN-based solutions is their sensitivity to such attacks. This thesis investigates the robustness of BFS systems against adversarial attacks, presenting two main contributions. First, we introduce black-box and white-box approaches to assess the resilience of BFS systems against adversarial attacks when comparing two functions directly. Our findings demonstrate that these systems are vulnerable to both targeted and untargeted attacks with respect to similarity objectives. We conduct extensive experiments on three state-of-the-art BFS solutions, revealing that they are more susceptible to black-box attacks than to white-box ones, while exhibiting greater resilience against targeted attacks. Second, we conduct a comprehensive evaluation of eight state-of-the-art BFS systems, assessing their resilience to adversarial attacks when used as search engines that retrieve, from a given pool, the functions most similar to a query. Here, we propose a simple black-box method that alters both the topology and the content of the Control Flow Graph (CFG) of the attacked functions. Our findings reveal a critical insight: top performance on clean data does not necessarily correlate with superior robustness, underscoring the performance-robustness trade-offs that must be carefully considered when deploying such models.
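The abstract describes the attacks only at a high level. As a purely illustrative aid, the sketch below shows the general shape of a greedy, query-based black-box attack loop against a BFS model treated as a similarity oracle; the `model.score` interface, the `transforms` set, and the toy demo are assumptions introduced here for illustration, not the thesis's actual implementation.

```python
import random

def similarity(model, func_a, func_b):
    """Black-box similarity oracle: higher means more similar.

    `model` is a hypothetical stand-in for any BFS system that can
    score a pair of functions (e.g., cosine similarity of embeddings).
    """
    return model.score(func_a, func_b)

def greedy_blackbox_attack(model, func, reference, transforms,
                           targeted=True, budget=50, tries=8):
    """Greedy query-based attack on a BFS model used as a black box.

    targeted=True  : increase similarity(func, reference), so the
                     perturbed function is (wrongly) matched to `reference`.
    targeted=False : decrease similarity(func, reference), so the
                     perturbed function evades its true match.
    Every transform is assumed to preserve the function's runtime
    semantics, e.g. inserting dead basic blocks (alters CFG topology)
    or padding blocks with no-op sequences (alters CFG content).
    """
    best = func
    best_score = similarity(model, best, reference)
    for _ in range(budget):
        # Sample a few candidate perturbations of the current best version.
        candidates = [random.choice(transforms)(best) for _ in range(tries)]
        scored = [(similarity(model, c, reference), c) for c in candidates]
        score, cand = (max if targeted else min)(scored, key=lambda t: t[0])
        improved = score > best_score if targeted else score < best_score
        if improved:
            best, best_score = cand, score
    return best, best_score

if __name__ == "__main__":
    # Toy demo: functions are lists of mnemonics, similarity is Jaccard.
    class ToyModel:
        def score(self, a, b):
            sa, sb = set(a), set(b)
            return len(sa & sb) / max(len(sa | sb), 1)

    query = ["push", "mov", "add", "ret"]
    target = ["push", "mov", "xor", "call", "ret"]
    # Toy "semantics-preserving" edits: append filler mnemonics.
    transforms = [lambda f: f + ["nop"], lambda f: f + ["xor"]]
    adv, score = greedy_blackbox_attack(ToyModel(), query, target,
                                        transforms, targeted=True)
    print(f"similarity to target after attack: {score:.2f}")
```

In a real attack the transforms would be semantics-preserving rewrites applied to the compiled function, so the perturbed binary still behaves identically while its CFG, and hence the model's score, changes.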
Files attached to this item

Tesi_dottorato_Capozzi.pdf

Access: open access
Note: Ph.D. dissertation on attacking binary function similarity systems
Type: Doctoral thesis
License: All rights reserved
Size: 5.43 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1740580