Behavioral machine learning methods for adversarially robust threat detection at operational constraints / Trizna, D. - (2026 May 18).

Behavioral machine learning methods for adversarially robust threat detection at operational constraints

TRIZNA, DMITRIJS

18/05/2026

Abstract

Modern endpoint threat detection must operate under constraints that are rarely acknowledged in academic evaluations: tolerable false positive rates of 10⁻³ or lower, strict latency and compute budgets, robustness against adaptive adversaries, and explainable outputs to support analyst triage. This thesis develops machine-learning methods for host-level threat detection that are designed from the outset to satisfy these constraints, organized around the dichotomy between file-based and file-less malicious execution on Windows and Linux endpoints.

For file-based threats, we introduce Nebula, a self-attention transformer architecture for dynamic analysis of Windows Portable Executable malware. Nebula ingests heterogeneous behavioral telemetry (API calls, filesystem operations, network communications, and registry modifications) and addresses three challenges of behavioral analysis: cross-modal correlation, vocabulary explosion in machine-generated reports, and long-range temporal dependencies. Domain-specific tokenization controls vocabulary entropy while preserving behavioral semantics, and a self-supervised pre-training phase improves data efficiency. On three independent dynamic-analysis corpora, Nebula achieves useful true-positive rates at FPR ≤ 10⁻³ and remains resilient against transfer attacks crafted from static adversarial EXEmples.

For file-less threats, we introduce QuasarNix, a template-based data synthesis framework for detecting Living-off-the-Land (LOTL) reverse shells on Linux. LOTL detection suffers from severe class imbalance, since malicious samples are orders of magnitude rarer than legitimate command-line activity, while the abuse of trusted dual-use binaries renders signature-based approaches impractical. Rather than relying on unconstrained generative models that risk syntactic infidelity, QuasarNix decomposes known attack patterns into reusable templates that guarantee functional validity while enabling combinatorial diversity. Combined with adversarial perturbation training, the resulting classifier operates at FPR ≤ 10⁻⁶, a threshold dictated by the volume of live endpoint telemetry, while remaining robust to a taxonomy of obfuscation-based evasions.

Both systems are accompanied by explainability diagnostics (integrated gradients and attention analysis for Nebula; SHAP-based feature attribution for QuasarNix) and evaluated on temporally separated splits to assess short-term generalization. The thesis concludes with cross-system design patterns and deployment considerations, including sequential pipeline orchestration and regression-free model updates, distilled from both case studies.
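
The operating points quoted in the abstract (FPR ≤ 10⁻³ for Nebula, FPR ≤ 10⁻⁶ for QuasarNix) amount to fixing a decision threshold on benign scores and reporting the true-positive rate at that threshold. The following is a minimal Python sketch of that evaluation, not code from the thesis: the function names `threshold_at_fpr` and `tpr_at_fpr`, the "alert when score is strictly above the threshold" convention, and the toy scores are all assumptions made for illustration.

```python
import math


def threshold_at_fpr(benign_scores, max_fpr):
    """Pick a threshold so that alerting on `score > threshold` keeps the
    false-positive rate on `benign_scores` at or below `max_fpr`.
    (Hypothetical helper, not taken from the thesis.)"""
    ranked = sorted(benign_scores, reverse=True)
    # Number of benign samples that may be flagged without exceeding max_fpr.
    # The small epsilon guards against floating-point round-down.
    allowed = math.floor(max_fpr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return -math.inf  # every benign sample may be flagged
    # Alerts fire strictly above the threshold, so using the (allowed+1)-th
    # highest benign score leaves at most `allowed` benign scores above it.
    return ranked[allowed]


def tpr_at_fpr(benign_scores, malicious_scores, max_fpr):
    """Return (true-positive rate, threshold) at the requested FPR budget."""
    threshold = threshold_at_fpr(benign_scores, max_fpr)
    detected = sum(1 for s in malicious_scores if s > threshold)
    return detected / len(malicious_scores), threshold


# Toy data (assumed): 10,000 evenly spaced benign scores in [0, 1).
benign = [i / 10000 for i in range(10000)]
malicious = [0.5, 0.9995, 0.99995, 1.0]

tpr, threshold = tpr_at_fpr(benign, malicious, max_fpr=1e-3)
false_positives = sum(1 for s in benign if s > threshold)
print(tpr, threshold, false_positives / len(benign))
```

With only 10,000 benign samples, a budget of 10⁻⁶ permits zero false positives, so the threshold falls on the highest benign score. An FPR that low can only be estimated from at least a million benign samples, which matches the abstract's remark that the threshold is dictated by the volume of live endpoint telemetry.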

Defense date: 18 May 2026
External supervisors: Roli, Fabio; Biggio, Battista
Type: 07a Doctoral thesis

Files attached to this item

File	Size	Format
Tesi_dottorato_Trizna.pdf (open access; note: complete thesis; type: doctoral thesis; license: Creative Commons)	4.16 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1768853
