This thesis introduces a novel approach to online failure prediction for mission critical distributed systems that has the distinctive features to be black-box, non-intrusive and online. The approach combines Complex Event Processing (CEP) and Hidden Markov Models (HMM) so as to analyze symptoms of failures that might occur in the form of anomalous conditions of performance metrics identified for such purpose. The thesis presents an architecture named CASPER, based on CEP and HMM, that relies on sniffed information from the communication network of a mission critical system, only, for predicting anomalies that can lead to software failures. An instance of Casper has been implemented, trained and tuned to monitor a real Air Traffic Control (ATC) system developed by Selex ES, a Finmeccanica Company. An extensive experimental evaluation of CASPER is presented. The obtained results show (i) a very low percentage of false positives over both normal and under stress conditions, and (ii) a sufficiently high failure prediction time that allows the system to apply appropriate recovery procedures.

Online failure prediction in air traffic control systems / Montanari, Luca. - (2013 Mar 27).

Online failure prediction in air traffic control systems

Montanari, Luca
27/03/2013

Abstract

This thesis introduces a novel approach to online failure prediction for mission critical distributed systems that has the distinctive features to be black-box, non-intrusive and online. The approach combines Complex Event Processing (CEP) and Hidden Markov Models (HMM) so as to analyze symptoms of failures that might occur in the form of anomalous conditions of performance metrics identified for such purpose. The thesis presents an architecture named CASPER, based on CEP and HMM, that relies on sniffed information from the communication network of a mission critical system, only, for predicting anomalies that can lead to software failures. An instance of Casper has been implemented, trained and tuned to monitor a real Air Traffic Control (ATC) system developed by Selex ES, a Finmeccanica Company. An extensive experimental evaluation of CASPER is presented. The obtained results show (i) a very low percentage of false positives over both normal and under stress conditions, and (ii) a sufficiently high failure prediction time that allows the system to apply appropriate recovery procedures.
27-mar-2013
File allegati a questo prodotto
File Dimensione Formato  
LucaMontanariPhDthesis.pdf

accesso aperto

Note: Tesi di dottorato
Tipologia: Tesi di dottorato
Licenza: Creative commons
Dimensione 4.42 MB
Formato Adobe PDF
4.42 MB Adobe PDF

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/917211
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact