The ability to perform predictive maintenance, one of the main assets of Industry 4.0, is known to help reduce downtime and costs and to improve process control and production quality. Modern predictive maintenance programs rely on data-driven machine learning techniques from the broader field of AI. This applies to any machinery in which intelligent sensors collect data that can be processed to detect faults or to carry out anomaly detection. This paper presents an anomaly detection system for the railway domain, specifically for the pressurization systems of Italian high-speed trains. The available real-world dataset consists of unlabeled time series, each with a fixed length of 600 samples. A two-stage machine learning workflow is therefore proposed. The first stage operates in an unsupervised fashion, using a statistical technique validated by field experts to build a labeled dataset. In the second stage, the problem is cast as a classification task under strong class imbalance (a very likely condition in predictive maintenance), and two feature engineering techniques are compared. The first feeds the raw signals directly into a Support Vector Machine (SVM). In the second, each time series undergoes an adaptive heuristic piecewise linear approximation whose output is a sequence of vectors in $\mathbb{R}^{2}$ (slopes and intercepts). In this case, classification is carried out in the so-called “dissimilarity space” for pattern recognition, using representation sets of different sizes obtained through a clustering algorithm. The dissimilarity measure is an ad-hoc edit distance capable of comparing sequences of 2-dimensional vectors.
In this study, a k-medoids clustering procedure is adopted to balance the dataset, together with additional techniques for addressing the challenging problem of imbalanced data, and an in-depth comparison of the various experimental methodologies is offered.
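The abstract gives no implementation details for the second feature engineering pipeline, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that the adaptive piecewise approximation is a greedy segmentation that grows each linear piece while the maximum absolute least-squares residual stays below a hypothetical `max_error` threshold, that the edit distance uses unit insertion/deletion cost and Euclidean substitution cost between (slope, intercept) pairs, and that the representation set is simply a list of prototype sequences (in the paper it is obtained through clustering). All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One linear piece: (slope, intercept), with time measured from the start of the piece.
# ASSUMPTION: local time origin per piece; the paper may parameterize pieces differently.
Segment = Tuple[float, float]


def _fit_line(ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line over x = 0..len(ys)-1; returns (slope, intercept, max |residual|)."""
    n = len(ys)
    if n == 1:
        return 0.0, float(ys[0]), 0.0
    mean_x = (n - 1) / 2.0
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    max_err = max(abs(y - (slope * x + intercept)) for x, y in enumerate(ys))
    return slope, intercept, max_err


def piecewise_linear(series: Sequence[float], max_error: float) -> List[Segment]:
    """HYPOTHETICAL greedy segmentation: grow each piece while its max residual <= max_error."""
    segments: List[Segment] = []
    start, n = 0, len(series)
    while start < n:
        end = start + 1                      # current piece covers series[start:end]
        slope, intercept, _ = _fit_line(series[start:end])
        while end < n:
            s, b, err = _fit_line(series[start:end + 1])
            if err > max_error:              # adding the next sample would break the tolerance
                break
            slope, intercept, end = s, b, end + 1
        segments.append((slope, intercept))
        start = end
    return segments


def edit_distance(a: Sequence[Segment], b: Sequence[Segment], indel_cost: float = 1.0) -> float:
    """Weighted Levenshtein distance between two segment sequences.
    ASSUMPTION: constant insertion/deletion cost, Euclidean substitution cost."""
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * indel_cost] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            curr[j] = min(prev[j] + indel_cost,                          # delete a[i-1]
                          curr[j - 1] + indel_cost,                      # insert b[j-1]
                          prev[j - 1] + math.dist(a[i - 1], b[j - 1]))   # substitute
        prev = curr
    return prev[len(b)]


def embed(seq: Sequence[Segment], representation_set: Sequence[Sequence[Segment]]) -> List[float]:
    """Dissimilarity-space vector: edit distance from seq to each prototype."""
    return [edit_distance(seq, proto) for proto in representation_set]


if __name__ == "__main__":
    ramp_then_plateau = [0.1 * t for t in range(20)] + [2.0] * 20
    flat = [1.0] * 40
    segs_a = piecewise_linear(ramp_then_plateau, max_error=0.05)
    segs_b = piecewise_linear(flat, max_error=0.05)
    print(segs_a)                           # two pieces: a ramp and a plateau
    print(embed(segs_a, [segs_a, segs_b]))  # [0.0, 2.0]
```

Each time series thus becomes a fixed-length vector with one coordinate per prototype, so any vector-space classifier, such as an SVM, can then be trained on it; that final step is omitted here.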
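The abstract does not specify how the k-medoids procedure balances the dataset. One plausible reading, sketched here as an assumption rather than the authors' method, is cluster-based undersampling: the majority (normal) class is partitioned into as many clusters as there are minority (anomalous) samples, and only the cluster medoids are kept. Because k-medoids needs only pairwise dissimilarities, `dist` can be any measure, including an edit distance between sequences. The names `k_medoids` and `undersample_majority` and the toy data are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def k_medoids(items: Sequence[T], k: int, dist: Callable[[T, T], float],
              max_iter: int = 100, seed: int = 0) -> List[int]:
    """Alternating (Voronoi-iteration) k-medoids; returns the sorted indices of the k medoids."""
    n = len(items)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= len(items)")
    # Full pairwise dissimilarity matrix: O(n^2) memory, acceptable for a sketch.
    d = [[dist(items[i], items[j]) for j in range(n)] for i in range(n)]
    medoids = sorted(random.Random(seed).sample(range(n), k))
    for _ in range(max_iter):
        medoid_set = set(medoids)
        clusters: Dict[int, List[int]] = {m: [] for m in medoids}
        # Assignment step: each medoid keeps itself; every other item joins its nearest medoid.
        for i in range(n):
            nearest = i if i in medoid_set else min(medoids, key=lambda m: d[i][m])
            clusters[nearest].append(i)
        # Update step: the member with the smallest total dissimilarity to its cluster
        # becomes the new medoid of that cluster.
        new_medoids = sorted(
            min(members, key=lambda c: sum(d[c][o] for o in members))
            for members in clusters.values()
        )
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids


def undersample_majority(majority: Sequence[T], n_keep: int,
                         dist: Callable[[T, T], float]) -> List[T]:
    """HYPOTHETICAL balancing step: represent the majority class by n_keep medoids."""
    return [majority[i] for i in k_medoids(majority, n_keep, dist)]


if __name__ == "__main__":
    majority = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]  # toy 1-D "normal" samples
    minority = [5.0, 5.5]                         # toy "anomalous" samples
    kept = undersample_majority(majority, len(minority), lambda a, b: abs(a - b))
    print(kept)  # one representative per group of normal samples
```

Unlike k-means centroids, medoids are actual samples from the data, which is why k-medoids suits non-vectorial representations such as segment sequences compared by an edit distance.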

A statistical framework for labeling unlabelled data: a case study on anomaly detection in pressurization systems for high-speed railway trains / Santis, Enrico De; Arno, Francesco; Martino, Alessio; Rizzi, Antonello. - (2022), pp. 1-8. (Paper presented at the 2022 International Joint Conference on Neural Networks (IJCNN), held in Padova, Italy) [10.1109/IJCNN55064.2022.9892880].

A statistical framework for labeling unlabelled data: a case study on anomaly detection in pressurization systems for high-speed railway trains

Santis, Enrico De; Arno, Francesco; Martino, Alessio; Rizzi, Antonello
2022

Year: 2022
Conference: 2022 International Joint Conference on Neural Networks (IJCNN)
Keywords: predictive maintenance; dissimilarity space; unsupervised learning; supervised learning; anomaly detection; condition-based maintenance
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Attached file: DeSantis_A-statistical_2022.pdf (main article; publisher's version, with the publisher's layout; license: all rights reserved; 695.06 kB, Adobe PDF; access restricted to archive managers)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1657947
Citations
  • PMC: N/A
  • Scopus: 3
  • ISI Web of Science: 1