A Reference Data Model to Specify Event Logs for Big Data Pipeline Discovery / Benvenuti, D.; Marrella, A.; Rossi, J.; Nikolov, N.; Roman, D.; Soylu, A.; Perales, F. - 490 LNBIP:(2023), pp. 38-54. (Paper presented at the International Conference in Business Process Management, held in Utrecht) [10.1007/978-3-031-41623-1_3].

A Reference Data Model to Specify Event Logs for Big Data Pipeline Discovery

Benvenuti D. (first author); Marrella A. (second author); Rossi J.
2023

Abstract

State-of-the-art approaches for managing Big Data pipelines assume that their anatomy is known by design and expressed through ad-hoc Domain-Specific Languages (DSLs), with insufficient knowledge of the dark data involved in pipeline execution. Dark data is data that organizations acquire during regular business activities but do not use to derive insights or make decisions. The recent literature on Big Data processing agrees that a new breed of Big Data pipeline discovery (BDPD) solutions can mitigate this issue by analyzing only the event log that keeps track of pipeline executions over time. Relying on well-established process mining techniques, BDPD can reveal fact-based insights into how data pipelines unfold and access dark data. To date, however, there is no standard format for specifying the concept of Big Data pipeline execution in an event log, which makes it challenging to apply process mining to the BDPD task. To address this issue, in this paper we formalize a universally applicable reference data model to conceptualize the core properties and attributes of a data pipeline execution. We provide an implementation of the model as an extension to the XES interchange standard for event logs, demonstrate its practical applicability in a use case involving a data pipeline for managing digital marketing campaigns, and evaluate its effectiveness in uncovering dark data manipulated during several pipeline executions.
2023
International Conference in Business Process Management
Big Data Pipeline Discovery (BDPD), Big Data Pipeline, Reference Data Model, Process Mining, Event Log, Dark Data, XES
04 Publication in conference proceedings::04b Conference paper in volume
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1697806