DIS-PIPE: A Tool for Data Pipeline Discovery / Agostinelli, S.; Benvenuti, D.; Marrella, A.; Rossi, J. - Vol. 3783 (2024). (Paper presented at the International Conference on Process Mining, held in Copenhagen, Denmark.)
DIS-PIPE: A Tool for Data Pipeline Discovery
Agostinelli S.; Benvenuti D.; Marrella A.; Rossi J.
2024
Abstract
This paper presents DIS-PIPE, a software tool that leverages well-established process mining techniques to tackle the Data Pipeline Discovery (DPD) task. Data pipelines are compositions of steps that move data from disparate sources to data consumers. As data travels through a pipeline, it can undergo various transformations executed on computational platforms. In this context, DPD aims to learn the structure and behavior of a data pipeline from an event log that records its past executions, thereby uncovering, to some extent, execution-related dark data whose knowledge is critical to improving the quality of pipeline modeling. DIS-PIPE was designed, implemented, and validated in the context of the H2020 European project DataCloud, and can interpret XES logs enriched with information that captures the core concepts of data pipelines.
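The abstract does not detail DIS-PIPE's discovery algorithms. As a hedged illustration of the general idea behind DPD (learning a pipeline's structure from an event log of its past executions), the minimal Python sketch below parses a toy XES log with the standard library and counts a directly-follows graph, a basic building block of many process discovery techniques. The log content, the step names (`ingest`, `clean`, `load`), and the helper names `parse_traces` and `directly_follows` are hypothetical. Only the standard XES `concept:name` attribute is read, and none of DIS-PIPE's pipeline-specific XES enrichments are modeled.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace prefix used by ElementTree for elements in the XES namespace.
XES = "{http://www.xes-standard.org/}"

# Hypothetical toy log: two executions (traces) of a made-up three-step pipeline.
# Only the standard concept:name attribute is used; no DIS-PIPE extensions.
XES_LOG = """<log xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="run-1"/>
    <event><string key="concept:name" value="ingest"/></event>
    <event><string key="concept:name" value="clean"/></event>
    <event><string key="concept:name" value="load"/></event>
  </trace>
  <trace>
    <string key="concept:name" value="run-2"/>
    <event><string key="concept:name" value="ingest"/></event>
    <event><string key="concept:name" value="load"/></event>
  </trace>
</log>"""


def concept_name(elem):
    """Return the value of the XES concept:name attribute among elem's children."""
    for child in elem:
        if child.tag == XES + "string" and child.get("key") == "concept:name":
            return child.get("value")
    return None


def parse_traces(xes_text):
    """Turn an XES document into a list of traces, each a list of step names."""
    root = ET.fromstring(xes_text)
    return [
        [concept_name(event) for event in trace.iter(XES + "event")]
        for trace in root.iter(XES + "trace")
    ]


def directly_follows(traces):
    """Count how often step a is immediately followed by step b across all traces."""
    dfg = Counter()
    for events in traces:
        for a, b in zip(events, events[1:]):
            dfg[(a, b)] += 1
    return dfg


if __name__ == "__main__":
    dfg = directly_follows(parse_traces(XES_LOG))
    for (a, b), n in sorted(dfg.items()):
        print(f"{a} -> {b}: {n}")
    # clean -> load: 1
    # ingest -> clean: 1
    # ingest -> load: 1
```

The resulting edges show that `clean` is optional between `ingest` and `load`, the kind of structural fact a discovery tool would surface from past executions. A real DPD tool would go further, for example by exploiting the pipeline-specific attributes in the enriched log.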
File | Size | Format
---|---|---
Agostinelli_DIS-PIPE_2024.pdf | 318.05 kB | Adobe PDF

Access: open access
Note: https://ceur-ws.org/Vol-3783/paper_354.pdf
Type: Publisher's version (the published version with the publisher's layout)
License: Creative Commons
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.