Motivation and Rationale Traditional process mining techniques are based on a case-centric assumption, according to which each recorded event is associated with exactly one process instance. While this abstraction enables scalable analysis, it fails to accurately represent modern enterprise processes in which events often involve multiple interacting business objects. For example, a single payment event may simultaneously affect orders, customer accounts, and invoices, while a manufacturing step may involve materials, machines, operators, and quality records. Forcing such multi-object interactions into linear case-based traces introduces information loss, artificial convergence and divergence patterns, and obscures inter-object relationships, thereby limiting an organization's ability to accurately analyze, verify, and optimize their operational behavior. Object-Centric Process Mining (OCPM) addresses these limitations by enabling events to reference multiple business objects simultaneously. The Object-Centric Event Data (OCED) meta-model has emerged as a promising standardization, explicitly representing events, objects, lifecycles, and their relationships without forcing them into single-case structures. However, OCED's structural expressiveness alone is insufficient for rigorous process intelligence. Three critical gaps prevent its practical deployment in compliance-sensitive, data-driven environments. First, the OCED lacks a formal semantic infrastructure that defines machine-interpretable meaning for its concepts and relations. Current representations remain syntactic, hindering tool interoperability, automated reasoning, and systematic expression of domain-specific constraints. An insurance claim process encoded in one organization's OCED format cannot be automatically aligned with another's, despite structural similarity, because semantic mappings must otherwise be manually constructed. Second, existing conformance-checking and formal-verification approaches provide limited support for cross-object constraints and temporal dependencies across interacting business entities, precisely the characteristics that distinguish object-centric processes from traditional workflows. A compliance rule stating every order containing hazardous materials must have an approved safety certificate before shipment spans three object types and requires temporal ordering, yet current techniques cannot systematically verify such constraints across large event logs with formal correctness guarantees. Third, organizations possess vast repositories of historical event data in traditional formats such as XES, but lack a systematic methodology to transform these logs into semantically enriched, object-centric representations that preserve implicit relationships while enabling advanced analysis. These gaps form a circular dependency: semantic enrichment without verification cannot guarantee correctness; verification without transformation cannot leverage existing data; transformation without semantics produces data lacking interpretable meaning. Breaking this cycle requires an integrated framework unifying object-centric data modeling, semantic web technologies, and formal verification within a single end-to-end pipeline, precisely the contribution of this thesis. Process analysts, compliance officers, and business intelligence professionals currently navigate heterogeneous data sources manually, lack formal guarantees about analysis correctness, and struggle to express complex multi-object constraints in existing tools. A system that transforms heterogeneous event data into verified, semantically enriched knowledge graphs would drastically reduce setup time through automated pipelines while providing formal assurances that compliance checks actually verify what they claim to verify. This forms the purpose of our approach, which constitutes the core of this thesis. Our investigation resides at the intersection of Process Mining, Semantic Web Technologies, and Formal Methods. We adopted OWL ontologies and SPARQL to provide formal semantic foundations for OCED. We developed Alloy-based specifications enabling automated verification of structural and temporal constraints across interacting objects. We developed systematic transformation methodologies that bridge traditional event logs with semantically enriched representations. From an abstract perspective, this work addresses the discovery of formally verified, semantically explicit process intelligence from heterogeneous, multi-object event data in complex business environments where compliance requirements demand both expressive analysis and correctness guarantees. Research Contributions During the development of our approach, we contributed to the research fields of Object-Centric Process Mining, Semantic Web Technologies, and Formal Verification. Here, we briefly summarize our research contributions in these areas. RC1 We developed SemOCED (Semantic Object-Centric Event Data), a three-layered semantic framework providing formal ontological foundations for OCED. The framework consists of a meta-level ontology capturing core OCED concepts, a domain-specific layer enabling context-dependent extensions, and an instance-level representation supporting RDF-based knowledge graph construction. This layered architecture enables interoperable OCED representation across tools and organizations while supporting automated reasoning and inference through standard semantic web technologies. We introduced systematic transformation methodologies for converting process data into semantically enriched representations. These methodologies preserve implicit relationships between business objects that exist in flat event logs but are obscured by case-centric structuring. We demonstrate how object-centric reinterpretation reveals, in a semantic manner, process patterns invisible in traditional analysis. RC2 We designed expressive SPARQL query patterns enabling advanced object-centric process analytics over semantically enriched event knowledge graphs. These patterns support cross-object analytical perspectives, temporal reasoning about object lifecycles, and integration with business intelligence tasks. The query framework QuOCED demonstrates how semantic enrichment extends process mining capabilities beyond traditional discovery and conformance checking toward data-driven intelligence and automated decision support. RC3 We proposed FOCED (Formal Object-Centric Event Data), an Alloy-based formal specification framework for modeling and verifying structural and temporal constraints across multiple interacting business objects. FOCED translates process constraints into formal specifications, enabling automated model checking with formal correctness guarantees. We demonstrate the framework's effectiveness in discovering and validating implicit object dependencies in event logs, particularly during data import into graph databases such as Neo4j. Through formal verification, FOCED identifies structural inconsistencies that would otherwise lead to data invisibility, a critical issue that compromises knowledge graph integrity. This approach ensures correct knowledge graph creation, enrichment, and querying by catching constraint violations before they propagate into downstream systems. RC4 We integrated semantic specification, transformation, and querying into a unified end-to-end pipeline for SemOCED and FOCED. This pipeline systematically transforms heterogeneous event data sources into OCED representations, enriches them with domain-specific semantics through OCEDO, and constructs RDF and property graph knowledge graphs of SemOCED. Verified and semantically enriched models produced by this pipeline can be consumed by downstream intelligent systems, enabling reliable automation and data-driven decision support in complex business environments for FOCED. Parts of our work were published in the following papers. Each is labeled with the research challenges addressed. [1] S. Latif, F. J. Ekaputra, M. Vidgof, S. Kirrane, and C. Di Ciccio, “A Semantic Encoding of Object-Centric Event Data,” in Mining a Scientist’s Process (J. Mendling et al., eds.), vol. 16480 of Lecture Notes in Computer Science, pp. 1–14, Springer(to appear), arXiv preprint arXiv:2511.03351, 2026. Research Contributions RC1, RC4 [2] S. Latif, H. Latif, and M. R. U. Rahman, “Object-Centric Analysis of XES Event Logs: Integrating OCED Modeling with SPARQL Queries,” in Interna- tional Workshop on Analytics for Software Product and Process Improvement (A-SPPI), Product-Focused Software Process Improvement. Industry, Workshop Paper (G. Scanniello, V. Lenarduzzi, S. Romano, S. Vegas, and R. Francese,eds.), Cham, pp. 171–182, Springer Nature Switzerland, 2026. Research Contributions RC2, RC4. [3] S. Latif, H. Latif, T. U. Rehman, and M. R. U. Rahman, “Alloy-Driven Verification of Object-Centric Event Data: From Temporal Logic to Knowledge Graphs,” in 1st International Workshop on Data-Centric Artificial Intelligence for Intelligent Agents (DCAI4IA), International Conference on Service-Oriented Computing (ICSOC) 2025, Shenzhen, China, December 1–4, (To appear), arXiv preprint arXiv:2511.07263, 2025. Research Contributions RC3, RC4. Thesis Outline Chapter 1 introduces this work and provides deeper insights into the limitations of case-centric process mining, the emergence of object-centric approaches, and the motivation for integrating semantic enrichment with formal verification. Chapter 2 analyzes the current state of the art in the areas of (i) Process Mining, with emphasis on object-centric techniques and the evolution from case-centric to object-centric paradigm, (ii) the OCED meta-model and its standardization efforts (iii) Semantic Web Technologies, focusing on ontology engineering, knowledge graph construction and (iv) Formal Methods, temporal logic specifications for process compliance. Chapter 3 presents the SemOCED framework, detailing the three-layered semantic ontology architecture and the mapping rules that transform OCED instances into RDF knowledge graphs. There, the definitions of meta-level, domain-specific, and instance-level ontologies are provided, together with concrete examples demonstrating how semantic enrichment enables expressive SPARQL queries for process analysis. We illustrate the application of SemOCED to representative OCED datasets and demonstrate how the semantic layer supports automated reasoning and interoperability. Chapter 4 extends the semantic foundation by introducing systematic methodologies for transforming traditional XES event logs into SemOCED-compliant representations. We detail the algorithms for extracting implicit object relationships from flat event logs, constructing object lifecycles, and preserving temporal dependencies during transformation. The chapter presents this approach on the BPIC 2013 benchmark, showing how object-centric reinterpretation, combined with SPARQL querying QuOCED, reveals cross-object patterns, bottlenecks involving multiple business entities, and compliance violations spanning object interactions that remain invisible in traditional case-centric analysis. Chapter 5 introduces the FOCED framework for formal verification of object-centric event data. We present Alloy-based specifications that model structural constraints, mandatory object relationships, attribute value restrictions, event-object configurations, and temporal constraints across multiple interacting business objects. The chapter demonstrates how FOCED enables automated model checking with formal correctness guarantees, thereby detecting violations that escape traditional conformance-checking techniques. We also detail the integration of FOCED verification results into the knowledge graph representation, creating a unified view where compliance status is explicitly encoded and queryable. Chapter 6 synthesizes the preceding contributions into integrated end-to-end pipelines of SemOCED and FOCED. We depict the modular architecture of the framework, abstracted as high-level components: (i) data ingestion from heterogeneous sources (XES, OCEL, CSV, database exports), (ii) OCED transformation preserving object relationships, (iii) semantic enrichment through OCEDO, (iv) knowledge graph construction supporting both RDF (via SemOCED ontology) and property graph paradigms. We demonstrate the complete pipeline in operation across real-world case studies, showing how verified, semantically enriched process intelligence supports downstream applications in business intelligence, compliance monitoring, and automated decision support for SemOCED and FOCED. Finally, Chapter 7 concludes the discussion and traces future developments that might arise from the foundation of this work, including scalability improvements for large-scale OCED deployments, extensions to additional process perspectives (resource, cost, decision data), integration with process-intelligence platforms and process-aware information systems, and exploration of machine learning techniques operating over verified, semantically enriched knowledge graphs for predictive and prescriptive process analytics. Additional Work The author of this thesis has also been involved in research on formal methods for blockchain smart contracts during the later stages of the Ph.D. program. This work explored how formal verification techniques can improve the correctness and security of smart contracts in distributed ledger environments. We report it here for completeness, though the themes covered in its subject are out of scope for this thesis. [4] S. Latif, “Formal Methods as a Catalyst for Robust Smart Contracts in Blockchain”, in Proc. 2025 Int. Conf. Emerging Technologies in Electronics, 2025.

Semantically representing, querying, and checking object-centric event data / Latif, Saba. - (2026 May 11).

Semantically representing, querying, and checking object-centric event data

LATIF, SABA
11/05/2026

Abstract

Motivation and Rationale Traditional process mining techniques are based on a case-centric assumption, according to which each recorded event is associated with exactly one process instance. While this abstraction enables scalable analysis, it fails to accurately represent modern enterprise processes in which events often involve multiple interacting business objects. For example, a single payment event may simultaneously affect orders, customer accounts, and invoices, while a manufacturing step may involve materials, machines, operators, and quality records. Forcing such multi-object interactions into linear case-based traces introduces information loss, artificial convergence and divergence patterns, and obscures inter-object relationships, thereby limiting an organization's ability to accurately analyze, verify, and optimize their operational behavior. Object-Centric Process Mining (OCPM) addresses these limitations by enabling events to reference multiple business objects simultaneously. The Object-Centric Event Data (OCED) meta-model has emerged as a promising standardization, explicitly representing events, objects, lifecycles, and their relationships without forcing them into single-case structures. However, OCED's structural expressiveness alone is insufficient for rigorous process intelligence. Three critical gaps prevent its practical deployment in compliance-sensitive, data-driven environments. First, the OCED lacks a formal semantic infrastructure that defines machine-interpretable meaning for its concepts and relations. Current representations remain syntactic, hindering tool interoperability, automated reasoning, and systematic expression of domain-specific constraints. An insurance claim process encoded in one organization's OCED format cannot be automatically aligned with another's, despite structural similarity, because semantic mappings must otherwise be manually constructed. Second, existing conformance-checking and formal-verification approaches provide limited support for cross-object constraints and temporal dependencies across interacting business entities, precisely the characteristics that distinguish object-centric processes from traditional workflows. A compliance rule stating every order containing hazardous materials must have an approved safety certificate before shipment spans three object types and requires temporal ordering, yet current techniques cannot systematically verify such constraints across large event logs with formal correctness guarantees. Third, organizations possess vast repositories of historical event data in traditional formats such as XES, but lack a systematic methodology to transform these logs into semantically enriched, object-centric representations that preserve implicit relationships while enabling advanced analysis. These gaps form a circular dependency: semantic enrichment without verification cannot guarantee correctness; verification without transformation cannot leverage existing data; transformation without semantics produces data lacking interpretable meaning. Breaking this cycle requires an integrated framework unifying object-centric data modeling, semantic web technologies, and formal verification within a single end-to-end pipeline, precisely the contribution of this thesis. Process analysts, compliance officers, and business intelligence professionals currently navigate heterogeneous data sources manually, lack formal guarantees about analysis correctness, and struggle to express complex multi-object constraints in existing tools. A system that transforms heterogeneous event data into verified, semantically enriched knowledge graphs would drastically reduce setup time through automated pipelines while providing formal assurances that compliance checks actually verify what they claim to verify. This forms the purpose of our approach, which constitutes the core of this thesis. Our investigation resides at the intersection of Process Mining, Semantic Web Technologies, and Formal Methods. We adopted OWL ontologies and SPARQL to provide formal semantic foundations for OCED. We developed Alloy-based specifications enabling automated verification of structural and temporal constraints across interacting objects. We developed systematic transformation methodologies that bridge traditional event logs with semantically enriched representations. From an abstract perspective, this work addresses the discovery of formally verified, semantically explicit process intelligence from heterogeneous, multi-object event data in complex business environments where compliance requirements demand both expressive analysis and correctness guarantees. Research Contributions During the development of our approach, we contributed to the research fields of Object-Centric Process Mining, Semantic Web Technologies, and Formal Verification. Here, we briefly summarize our research contributions in these areas. RC1 We developed SemOCED (Semantic Object-Centric Event Data), a three-layered semantic framework providing formal ontological foundations for OCED. The framework consists of a meta-level ontology capturing core OCED concepts, a domain-specific layer enabling context-dependent extensions, and an instance-level representation supporting RDF-based knowledge graph construction. This layered architecture enables interoperable OCED representation across tools and organizations while supporting automated reasoning and inference through standard semantic web technologies. We introduced systematic transformation methodologies for converting process data into semantically enriched representations. These methodologies preserve implicit relationships between business objects that exist in flat event logs but are obscured by case-centric structuring. We demonstrate how object-centric reinterpretation reveals, in a semantic manner, process patterns invisible in traditional analysis. RC2 We designed expressive SPARQL query patterns enabling advanced object-centric process analytics over semantically enriched event knowledge graphs. These patterns support cross-object analytical perspectives, temporal reasoning about object lifecycles, and integration with business intelligence tasks. The query framework QuOCED demonstrates how semantic enrichment extends process mining capabilities beyond traditional discovery and conformance checking toward data-driven intelligence and automated decision support. RC3 We proposed FOCED (Formal Object-Centric Event Data), an Alloy-based formal specification framework for modeling and verifying structural and temporal constraints across multiple interacting business objects. FOCED translates process constraints into formal specifications, enabling automated model checking with formal correctness guarantees. We demonstrate the framework's effectiveness in discovering and validating implicit object dependencies in event logs, particularly during data import into graph databases such as Neo4j. Through formal verification, FOCED identifies structural inconsistencies that would otherwise lead to data invisibility, a critical issue that compromises knowledge graph integrity. This approach ensures correct knowledge graph creation, enrichment, and querying by catching constraint violations before they propagate into downstream systems. RC4 We integrated semantic specification, transformation, and querying into a unified end-to-end pipeline for SemOCED and FOCED. This pipeline systematically transforms heterogeneous event data sources into OCED representations, enriches them with domain-specific semantics through OCEDO, and constructs RDF and property graph knowledge graphs of SemOCED. Verified and semantically enriched models produced by this pipeline can be consumed by downstream intelligent systems, enabling reliable automation and data-driven decision support in complex business environments for FOCED. Parts of our work were published in the following papers. Each is labeled with the research challenges addressed. [1] S. Latif, F. J. Ekaputra, M. Vidgof, S. Kirrane, and C. Di Ciccio, “A Semantic Encoding of Object-Centric Event Data,” in Mining a Scientist’s Process (J. Mendling et al., eds.), vol. 16480 of Lecture Notes in Computer Science, pp. 1–14, Springer(to appear), arXiv preprint arXiv:2511.03351, 2026. Research Contributions RC1, RC4 [2] S. Latif, H. Latif, and M. R. U. Rahman, “Object-Centric Analysis of XES Event Logs: Integrating OCED Modeling with SPARQL Queries,” in Interna- tional Workshop on Analytics for Software Product and Process Improvement (A-SPPI), Product-Focused Software Process Improvement. Industry, Workshop Paper (G. Scanniello, V. Lenarduzzi, S. Romano, S. Vegas, and R. Francese,eds.), Cham, pp. 171–182, Springer Nature Switzerland, 2026. Research Contributions RC2, RC4. [3] S. Latif, H. Latif, T. U. Rehman, and M. R. U. Rahman, “Alloy-Driven Verification of Object-Centric Event Data: From Temporal Logic to Knowledge Graphs,” in 1st International Workshop on Data-Centric Artificial Intelligence for Intelligent Agents (DCAI4IA), International Conference on Service-Oriented Computing (ICSOC) 2025, Shenzhen, China, December 1–4, (To appear), arXiv preprint arXiv:2511.07263, 2025. Research Contributions RC3, RC4. Thesis Outline Chapter 1 introduces this work and provides deeper insights into the limitations of case-centric process mining, the emergence of object-centric approaches, and the motivation for integrating semantic enrichment with formal verification. Chapter 2 analyzes the current state of the art in the areas of (i) Process Mining, with emphasis on object-centric techniques and the evolution from case-centric to object-centric paradigm, (ii) the OCED meta-model and its standardization efforts (iii) Semantic Web Technologies, focusing on ontology engineering, knowledge graph construction and (iv) Formal Methods, temporal logic specifications for process compliance. Chapter 3 presents the SemOCED framework, detailing the three-layered semantic ontology architecture and the mapping rules that transform OCED instances into RDF knowledge graphs. There, the definitions of meta-level, domain-specific, and instance-level ontologies are provided, together with concrete examples demonstrating how semantic enrichment enables expressive SPARQL queries for process analysis. We illustrate the application of SemOCED to representative OCED datasets and demonstrate how the semantic layer supports automated reasoning and interoperability. Chapter 4 extends the semantic foundation by introducing systematic methodologies for transforming traditional XES event logs into SemOCED-compliant representations. We detail the algorithms for extracting implicit object relationships from flat event logs, constructing object lifecycles, and preserving temporal dependencies during transformation. The chapter presents this approach on the BPIC 2013 benchmark, showing how object-centric reinterpretation, combined with SPARQL querying QuOCED, reveals cross-object patterns, bottlenecks involving multiple business entities, and compliance violations spanning object interactions that remain invisible in traditional case-centric analysis. Chapter 5 introduces the FOCED framework for formal verification of object-centric event data. We present Alloy-based specifications that model structural constraints, mandatory object relationships, attribute value restrictions, event-object configurations, and temporal constraints across multiple interacting business objects. The chapter demonstrates how FOCED enables automated model checking with formal correctness guarantees, thereby detecting violations that escape traditional conformance-checking techniques. We also detail the integration of FOCED verification results into the knowledge graph representation, creating a unified view where compliance status is explicitly encoded and queryable. Chapter 6 synthesizes the preceding contributions into integrated end-to-end pipelines of SemOCED and FOCED. We depict the modular architecture of the framework, abstracted as high-level components: (i) data ingestion from heterogeneous sources (XES, OCEL, CSV, database exports), (ii) OCED transformation preserving object relationships, (iii) semantic enrichment through OCEDO, (iv) knowledge graph construction supporting both RDF (via SemOCED ontology) and property graph paradigms. We demonstrate the complete pipeline in operation across real-world case studies, showing how verified, semantically enriched process intelligence supports downstream applications in business intelligence, compliance monitoring, and automated decision support for SemOCED and FOCED. Finally, Chapter 7 concludes the discussion and traces future developments that might arise from the foundation of this work, including scalability improvements for large-scale OCED deployments, extensions to additional process perspectives (resource, cost, decision data), integration with process-intelligence platforms and process-aware information systems, and exploration of machine learning techniques operating over verified, semantically enriched knowledge graphs for predictive and prescriptive process analytics. Additional Work The author of this thesis has also been involved in research on formal methods for blockchain smart contracts during the later stages of the Ph.D. program. This work explored how formal verification techniques can improve the correctness and security of smart contracts in distributed ledger environments. We report it here for completeness, though the themes covered in its subject are out of scope for this thesis. [4] S. Latif, “Formal Methods as a Catalyst for Robust Smart Contracts in Blockchain”, in Proc. 2025 Int. Conf. Emerging Technologies in Electronics, 2025.
11-mag-2026
File allegati a questo prodotto
File Dimensione Formato  
Tesi_dottorato_Latif.pdf

accesso aperto

Note: tesi completa
Tipologia: Tesi di dottorato
Licenza: Creative commons
Dimensione 5.94 MB
Formato Adobe PDF
5.94 MB Adobe PDF

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1768205
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact