Data Augmentation for Data-Centric AI Through the Lens of Semantic Technologies: A Position Paper / Cabibbo, Luca; Bertillo, Daniele; Cima, Gianluca; Crescenzi, Valter; Console, Marco; Delfino, Roberto Maria; Iannucci, Stefano; Lembo, Domenico; Lenzerini, Maurizio; Marconi, Lorenzo; Merialdo, Paolo; Napoleone, Marco; Papi, Laura; Poggi, Antonella; Scafoglieri, Federico; Torlone, Riccardo. - 4182 (2025), pp. 56-66. (33rd Italian Symposium on Advanced Database Systems, SEBD 2025, Ischia, Italy).

Data Augmentation for Data-Centric AI Through the Lens of Semantic Technologies: A Position Paper

Gianluca Cima; Valter Crescenzi; Marco Console; Roberto Maria Delfino; Stefano Iannucci; Domenico Lembo; Maurizio Lenzerini; Lorenzo Marconi; Laura Papi; Antonella Poggi; Federico Scafoglieri
2025

Abstract

Data augmentation is a fundamental machine learning technique for improving model generalization by artificially expanding training datasets. However, conventional augmentation approaches often rely on heuristic transformations that may not fully capture domain-specific knowledge. This position paper advocates a data-centric AI perspective on data augmentation, emphasizing the integration of semantic technologies, particularly domain ontologies, to guide augmentation strategies. The use of Symbolic AI techniques for data augmentation has been addressed in only a few recent papers. Our goal is to explore this idea further, starting from the observation that an explicit representation of the domain may help in two key tasks: optimizing the generation of new data, and validating the generated data. Both are fundamental steps in any data augmentation strategy. We aim to develop novel approaches that combine ontologies and data augmentation techniques to address these two tasks, in particular by relying on automated reasoning. We argue that leveraging knowledge representation and symbolic reasoning enables more principled, context-aware data augmentation, leading to improved model robustness and fairness.
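The two tasks named in the abstract, generating new data and validating it against an explicit domain model, can be illustrated with a deliberately tiny sketch. This is not the authors' method. The toy ontology (`SUBCLASS`, `DISJOINT`), the record layout (`type`, `asserted`) and every function name are hypothetical. The "reasoning" is only a transitive closure over subclass axioms plus a class-disjointness check, which is far weaker than a description logic reasoner. Generation proposes replacement classes drawn from the ontology, and validation rejects any candidate whose entailed classes contradict a disjointness axiom.

```python
from itertools import combinations

# Hypothetical toy ontology: subclass axioms (child -> direct parents) and disjointness axioms.
SUBCLASS = {
    "Cat": {"Mammal"},
    "Dog": {"Mammal"},
    "Sparrow": {"Bird"},
    "Mammal": {"Animal"},
    "Bird": {"Animal"},
}
DISJOINT = {frozenset({"Mammal", "Bird"})}


def superclasses(cls: str) -> set[str]:
    """Return cls plus every class it is entailed to belong to (transitive closure)."""
    result, stack = {cls}, [cls]
    while stack:
        for parent in SUBCLASS.get(stack.pop(), set()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def is_consistent(types: set[str]) -> bool:
    """Validation: no two entailed classes of a record may be declared disjoint."""
    entailed = set().union(*(superclasses(t) for t in types))
    return not any(frozenset(pair) in DISJOINT for pair in combinations(entailed, 2))


def leaf_classes_under(ancestor: str) -> list[str]:
    """Generation: candidate classes with no subclasses that fall under ancestor."""
    parents_in_use = set().union(*SUBCLASS.values())
    return sorted(c for c in SUBCLASS
                  if c not in parents_in_use and ancestor in superclasses(c))


def augment(record: dict, ancestor: str) -> list[dict]:
    """Swap the record's class for other leaves under ancestor, keeping only consistent variants."""
    variants = []
    for new_cls in leaf_classes_under(ancestor):
        if new_cls == record["type"]:
            continue
        # Check the new class against the record's other asserted classes.
        if is_consistent({new_cls, *record["asserted"]}):
            variants.append({**record, "type": new_cls})
    return variants


record = {"id": 1, "type": "Cat", "asserted": ["Mammal"]}
print(augment(record, "Animal"))  # [{'id': 1, 'type': 'Dog', 'asserted': ['Mammal']}]
```

In this toy run, `Sparrow` is proposed as a candidate but rejected, because it entails `Bird`, which the ontology declares disjoint from the asserted `Mammal`. A realistic pipeline would replace these hand-written checks with an ontology language and a reasoner.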
Year: 2025
Conference: 33rd Italian Symposium on Advanced Database Systems, SEBD 2025
Keywords: data augmentation; ontology based data access; semantic technologies; data centric artificial intelligence
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this record
Bertillo_Data-Augmentation_2025.pdf

Open access

Note: https://ceur-ws.org/Vol-4182/paper46.pdf
Type: Publisher's version (version published with the publisher's layout)
License: Creative Commons
Size: 933.85 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1766838