Data Augmentation for Data-Centric AI Through the Lens of Semantic Technologies: A Position Paper / Cabibbo, Luca; Bertillo, Daniele; Cima, Gianluca; Crescenzi, Valter; Console, Marco; Delfino, Roberto Maria; Iannucci, Stefano; Lembo, Domenico; Lenzerini, Maurizio; Marconi, Lorenzo; Merialdo, Paolo; Napoleone, Marco; Papi, Laura; Poggi, Antonella; Scafoglieri, Federico; Torlone, Riccardo. - 4182:(2026). (Symposium on Advanced Database Systems, Ischia, Italy).

Data Augmentation for Data-Centric AI Through the Lens of Semantic Technologies: A Position Paper

Gianluca Cima; Valter Crescenzi; Marco Console; Roberto Maria Delfino; Stefano Iannucci; Domenico Lembo; Maurizio Lenzerini; Lorenzo Marconi; Laura Papi; Antonella Poggi; Federico Scafoglieri
2026

Abstract

Data augmentation is a fundamental machine learning technique for improving model generalization by artificially expanding training datasets. However, conventional augmentation approaches often rely on heuristic transformations that may not fully capture domain-specific knowledge. This position paper advocates a data-centric AI perspective on data augmentation, emphasizing the integration of semantic technologies, particularly domain ontologies, to guide augmentation strategies. The use of Symbolic AI techniques for data augmentation has been addressed in only a few recent papers. Our goal is to explore this idea further. We start from the consideration that an explicit representation of the domain may help in two key tasks that are fundamental to every data augmentation strategy: optimizing the generation of new data, and validating the generated data. We aim to develop novel approaches that combine ontologies and data augmentation techniques to address these two tasks, in particular by relying on automated reasoning. We argue that leveraging knowledge representation and symbolic reasoning enables more principled and context-aware data augmentation, leading to improved model robustness and fairness.
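A deliberately tiny sketch can illustrate the two tasks named in the abstract: guiding the generation of new data and validating it by reasoning over an ontology. Everything in it is a hypothetical assumption, not the authors' method. This includes the class names, the subclass and disjointness axioms, the representation of a record as a list of asserted classes, and the sibling-substitution generation strategy. Reasoning is reduced to forward chaining over atomic subclass axioms plus a disjointness check. A real system would more plausibly rely on an OWL reasoner or an ontology-based data access engine.

```python
# Hypothetical toy ontology (illustrative only, not taken from the paper).
# Atomic subclass axioms, written as child -> parent ("A subClassOf B").
SUBCLASS = {
    "Cardiologist": "Physician",
    "Surgeon": "Physician",
    "Physician": "HealthWorker",
    "Nurse": "HealthWorker",
    "HealthWorker": "Adult",  # toy assumption: every health worker is an adult
    "Adult": "Person",
    "Minor": "Person",
}

# Hypothetical disjointness axioms: no individual may belong to both classes.
DISJOINT = {frozenset({"Adult", "Minor"}), frozenset({"Physician", "Nurse"})}


def closure(classes):
    """Saturate a set of asserted classes under the subclass axioms (forward chaining)."""
    result = set(classes)
    frontier = list(classes)
    while frontier:
        parent = SUBCLASS.get(frontier.pop())
        if parent is not None and parent not in result:
            result.add(parent)
            frontier.append(parent)
    return result


def is_consistent(classes):
    """Validation step: the inferred classes must violate no disjointness axiom."""
    inferred = closure(classes)
    return not any(pair <= inferred for pair in DISJOINT)


def siblings(cls):
    """Classes with the same direct superclass as cls (excluding cls itself)."""
    parent = SUBCLASS.get(cls)
    return sorted(c for c, p in SUBCLASS.items() if p == parent and c != cls)


def augment(record):
    """Generation step: swap one class for an ontology sibling, keep only consistent records."""
    accepted = []
    for i, cls in enumerate(record):
        for alt in siblings(cls):
            candidate = record[:i] + [alt] + record[i + 1:]
            if is_consistent(candidate):  # reject candidates the ontology rules out
                accepted.append(candidate)
    return accepted


if __name__ == "__main__":
    print(augment(["Cardiologist", "Adult"]))
```

For the seed record `["Cardiologist", "Adult"]`, the sketch generates two candidates. The first replaces `Cardiologist` with its sibling `Surgeon` and is kept. The second replaces `Adult` with `Minor` and is rejected, because the toy axioms entail that every cardiologist is an adult, which clashes with the `Adult`/`Minor` disjointness axiom. The ontology is therefore used both to propose plausible variations and to filter out semantically invalid ones.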
Symposium on Advanced Database Systems
data augmentation; ontology-based data access; semantic technologies; data-centric artificial intelligence
04 Publication in conference proceedings::04b Conference paper in volume

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1766838
Warning: the data displayed have not been validated by the university.
