Data Augmentation for Data-Centric AI Through the Lens of Semantic Technologies: A Position Paper / Cabibbo, Luca; Bertillo, Daniele; Cima, Gianluca; Crescenzi, Valter; Console, Marco; Delfino, Roberto Maria; Iannucci, Stefano; Lembo, Domenico; Lenzerini, Maurizio; Marconi, Lorenzo; Merialdo, Paolo; Napoleone, Marco; Papi, Laura; Poggi, Antonella; Scafoglieri, Federico; Torlone, Riccardo. - 4182 (2025), pp. 56-66. (33rd Italian Symposium on Advanced Database Systems, SEBD 2025, Ischia, Italy).
Data Augmentation for Data-Centric AI Through the Lens of Semantic Technologies: A Position Paper
Gianluca Cima; Valter Crescenzi; Marco Console; Roberto Maria Delfino; Stefano Iannucci; Domenico Lembo; Maurizio Lenzerini; Lorenzo Marconi; Laura Papi; Antonella Poggi; Federico Scafoglieri
2025
Abstract
Data augmentation is a fundamental machine learning technique that improves model generalization by artificially expanding training datasets. However, conventional augmentation approaches often rely on heuristic transformations that may not fully capture domain-specific knowledge. This position paper advocates a data-centric AI perspective on data augmentation, emphasizing the integration of semantic technologies, particularly domain ontologies, to guide augmentation strategies. Only a few recent papers have addressed the use of Symbolic AI techniques for data augmentation. Our goal is to explore this idea further, on the premise that an explicit representation of the domain can help with two key tasks: optimizing the generation of new data and validating the generated data, both of which are fundamental steps in any data augmentation strategy. We aim to develop novel approaches that combine ontologies and data augmentation techniques to address these two tasks, in particular by relying on automated reasoning. We argue that leveraging knowledge representation and symbolic reasoning enables more principled and context-aware data augmentation, leading to improved model robustness and fairness.

| File | Size | Format |
|---|---|---|
| Bertillo_Data-Augmentation_2025.pdf (open access). Note: https://ceur-ws.org/Vol-4182/paper46.pdf. Type: Publisher's version (published version with the publisher's layout). License: Creative Commons | 933.85 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


