LSTM in recursive feedback loops. A study on textual evolution and complexity / De Santis, Enrico; Martino, Alessio; Ronci, Francesca; Rizzi, Antonello. - (2025), pp. 1-10. (2025 International Joint Conference on Neural Networks (IJCNN), Rome, Italy) [doi: 10.1109/IJCNN64981.2025.11227357].
LSTM in recursive feedback loops. A study on textual evolution and complexity
Enrico De Santis; Antonello Rizzi
2025
Abstract
The current worldwide use of generative models in the field of Artificial Intelligence creates new possibilities, but also new challenges and problems. In such an interconnected world, machine-generated data can become part of the training data of new Language Models (LMs), and this can lead to a problem that some scholars have begun to call "model collapse". In this paper, we discuss the problem from three points of view. First, we show how the accumulation of data can cause a probability distribution, repeatedly re-estimated through a sampling procedure, to degenerate towards a single-state equilibrium as the accumulation iterations proceed. The second perspective concerns the analysis, through semantic, syntactic and complexity-theoretic measures, of texts produced by a recursive procedure in which an LSTM is driven by its own generated data; the aim is to measure the quality of the text and the degree of degradation. The third point concerns the synthesis of a simplified differential model that frames the scenario in which models feed on their own data and allows estimates, albeit rough, about the future. Computational results show model drift and degradation of the text structure, which evolves towards lexically simplified forms and loses long-term correlations (one of the main ways meaning is generated in a sentence). Ultimately, the study demonstrates that the problem exists and, although still far in the future thanks to the power of the modern architectures underlying LLMs, should not be underestimated.

| File | Size | Format | Access |
|---|---|---|---|
| De Santis_LSTM-in-Recursive-Feedback_2025.pdf (publisher's version, with the publisher's layout; licence: all rights reserved) | 1.55 MB | Adobe PDF | Archive managers only (contact the author) |
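The first perspective in the abstract, the degeneration of a repeatedly re-sampled distribution towards a single-state equilibrium, can be illustrated with a minimal sketch (the state names, sample size and generation count below are arbitrary choices for illustration, not values taken from the paper):

```python
import random
from collections import Counter

# Illustrative sketch (not code from the paper): a categorical distribution
# that is repeatedly re-estimated from its own finite samples drifts toward
# a degenerate, single-state equilibrium, mirroring the accumulation
# mechanism described in the abstract.
random.seed(0)

states = ["a", "b", "c", "d"]
probs = [0.25] * 4          # start from a uniform distribution
n_samples = 10              # a small sample size accelerates the drift

for generation in range(1000):
    draws = random.choices(states, weights=probs, k=n_samples)
    counts = Counter(draws)
    probs = [counts[s] / n_samples for s in states]  # re-estimate from samples

survivors = [s for s, p in zip(states, probs) if p > 0]
print(f"surviving states after 1000 generations: {survivors}")
```

The collapse is driven by sampling noise: once a state happens to receive zero samples in one generation, its re-estimated probability is zero and it can never reappear, so the support of the distribution can only shrink over the iterations.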
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


