Topic Modeling as a Way of Unlocking the Informational Potential of Oral History Sources in Studying the History of Ethnic Cleansing in the USSR: algorithm development and approbation of the program code / Iashchenko, Iuliia; Iashchenko, Anatolii. - (2024). [10.5281/zenodo.1376107]
Topic Modeling as a Way of Unlocking the Informational Potential of Oral History Sources in Studying the History of Ethnic Cleansing in the USSR: algorithm development and approbation of the program code.
Iuliia Iashchenko, Anatolii Iashchenko
2024
Abstract
This study explores the application of Topic Modeling (TM) to oral history sources on ethnic cleansing in the USSR under Stalinism (1930s–1950s). Drawing on more than 500 interviews with survivors, their descendants, and witnesses, the research uses computational linguistics to uncover latent thematic patterns in narratives of collective memory. The TM algorithm, rooted in Natural Language Processing and machine learning, enables the systematic categorization of personal accounts. Key findings reveal recurring themes, such as violence, famine, and mortality, that official Soviet records often denied. Generational differences also emerge: firsthand accounts emphasize the details of camp life, while later generations highlight the broader impact of repression. The approach both mitigates the problem of incomplete archival data and bridges the gap between official historical narratives and personal memories. By applying big data techniques, the study advances historical analysis and offers a deeper understanding of ethnic cleansing and its enduring impact on collective memory.
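The abstract does not state which topic-modeling variant or software the authors used. As a minimal sketch, assuming Latent Dirichlet Allocation (LDA), the most widely used TM technique, the code below fits topics with collapsed Gibbs sampling in plain Python and then averages topic shares per respondent group, mirroring the generational comparison described above. The toy token lists, the group labels "first" and "later", and the function names `lda_gibbs`, `top_words`, and `mean_topic_share` are hypothetical illustrations; real interview transcripts would first need tokenization, lemmatization, and stop-word removal.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): doc_topic[d][k] is the smoothed share of
    topic k in document d; topic_word[k] maps each word to P(word | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    ndk = [[0] * n_topics for _ in docs]               # topic counts per document
    nkw = [defaultdict(int) for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                                # total tokens per topic
    z = []                                             # topic label of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        labels = []
        for w in doc:
            k = rng.randrange(n_topics)
            labels.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(labels)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional P(z = t | all other assignments), unnormalized.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (nkw[k][w] + beta) / (nk[k] + n_vocab * beta) for w in vocab}
        for k in range(n_topics)
    ]
    return doc_topic, topic_word


def top_words(topic_word, k, n=5):
    """Most probable words of topic k, used to label a topic by hand."""
    return sorted(topic_word[k], key=topic_word[k].get, reverse=True)[:n]


def mean_topic_share(doc_topic, groups, group):
    """Average topic proportions over the documents belonging to one group."""
    rows = [row for row, g in zip(doc_topic, groups) if g == group]
    return [sum(col) / len(rows) for col in zip(*rows)]


# Hypothetical toy "interviews" (already tokenized and lemmatized).
docs = [
    ["camp", "barrack", "guard", "ration", "camp", "barrack"],
    ["guard", "camp", "ration", "barrack", "guard"],
    ["family", "deportation", "village", "memory", "family"],
    ["memory", "village", "deportation", "family", "memory"],
]
groups = ["first", "first", "later", "later"]  # hypothetical generation labels

doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
for k in range(2):
    print(k, top_words(topic_word, k, n=3))
for g in ("first", "later"):
    print(g, [round(p, 2) for p in mean_topic_share(doc_topic, groups, g)])
```

Gibbs sampling is used here only because it fits in the standard library. On a corpus of several hundred interviews, a library implementation such as scikit-learn's `LatentDirichletAllocation` (variational inference) or gensim's `LdaModel` would be the more usual choice, with the number of topics tuned on the real data and each topic labeled by inspecting its top words.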