Nudo, Jacopo; Pandolfo, Mario Edoardo; Loru, Edoardo; Samory, Mattia; Cinelli, Matteo; Quattrociocchi, Walter. Generative exaggeration in LLM social agents: Consistency, bias, and toxicity. Online Social Networks and Media, 51 (January 2026). ISSN 2468-6964. DOI: 10.1016/j.osnem.2025.100344
Generative exaggeration in LLM social agents: Consistency, bias, and toxicity
Jacopo Nudo; Mario Edoardo Pandolfo; Edoardo Loru; Mattia Samory; Matteo Cinelli; Walter Quattrociocchi
2026
Abstract
We investigate how Large Language Models (LLMs) behave when simulating political discourse on social media. Leveraging 21 million interactions on X during the 2024 U.S. presidential election, we construct LLM agents based on 1,186 real users, prompting them to reply to politically salient tweets under controlled conditions. Agents are initialized either with minimal ideological cues (Zero Shot) or with recent tweet history (Few Shot), allowing one-to-one comparisons with human replies. We evaluate three model families (Gemini, Mistral, and DeepSeek) across linguistic style, ideological consistency, and toxicity. We find that richer contextualization improves internal consistency but also amplifies polarization, stylized signals, and harmful language. We observe an emergent distortion that we call "generative exaggeration": a systematic amplification of salient traits beyond empirical baselines. Our analysis shows that LLMs do not emulate users; they reconstruct them. Indeed, their outputs reflect internal optimization dynamics more than observed behavior, introducing structural biases that compromise their reliability as social proxies. This challenges their use in content moderation, deliberative simulations, and policy modeling.
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Nudo_Generative-Exaggeration_2026.pdf | Open access | Publisher's version (published version with the publisher's layout) | Creative Commons | 2.96 MB | Adobe PDF |

Note: https://doi.org/10.1016/j.osnem.2025.100344
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


