
People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection / Sen, Indira; Assenmacher, Dennis; Samory, Mattia; Augenstein, Isabelle; van der Aalst, Wil; Wagner, Claudia. - (2023), pp. 10480-10504. (Paper presented at the Conference on Empirical Methods in Natural Language Processing, held in Singapore) [10.18653/v1/2023.emnlp-main.649].

People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection

Mattia Samory
2023

Abstract

NLP models are used in a variety of critical social computing tasks, such as detecting sexist, racist, or otherwise hateful content. Therefore, it is imperative that these models are robust to spurious features. Past work has attempted to tackle such spurious features using training data augmentation, including Counterfactually Augmented Data (CADs). CADs introduce minimal changes to existing training data points and flip their labels; training on them may reduce model dependency on spurious features. However, manually generating CADs can be time-consuming and expensive. Hence, in this work, we assess whether this task can be automated using generative NLP models. We automatically generate CADs using Polyjuice, ChatGPT, and Flan-T5, and evaluate their usefulness in improving model robustness compared to manually generated CADs. By testing both model performance on multiple out-of-domain test sets and individual data point efficacy, our results show that while manual CADs are still the most effective, CADs generated by ChatGPT come a close second. One key reason for the lower performance of automated methods is that the changes they introduce are often insufficient to flip the original label.
2023
Conference on Empirical Methods in Natural Language Processing
cad; counterfactual; data augmentation; llm; large language models
04 Conference proceedings publication::04b Conference paper in volume
Files attached to this item
There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11573/1696238
