Thematising online food risks: Comparison of a manual tagging procedure and topic modelling / Rizzoli, Valentina; Ruzza, Mirko; Lunardi, Luca; Tiozzo, Barbara; Ravarotto, Licia. - (2020). (Paper presented at JADT 2020, 15th International Conference on the Statistical Analysis of Textual Data, held in Toulouse, France).

Thematising online food risks: Comparison of a manual tagging procedure and topic modelling

Valentina Rizzoli
2020

Abstract

A central need when dealing with a large volume of content is to classify it into themes. The present study compares manual tagging with an automatic procedure, implemented in a machine learning context, applied to food risk issues. For one year, web sources were monitored through the web monitoring application Web-Live®, developed by the company Extreme s.r.l. (http://www.web-live.it), and 12,163 items were collected. The items were then labelled in parallel according to two procedures: a manual one (Elo & Kyngäs, 2008) and an automatic one (cf. Tuzzi, 2003), namely Latent Dirichlet Allocation (LDA) (Blei, Ng, & Jordan, 2003) as implemented in the “topicmodels” package (Grün & Hornik, 2011) for R. Discrepancies and overlaps between the manual labelling and the automatic classification were examined using the data visualisation software Qlik Sense®. Both procedures highlighted mostly the same content with respect to the labelling goal and returned a similar classification for the overlapping topics. The analysis of both outputs showed that the automatic procedure tended to return precise and detailed topics, whereas the manual procedure allowed more levels of tagging. The results are further discussed, highlighting the critical issues and the potential of both approaches, to inform further applications.
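The comparison of the two labellings described in the abstract can be illustrated with a small cross-tabulation. The sketch below is only an illustration under stated assumptions, not the authors' procedure: the study used LDA in R and inspected overlaps in Qlik Sense®, whereas this example uses plain Python, and the function names, tag names, and toy data are all hypothetical. It assumes each item carries exactly one manual tag and one dominant automatic topic.

```python
from collections import Counter, defaultdict


def cross_tabulate(manual_tags, auto_topics):
    """Count how often each (automatic topic, manual tag) pair co-occurs."""
    if len(manual_tags) != len(auto_topics):
        raise ValueError("both labellings must cover the same items")
    table = defaultdict(Counter)
    for tag, topic in zip(manual_tags, auto_topics):
        table[topic][tag] += 1
    return table


def dominant_tag_share(table):
    """For each topic, return its most frequent manual tag and that tag's share."""
    result = {}
    for topic, counts in table.items():
        tag, n = counts.most_common(1)[0]
        result[topic] = (tag, n / sum(counts.values()))
    return result


# Hypothetical toy data: one manual tag and one dominant topic per item.
manual = ["allergens", "allergens", "contaminants",
          "contaminants", "contaminants", "labelling"]
auto = [0, 0, 1, 1, 0, 2]

table = cross_tabulate(manual, auto)
shares = dominant_tag_share(table)
for topic in sorted(shares):
    tag, share = shares[topic]
    print(f"topic {topic}: mostly '{tag}' ({share:.0%} of its items)")
```

A high share for a topic suggests that the automatic topic and a manual tag cover largely the same items; a low share points to a discrepancy worth inspecting by hand.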
2020
JADT 2020, 15th International Conference on the Statistical Analysis of Textual Data
Content analyses, manual tagging, latent Dirichlet allocation, food risk communication
04 Publication in conference proceedings::04b Conference paper in volume

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1557060