Thematising online food risks: Comparison of a manual tagging procedure and topic modelling / Rizzoli, Valentina; Ruzza, Mirko; Lunardi, Luca; Tiozzo, Barbara; Ravarotto, Licia. - (2020). (Paper presented at the conference JADT 2020, 15th International Conference on the Statistical Analysis of Textual Data, held in Toulouse, France).
Thematising online food risks: Comparison of a manual tagging procedure and topic modelling
Valentina Rizzoli
2020
Abstract
When faced with a large volume of content, one of the main needs is to classify it by theme. The present study compares manual tagging with an automatic procedure, implemented in the context of Machine Learning and applied to food risk issues. Over one year, web sources were monitored with the web monitoring application Web-Live®, developed by the company Extreme s.r.l. (http://www.web-live.it), and 12,163 items were collected. The items were then labelled in parallel using two procedures. The first was a manual one (Elo & Kyngäs, 2008). The second was an automatic one (cf. Tuzzi, 2003), namely Latent Dirichlet Allocation (LDA) (Blei, Ng, & Jordan, 2003) as implemented in the R package “topicmodels” (Grün & Hornik, 2011). Discrepancies and overlaps between the two labellings and classifications were examined with the data visualisation software Qlik Sense®. With respect to the labelling goal, both procedures largely identified the same content, and they returned similar classifications for the overlapping topics. Comparing the two outputs showed that the automatic procedure tended to return precise and detailed topics, whereas the manual procedure allowed more levels of tagging. The results are further discussed, highlighting the critical aspects and the potential of both approaches, to inform further applications.
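The comparison between the two labellings can be illustrated with a simple cross-tabulation. This is a minimal sketch, not the authors' procedure: the study inspected overlaps visually in Qlik Sense®. The sketch makes two assumptions: each item carries one manual tag, and each item has a vector of LDA topic proportions exported from the fitted model. Each item is assigned its most probable topic, and the (manual tag, topic) pairs are then counted. The function names and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def dominant_topic(theta: Sequence[float]) -> int:
    """Index of the most probable topic in one item's topic proportions (first index wins on ties)."""
    return max(range(len(theta)), key=lambda k: theta[k])


def cross_tabulate(manual_tags: List[str], thetas: List[Sequence[float]]) -> Counter:
    """Count (manual tag, dominant LDA topic) pairs over all items.

    Assumption (hypothetical setup): exactly one manual tag per item,
    aligned with `thetas` by position.
    """
    if len(manual_tags) != len(thetas):
        raise ValueError("each item needs both a manual tag and a topic distribution")
    return Counter((tag, dominant_topic(theta)) for tag, theta in zip(manual_tags, thetas))


def best_match(table: Counter) -> Dict[str, Tuple[int, float]]:
    """For each manual tag, return the LDA topic it co-occurs with most often
    and the share of that tag's items that fall in that topic."""
    totals: Counter = Counter()
    for (tag, _topic), n in table.items():
        totals[tag] += n
    result: Dict[str, Tuple[int, float]] = {}
    for (tag, topic), n in table.items():
        share = n / totals[tag]
        if tag not in result or share > result[tag][1]:
            result[tag] = (topic, share)
    return result


if __name__ == "__main__":
    # Toy data (made up): five items, two manual tags, a two-topic LDA model.
    tags = ["contamination", "contamination", "allergens", "allergens", "contamination"]
    thetas = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9], [0.3, 0.7], [0.45, 0.55]]
    table = cross_tabulate(tags, thetas)
    print(table)
    print(best_match(table))
```

A low best-match share for a manual tag could point to a theme that the topic model splits across several more detailed topics. Items carrying several manual tags, as the manual procedure allowed, would need one row per tag in this setup.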