Twitter mining for fine-grained syndromic surveillance

Velardi, Paola; Stilo, Giovanni; Tozzi, Alberto E.; Gesualdo, Francesco

doi:10.1016/j.artmed.2014.01.002

Background: Digital traces left on the Internet by web users, if properly aggregated and analyzed, can form a very large dataset that informs syndromic surveillance systems in real time with data collected directly from individuals. Since people use everyday language rather than medical jargon (e.g. runny nose vs. respiratory distress), knowledge of patients' terminology is essential for mining health-related conversations on social networks.

Objectives: In this paper we present a methodology for early detection and analysis of epidemics based on mining Twitter messages. To reliably trace messages from patients who actually complain of a disease, we first learn a model of naïve medical language and then adopt a symptom-driven, rather than disease-driven, keyword analysis. This approach represents a major innovation compared with previously published work in the field.

Method: We first developed an algorithm to automatically learn a variety of expressions that people use to describe their health conditions, thus improving our ability to detect health-related "concepts" expressed in non-medical terms and, in the end, producing a larger body of evidence. We then implemented a Twitter monitoring instrument to finely analyze the presence and combinations of symptoms in tweets.

Results: We first evaluate the algorithm's performance on an available dataset of diverse medical condition synonyms, and then assess its utility in a case study of five common syndromes for surveillance purposes. We show that, by exploiting physicians' knowledge of symptoms positively or negatively related to a given disease, as well as the correspondence between patients' "naïve" terminology and medical jargon, not only can we analyze large volumes of Twitter messages related to that disease, but we can also mine micro-blogs with complex queries, performing fine-grained tweet classification (e.g. separating tweets that report influenza-like illness (ILI) symptoms from those that report common cold or allergy).

Conclusions: Our approach yields a very high level of correlation with flu trends derived from traditional surveillance systems. Compared with Google Flu, another popular tool based on query search volumes, our method is more flexible and less sensitive to changes in web search behaviors. © 2014 Elsevier B.V. All rights reserved.
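The symptom-driven idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lexicon `NAIVE_TO_CONCEPT`, the query table `SYNDROME_QUERIES`, and the functions `extract_concepts` and `classify` are hypothetical names, and the symptom lists are illustrative assumptions rather than the queries used in the paper. The sketch maps lay expressions to medical concepts, then applies queries built from symptoms that must be present, may be present, or must be absent.

```python
import re

# Hypothetical lexicon: lay ("naive") expressions mapped to medical concepts.
# In the paper this mapping is learned automatically; here it is hand-written.
NAIVE_TO_CONCEPT = {
    "fever": "fever",
    "high temperature": "fever",
    "cough": "cough",
    "coughing": "cough",
    "sore throat": "pharyngitis",
    "runny nose": "rhinorrhea",
    "stuffy nose": "nasal congestion",
    "sneezing": "sneezing",
    "itchy eyes": "ocular pruritus",
}

# Hypothetical symptom-driven queries (illustrative assumptions only).
# all_of: concepts that must all appear; any_of: at least one must appear;
# none_of: concepts that rule the syndrome out (negatively related symptoms).
SYNDROME_QUERIES = {
    "influenza-like illness": {
        "all_of": {"fever"},
        "any_of": {"cough", "pharyngitis"},
        "none_of": set(),
    },
    "common cold": {
        "all_of": set(),
        "any_of": {"rhinorrhea", "nasal congestion", "sneezing"},
        "none_of": {"fever", "ocular pruritus"},
    },
    "allergy": {
        "all_of": {"ocular pruritus"},
        "any_of": {"sneezing", "rhinorrhea"},
        "none_of": {"fever"},
    },
}


def extract_concepts(tweet):
    """Return the set of medical concepts whose lay expressions occur in the tweet."""
    text = tweet.lower()
    found = set()
    for expression, concept in NAIVE_TO_CONCEPT.items():
        # Word boundaries keep "cough" from matching inside unrelated tokens.
        if re.search(r"\b" + re.escape(expression) + r"\b", text):
            found.add(concept)
    return found


def classify(tweet):
    """Return the sorted list of syndromes whose query the tweet satisfies."""
    concepts = extract_concepts(tweet)
    matches = []
    for syndrome, query in SYNDROME_QUERIES.items():
        has_required = query["all_of"] <= concepts
        has_optional = not query["any_of"] or bool(query["any_of"] & concepts)
        has_excluded = bool(query["none_of"] & concepts)
        if has_required and has_optional and not has_excluded:
            matches.append(syndrome)
    return sorted(matches)


if __name__ == "__main__":
    for example in (
        "I have a high temperature and I keep coughing",
        "itchy eyes and sneezing all day",
    ):
        print(example, "->", classify(example))
```

The sketch does plain keyword matching and ignores negation ("no fever"), sarcasm, and second-hand reports, all of which a real monitoring tool would need to handle. Its only purpose is to show how positively and negatively related symptoms can separate syndromes that share lay vocabulary.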

Twitter mining for fine-grained syndromic surveillance / Velardi, P., Stilo, G., Tozzi, A. E., Gesualdo, F. - In: ARTIFICIAL INTELLIGENCE IN MEDICINE. - ISSN 0933-3657. - Print. - (2014). [10.1016/j.artmed.2014.01.002]

Year of publication: 2014
			
Keywords: micro-blog mining; patient's language learning; syndromic surveillance; terminology clustering; Twitter mining

Type: 01 Journal publication::01a Journal article
			

Files attached to this item: there are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/543272

Warning: the data displayed have not been validated by the university.
