Research Products Catalogue

Are Your Keywords Like My Queries? A Corpus-Wide Evaluation of Keyword Extractors with Real Searches / Galletti, Martina; Prevedello, Giulio; Brugnoli, Emanuele; Ruggiero Lo Sardo, D.; Gravino, Pietro. - (2025), pp. 1943-1951. (Paper presented at the International Conference on Computational Linguistics, held in Abu Dhabi, UAE).

Are Your Keywords Like My Queries? A Corpus-Wide Evaluation of Keyword Extractors with Real Searches

Martina Galletti; Giulio Prevedello; Emanuele Brugnoli; D. Ruggiero Lo Sardo; Pietro Gravino

2025

Abstract

Keyword Extraction (KE) is essential in Natural Language Processing (NLP) for identifying key terms that represent the main themes of a text, and it is vital for applications such as information retrieval, text summarisation, and document classification. Despite the development of various KE methods, including statistical approaches and advanced deep learning models, evaluating their effectiveness remains challenging. Current evaluation metrics focus on keyword quality, balance, and overlap with annotations from authors and professional indexers, but neglect real-world information retrieval needs. This paper introduces a novel evaluation method designed to overcome this limitation by using real query data from Google Trends; the method can be used with both supervised and unsupervised KE approaches. We applied this method to three popular KE approaches (YAKE, RAKE and KeyBERT) and found that KeyBERT was the most effective in capturing users’ top queries, with RAKE also showing surprisingly good performance. The code is open-access and publicly available.

Publication year: 2025
Conference name: International Conference on Computational Linguistics
Keywords: Keywords Extraction; Metrics; NLP
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Belongs to type: 04b Conference paper in volume

Files attached to this product

File	Size	Format
Galletti_Are-Your-Keywords_2025.pdf (open access). Note: https://aclanthology.org/2025.coling-main.133.pdf. Type: Publisher's version (published version with the publisher's layout). License: All rights reserved	386.08 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1732791
