Catalog of research products


Which Conference Is That? A Case Study in Computer Science / Demetrescu, C., Finocchi, I., Ribichini, A., Schaerf, M.. - In: ACM JOURNAL OF DATA AND INFORMATION QUALITY. - ISSN 1936-1955. - 14:3(2022), pp. 1-13. [10.1145/3519031]

Which Conference Is That? A Case Study in Computer Science

Demetrescu C.; Finocchi I.; Ribichini A.; Schaerf M.

2022

Abstract

Conferences play a major role in some disciplines such as computer science and are often used in research quality evaluation exercises. Unlike journals and books, for which ISSN and ISBN codes provide unambiguous keys, recognizing the conference series in which a paper was published is a rather complex endeavor: There is no unique code assigned to conferences, and the way their names are written may greatly vary across years and catalogs. In this article, we propose a technique for the entity resolution of conferences based on the analysis of different semantic parts of their names. We present the results of an investigation of our technique on a dataset of 42,395 distinct computer science conference names excerpted from the DBLP computer science repository, which we automatically link to different authority files. With suitable data cleaning, the precision of our record linkage algorithm can be as high as 94%. A comparison with results obtainable using state-of-the-art general-purpose record linkage algorithms rounds off the article, showing that our ad hoc solution largely outperforms them in terms of the quality of the results.
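The abstract only outlines the approach (entity resolution driven by the different semantic parts of a conference name). Purely as an illustration, and not as the authors' algorithm, the Python sketch below splits a name into an acronym and a normalized descriptive "core" by discarding edition-specific parts (years, ordinals, filler words), then links it to a tiny authority file. The regular expressions, the filler-word list, the two-pass matching rule, the functions `parse`, `link` and `precision`, and the `AUTHORITY` dictionary are all hypothetical assumptions. Precision is computed here as the share of linked names that point to the correct series, which may differ from the exact definition used in the article.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical patterns for the parts of a conference name that change
# from one edition to the next (assumptions, not taken from the article).
ACRONYM = re.compile(r"\(([A-Z][A-Z0-9-]+)(?:\s*['\u2019]?\d{2,4})?\)")
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
ORDINAL = re.compile(r"\b\d+(?:st|nd|rd|th)\b", re.IGNORECASE)
# Hypothetical filler words assumed to carry no information about the series.
FILLER = {"proceedings", "proc", "of", "the", "annual", "international", "intl", "on"}


@dataclass(frozen=True)
class ParsedName:
    acronym: Optional[str]  # e.g. "ICALP", if given in parentheses
    core: str               # descriptive part with edition-specific tokens removed


def parse(name: str) -> ParsedName:
    """Split a conference name into a parenthesized acronym and a normalized core."""
    m = ACRONYM.search(name)
    acronym = m.group(1) if m else None
    text = ACRONYM.sub(" ", name)   # drop "(ICALP 2019)"-style groups
    text = YEAR.sub(" ", text)      # drop four-digit years
    text = ORDINAL.sub(" ", text)   # drop "46th", "51st", ...
    tokens = re.findall(r"[a-z]+", text.lower())
    return ParsedName(acronym, " ".join(t for t in tokens if t not in FILLER))


def link(name: str, authority: Dict[str, str]) -> Optional[str]:
    """Return the id of the matching series in a (hypothetical) authority file."""
    parsed = parse(name)
    refs = {sid: parse(ref) for sid, ref in authority.items()}
    # First pass: the normalized descriptive parts are identical.
    for sid, ref in refs.items():
        if parsed.core and parsed.core == ref.core:
            return sid
    # Second pass: fall back to the acronym, which is shorter and more ambiguous.
    for sid, ref in refs.items():
        if parsed.acronym and parsed.acronym == ref.acronym:
            return sid
    return None


def precision(predicted: Dict[str, Optional[str]], gold: Dict[str, str]) -> float:
    """Fraction of linked names whose predicted series matches the gold label."""
    linked = [n for n, sid in predicted.items() if sid is not None]
    if not linked:
        return 0.0
    return sum(gold.get(n) == predicted[n] for n in linked) / len(linked)


# Hypothetical toy authority file and input names, for illustration only.
AUTHORITY = {
    "icalp": "International Colloquium on Automata, Languages and Programming (ICALP)",
    "stoc": "Annual ACM Symposium on Theory of Computing (STOC)",
}

names = [
    "Proceedings of the 46th International Colloquium on Automata, Languages, "
    "and Programming (ICALP 2019)",
    "51st ACM SIGACT Symposium on Theory of Computing (STOC 2019)",
    "Symposium on Theory of Computing, 2020",
]
predicted = {n: link(n, AUTHORITY) for n in names}
gold = {names[0]: "icalp", names[1]: "stoc", names[2]: "stoc"}

for n in names:
    print(predicted[n], "<-", n)
print("precision:", precision(predicted, gold))
```

Run as a script, the sketch links the ICALP name through its normalized core, links the first STOC name only through the acronym fallback (the extra "SIGACT" keeps the cores from matching), and leaves the 2020 STOC name, which has no acronym, unlinked. Precision is therefore 1.0 even though one name is missed, which is why precision alone says nothing about coverage. A realistic system would need fuzzier comparisons and cleaner authority data.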

Full record

Year of publication: 2022
Keywords: Bibliometrics; conference names; entity resolution; record linkage
Type: 01 Journal publication::01a Journal article
Belongs to type: 01a Journal article

Files attached to this product

File	Size	Format
Demetrescu_Which-Conference_2022.pdf (restricted to archive managers; publisher's version, i.e. the published version with the publisher's layout; license: all rights reserved)	203.61 kB	Adobe PDF
Demetrescu_postprint_Which-conference_2022.pdf (open access; post-print, i.e. the version after peer review, accepted for publication; license: Creative Commons)	192.96 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1661035
