Research Products Catalogue

A novel methodology to disambiguate organization names: an application to EU Framework Programmes data / Ancona, A., Cerqueti, R., Vagnani, G.. - In: SCIENTOMETRICS. - ISSN 0138-9130. - 128:(2023), pp. 4447-4474. [10.1007/s11192-023-04746-x]

A novel methodology to disambiguate organization names: an application to EU Framework Programmes data

A. Ancona; R. Cerqueti; G. Vagnani

2023

Abstract

Collaborative R&D has attracted increasing interest among scholars and policy-makers, and collaboration is now regarded as a pivotal determinant of innovation. Reliable data are a necessary condition for obtaining valuable results. Specifically, in a collaborative setting, we must avoid mistaken identities among organizations. In many datasets, the same organization appears under several different names, so its information is split across multiple entities. In this work, we propose a novel methodology to disambiguate organization names. In particular, we combine supervised and unsupervised techniques to design a “hybrid” methodology that is neither fully automated nor completely manual and that is easy to adapt to many different datasets. The flexibility and potential scalability of the methodology make it relevant to several research fields. We provide an empirical application of the methodology to the dataset of participants in projects funded by the first three European Framework Programmes. We chose this dataset because it allows us to test the quality of our procedure by comparing the refined dataset it returns with a well-recognized benchmark (i.e., the EUPRO database) in terms of the connection structure of the collaborative networks. Our results show the advantages of our approach in terms of both the quality of the resulting dataset and the efficiency of the methodology, and leave room for integrating affiliation hierarchies in future work.
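As a rough illustration of what "neither fully automated nor completely manual" can mean in practice, the sketch below merges near-identical organization names automatically and queues borderline pairs for a human reviewer. It is not the procedure described in the paper: the normalization rules, the `LEGAL_FORMS` list, the similarity measure (Python's `difflib`), the function names, and both thresholds are assumptions chosen only for this example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form tokens to ignore; not taken from the paper.
LEGAL_FORMS = {"ltd", "gmbh", "spa", "sa", "inc", "plc"}


def normalize(name: str) -> str:
    """Lowercase the name, drop punctuation and legal-form tokens."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def disambiguate(names, auto=0.9, review=0.7):
    """Merge pairs scoring >= auto; queue pairs in [review, auto) for a human.

    Both thresholds are illustrative assumptions.
    """
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    to_review = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = similarity(a, b)
            if score >= auto:
                parent[find(b)] = find(a)  # automatic merge
            elif score >= review:
                to_review.append((a, b))   # manual check needed

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return list(clusters.values()), to_review


clusters, to_review = disambiguate(
    ["Philips Ltd.", "PHILIPS LTD", "University of Rome"]
)
```

Decisions a reviewer makes on the `to_review` pairs could then serve as labeled examples for a supervised step. The pairwise comparison is quadratic in the number of names, so a larger dataset would need some form of blocking before scoring.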

Full record

	Year of publication
				2023
	Keywords
				organization name disambiguation; hybrid methodology; institutions; labels; collaborative networks; EU Framework Programmes.
	Type
				01 Journal publication::01a Journal article
	Citation
				A novel methodology to disambiguate organization names: an application to EU Framework Programmes data / Ancona, A., Cerqueti, R., Vagnani, G.. - In: SCIENTOMETRICS. - ISSN 0138-9130. - 128:(2023), pp. 4447-4474. [10.1007/s11192-023-04746-x]
	Belongs to type:
				01a Journal article

Files attached to this product

File	Access	Type	License	Size	Format
Scientometrics - Ancona Vagnani 2023.pdf	Archive managers only	Publisher's version (published version with the publisher's layout)	All rights reserved	1.27 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1679263
