Miranda, Michele; Bratieres, Sebastien; Patarnello, Stefano; Lilli, Livia (2025). Mamma Mia! Where's My Name? De-Identifying Italian Clinical Notes with Large Language Models. CLiC-it 2025, Cagliari.
Mamma Mia! Where’s My Name? De-Identifying Italian Clinical Notes with Large Language Models
Michele Miranda; Sebastien Bratieres; Stefano Patarnello; Livia Lilli
2025
Abstract
The reuse of clinical free-text data plays a pivotal role in enabling advancements in medical research, healthcare analytics, and decision support systems. However, strict regulatory frameworks such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA) impose rigorous privacy requirements, particularly concerning the removal of Protected Health Information (PHI). As a result, robust de-identification systems are essential to safeguard patient confidentiality while ensuring data usability. In this work, we present an adaptation of a prompt-based de-identification pipeline, originally developed for English-language clinical texts, to the Italian medical domain. Our approach prioritizes deployability in real-world scenarios by relying exclusively on open-source large language models (LLMs), ensuring compliance with privacy constraints. Specifically, we experimented with different versions of Gemma, LLaMA, Mistral, and Phi to identify and redact sensitive entities, focusing on names, ages, locations, and dates. Our evaluation, conducted on an open-source Italian clinical dataset, employs both a classical deterministic approach and a more modern LLM-as-a-judge framework with a voting-based aggregation mechanism, both based on comparison against a manually annotated gold standard. In the deterministic setting, the pipeline achieved promising F1 scores between 0.65 and 0.81 across entity types. These results demonstrate the potential of open-source LLMs for clinical de-identification in low-resource language settings, offering a privacy-compliant solution for real-world hospital deployments.
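The two evaluation modes mentioned in the abstract can be illustrated with a minimal sketch: entity-level precision/recall/F1 against a gold standard for the deterministic setting, and a majority vote over per-entity judge verdicts for the voting-based aggregation. Function names, the exact-match criterion, and the (entity_type, surface_form) representation are illustrative assumptions, not the paper's actual implementation.

```python
from collections import Counter

def entity_f1(predicted, gold):
    """Micro precision/recall/F1 over exact-match entity tuples.

    `predicted` and `gold` are collections of (entity_type, surface_form)
    pairs, e.g. ("NAME", "Mario Rossi"); duplicates are ignored.
    """
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # true positives: spans found in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def majority_vote(verdicts):
    """Aggregate per-entity LLM-judge verdicts (e.g. "correct"/"incorrect")
    by simple majority; an odd number of judges avoids ties."""
    return Counter(verdicts).most_common(1)[0][0]
```

Exact match is a strict criterion; real pipelines often also score partial span overlaps, which would raise recall for entities the model redacts only in part.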


