Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions

Miranda, Michele; Ruzzetti, Elena Sofia; Santilli, Andrea; Zanzotto, Fabio Massimo; Bratières, Sébastien; Rodolà, Emanuele

Large Language Models (LLMs) represent a significant advancement in artificial intelligence, finding applications across various domains. However, their reliance on massive internet-sourced datasets for training brings notable privacy issues, which are exacerbated in critical domains (e.g., healthcare). Moreover, certain application-specific scenarios may require fine-tuning these models on private data. This survey critically examines the privacy threats associated with LLMs, emphasizing the potential for these models to memorize and inadvertently reveal sensitive information. We explore current threats by reviewing privacy attacks on LLMs and propose comprehensive solutions for integrating privacy mechanisms throughout the entire learning pipeline. These solutions range from anonymizing training datasets to implementing differential privacy during training or inference and machine unlearning after training. Our comprehensive review of existing literature highlights ongoing challenges, available tools, and future directions for preserving privacy in LLMs. This work aims to guide the development of more secure and trustworthy AI systems by providing a thorough understanding of privacy preservation methods and their effectiveness in mitigating risks.

Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions / Miranda, Michele; Ruzzetti, Elena Sofia; Santilli, Andrea; Zanzotto, Fabio Massimo; Bratières, Sébastien; Rodolà, Emanuele. - In: TRANSACTIONS ON MACHINE LEARNING RESEARCH. - ISSN 2835-8856. - (2024).

Year of publication: 2024

Keywords: Computer Science - Cryptography and Security; Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning

Type: 01 Journal publication::01a Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1726980
