Research products catalog


Multitask prompted training enables zero-shot task generalization

Victor Sanh; Albert Webson; Colin Raffel; Stephen Bach; Lintang Sutawika; Zaid Alyafeai; Antoine Chaffin; Arnaud Stiegler; Arun Raja; Manan Dey; M Saiful Bari; Canwen Xu; Urmish Thakker; Shanya Sharma Sharma; Eliza Szczechla; Taewoon Kim; Gunjan Chhablani; Nihal Nayak; Debajyoti Datta; Jonathan Chang; Mike Tian-Jian Jiang; Han Wang; Matteo Manica; Sheng Shen; Zheng Xin Yong; Harshit Pandey; Rachel Bawden; Thomas Wang; Trishala Neeraj; Jos Rozen; Abheesht Sharma; Andrea Santilli; Thibault Fevry; Jason Alan Fries; Ryan Teehan; Teven Le Scao; Stella Biderman; Leo Gao; Thomas Wolf; Alexander M Rush

2022

Abstract

Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping any natural language task into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts with diverse wording. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks. We fine-tune a pretrained encoder-decoder model (Raffel et al., 2020; Lester et al., 2021) on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models up to 16× its size. Further, our approach attains strong performance on a subset of tasks from the BIG-bench benchmark, outperforming models up to 6× its size. All trained models are available at https://github.com/bigscience-workshop/t-zero, and all prompts are available at https://github.com/bigscience-workshop/promptsource.
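As a rough illustration of what "mapping a task into a human-readable prompted form" can look like, the sketch below renders one supervised natural language inference example through two differently worded templates, producing text-to-text (input, target) pairs. Everything in it is an assumption made for illustration: the template wording, the field names, the label verbalization, and the helper `to_prompted` are hypothetical, and the code uses plain Python string formatting rather than the actual prompt format of the released promptsource repository.

```python
# Hypothetical illustration: turn one supervised NLI example into several
# prompted (input, target) text pairs. Template wording, field names, and
# label names are assumptions for this sketch, not the released prompts.

LABELS = ["yes", "maybe", "no"]  # assumed verbalization of class ids 0, 1, 2

TEMPLATES = [
    '{premise}\nQuestion: does this imply that "{hypothesis}"? Yes, maybe, or no?',
    'Suppose "{premise}" Can we infer that "{hypothesis}"? Yes, maybe, or no?',
]


def to_prompted(example, templates=TEMPLATES, labels=LABELS):
    """Render every template for one example and pair it with the target text."""
    target = labels[example["label"]]
    return [
        (t.format(premise=example["premise"], hypothesis=example["hypothesis"]), target)
        for t in templates
    ]


example = {
    "premise": "A dog is running in the park.",
    "hypothesis": "An animal is outside.",
    "label": 0,
}

pairs = to_prompted(example)
for prompt, target in pairs:
    print(prompt, "->", target)
```

Each supervised example thus yields one training pair per template, which is one simple way to obtain the "multiple prompts with diverse wording" that the abstract describes.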

Full record

Year of publication: 2022

Conference name: The Tenth International Conference on Learning Representations

Keywords: zero-shot generalization; explicit multitask learning

Type: 04 Publication in conference proceedings::04b Conference paper in volume

Citation: Multitask prompted training enables zero-shot task generalization / Sanh, Victor; Webson, Albert; Raffel, Colin; Bach, Stephen; Sutawika, Lintang; Alyafeai, Zaid; Chaffin, Antoine; Stiegler, Arnaud; Raja, Arun; Dey, Manan; Saiful Bari, M; Xu, Canwen; Thakker, Urmish; Sharma Sharma, Shanya; Szczechla, Eliza; Kim, Taewoon; Chhablani, Gunjan; Nayak, Nihal; Datta, Debajyoti; Chang, Jonathan; Tian-Jian Jiang, Mike; Wang, Han; Manica, Matteo; Shen, Sheng; Xin Yong, Zheng; Pandey, Harshit; Bawden, Rachel; Wang, Thomas; Neeraj, Trishala; Rozen, Jos; Sharma, Abheesht; Santilli, Andrea; Fevry, Thibault; Alan Fries, Jason; Teehan, Ryan; Le Scao, Teven; Biderman, Stella; Gao, Leo; Wolf, Thomas; M Rush, Alexander. - (2022). (Paper presented at The Tenth International Conference on Learning Representations, held as a virtual conference).

Files attached to this product

There are no files associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1672126

Warning: the data shown have not been validated by the university.
