T5 (Text-to-Text Transfer Transformer) is a model designed to unify a wide range of NLP tasks under a single text-to-text format.

It was introduced by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu in the paper “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer”.

Submitted Version
October 23, 2019

Latest Version
September 19, 2023

Model Highlights

  • Task: Natural Language Processing (NLP) tasks (e.g., translation, summarization, question answering)
  • Model type: Text-to-Text Transformer
  • Framework: TensorFlow and PyTorch
  • Dataset: Multiple (e.g., GLUE, SuperGLUE, SQuAD)

Model Size and Parameters

The T5 model is built on the Transformer architecture. Its primary component is the self-attention mechanism, which computes attention weights from the pairwise affinity between each element’s query and the keys of all other elements in the sequence. This allows the model to focus differently on various parts of the input when processing each element. Refer to this article for more details on the math and a more in-depth intuition behind the attention mechanism.
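The core computation can be sketched in a few lines of NumPy; the random projection matrices below are illustrative stand-ins for trained weights:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence X of shape (n, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv      # project to queries, keys, values
    d_k = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d_k)       # pairwise query-key affinities
    weights = softmax(logits)             # each row sums to 1
    return weights @ V                    # weighted sum of the values

rng = np.random.default_rng(0)
n, d_model, d_k = 4, 8, 8
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

In the decoder’s causal variant described below, positions `j > i` in `logits` would additionally be masked to negative infinity before the softmax.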

The original Transformer model was designed as an encoder-decoder architecture, primarily for sequence-to-sequence tasks. However, more recent versions of Transformer models have been simplified to include just a single stack of Transformer layers. These streamlined models are specifically optimized for tasks like language modeling and text classification.

The T5 closely follows the original Transformer’s encoder-decoder structure, with some modifications. It maps an input sequence of tokens to a sequence of embeddings, which is then passed to the encoder. This encoder is made up of blocks, each containing a self-attention layer followed by a shallow feed-forward network. 

Layer normalization is applied to each subcomponent’s input, and a residual skip connection adds each subcomponent’s input to its output. 

The decoder mirrors the encoder but adds a standard encoder-decoder attention mechanism after each self-attention layer, attending to the encoder’s output.

The self-attention mechanism in the decoder uses autoregressive or causal self-attention, allowing the model to attend only to past outputs.

One of T5’s distinctive features is its position embedding scheme. While the original Transformer used fixed or learned absolute position embeddings, T5 employs relative position embeddings, which produce a different learned embedding depending on the offset between the key and the query being compared in the self-attention mechanism. T5 also simplifies these embeddings: each one is a single scalar that is added to the corresponding logit when computing the attention weights.
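The idea can be sketched as a lookup table of scalar biases indexed by the query-key offset. T5 actually buckets offsets logarithmically so distant positions share buckets; the sketch below simplifies that bucketing to plain clipping:

```python
import numpy as np

def relative_position_bias(n, bias_table, max_distance):
    """Scalar bias for each (query i, key j) pair, indexed by the offset j - i.

    `bias_table` holds one value per clipped offset; in T5 these are learned
    and the offsets are bucketed logarithmically rather than clipped.
    """
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    offset = np.clip(j - i, -max_distance, max_distance) + max_distance
    return bias_table[offset]

max_distance = 4
rng = np.random.default_rng(2)
bias_table = rng.normal(size=2 * max_distance + 1)  # one scalar per offset
bias = relative_position_bias(6, bias_table, max_distance)
# `bias` is added directly to the pre-softmax attention logits.
print(bias.shape)  # (6, 6)
```

Because the bias depends only on the offset, every query-key pair at the same relative distance receives the same scalar.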

Expected Input

The T5 model expects textual prompts that are tailored to specific NLP tasks. The format of the input is designed to guide the model in understanding the desired task:

  1. For translation tasks, the input is structured as “translate [source language] to [target language]: [sentence]”. An example would be “translate English to German: That is good.”.
  2. For summarization tasks, the input is given as “summarize: [text]”. For instance, “summarize: state authorities dispatched emergency crews Tuesday to survey the damage after an onslaught of severe weather in Mississippi…”.
  3. For sentiment analysis or classification tasks, the input might be given in the format “cola sentence: [sentence]”. An example is “cola sentence: The course is jumping well.”.
  4. For tasks like semantic textual similarity, the input could be “stsb sentence1: [sentence1] sentence2: [sentence2]”. An example is “stsb sentence1: The rhino grazed on the grass. sentence2: A rhino is grazing in a field.”.

The model’s flexibility allows it to handle a wide range of tasks, and the input format plays a crucial role in directing the model’s attention to the desired output format.
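Assembling these prompts programmatically is straightforward. The helper below is a hypothetical convenience function, not part of any library, and covers the four formats listed above:

```python
def build_prompt(task: str, **fields: str) -> str:
    """Assemble a T5-style task prompt (illustrative helper, not a library API)."""
    if task == "translate":
        return (f"translate {fields['source']} to {fields['target']}: "
                f"{fields['text']}")
    if task == "summarize":
        return f"summarize: {fields['text']}"
    if task == "cola":
        return f"cola sentence: {fields['text']}"
    if task == "stsb":
        return (f"stsb sentence1: {fields['sentence1']} "
                f"sentence2: {fields['sentence2']}")
    raise ValueError(f"unknown task: {task}")

prompt = build_prompt("translate", source="English", target="German",
                      text="That is good.")
print(prompt)  # translate English to German: That is good.
```

The resulting string is then tokenized and fed to the model, for example via Hugging Face’s `T5ForConditionalGeneration.generate`.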

Expected Output

The T5 model produces textual outputs corresponding to the specified task:

  1. For translation tasks, the output is the translated text in the target language. Using the earlier example, the output would be the translated text in German, such as “Das ist gut.”.
  2. For summarization tasks, the output is a concise summary of the provided text. For the example mentioned, the output might be “six people hospitalized after a storm in Attala county.”.
  3. For sentiment analysis or classification tasks, the output is a classification label or sentiment score. For instance, the output for the “cola sentence” example could be “not acceptable”.
  4. For semantic textual similarity tasks, the output is a similarity score between the two provided sentences. Using the previous example, the output might be “3.8”, indicating the degree of similarity between the two sentences.

In essence, the T5 model’s output is a textual representation that aligns with the specified task, providing relevant information or results based on the input prompt.
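Because every output is a plain string, downstream code has to map generations back to task-native types. A minimal sketch (the fallback values for unparseable generations are an assumption, chosen here for illustration):

```python
def parse_stsb(output: str) -> float:
    """Convert T5's generated STS-B string (e.g. "3.8") back to a float.

    The T5 setup rounds similarity scores to increments of 0.2, so valid
    outputs land on that grid; unparseable text gets an assumed fallback.
    """
    try:
        return float(output)
    except ValueError:
        return 0.0  # assumed fallback for malformed generations

def parse_cola(output: str) -> int:
    # Map the generated label string to a binary acceptability class.
    return 1 if output.strip() == "acceptable" else 0

print(parse_stsb("3.8"))             # 3.8
print(parse_cola("not acceptable"))  # 0
```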

History and Applications

  • Originated from Google Research Brain Team.
  • Used for a variety of NLP tasks including but not limited to translation, question answering, and summarization.
  • The model’s unified approach has made it one of the benchmarks in NLP.
Example Usage

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")

model = T5ForConditionalGeneration.from_pretrained("t5-small")