NLP pipelines are composed of preprocessing, inference, and postprocessing steps. This is why the Hugging Face Transformers library provides the pipeline abstraction, which bundles all three behind a single call.
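As a rough illustration of those three stages, here is a minimal pure-Python sketch. It does not use the Transformers library: the vocabulary, the labels, and the `infer` function (a word-counting stand-in for a real model) are all hypothetical and exist only to show how the stages chain together.

```python
import math

# Hypothetical toy vocabulary and labels, for illustration only.
VOCAB = {"[UNK]": 0, "good": 1, "bad": 2, "movie": 3}
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Preprocessing: turn raw text into a list of integer token ids."""
    return [VOCAB.get(word, VOCAB["[UNK]"]) for word in text.lower().split()]


def infer(token_ids):
    """Inference: stand-in for a model forward pass, one raw score (logit) per label."""
    negative = float(token_ids.count(VOCAB["bad"]))
    positive = float(token_ids.count(VOCAB["good"]))
    return [negative, positive]


def postprocess(logits):
    """Postprocessing: softmax over the logits, then pick the most likely label."""
    shifted = [math.exp(x - max(logits)) for x in logits]
    probs = [value / sum(shifted) for value in shifted]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text):
    """Chain the three stages in order."""
    return postprocess(infer(preprocess(text)))


result = toy_pipeline("good movie")
print(result)
```

A real pipeline replaces `preprocess` with a tokenizer and `infer` with a pretrained model, but the overall flow is the same.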

Preprocessing with a tokenizer

Like other neural networks, Transformer models can’t process raw text directly, so the first step of our pipeline is to convert the text inputs into numbers that the model can make sense of. To do this we use a tokenizer, which will be responsible for:

  • Splitting the input into words, subwords, or symbols (like punctuation) that are called tokens
  • Mapping each token to an integer
  • Adding additional inputs that may be useful to the model

All this preprocessing needs to be done in exactly the same way as when the model was pretrained.
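The three responsibilities listed above can be sketched with a toy tokenizer. This is an assumption-laden illustration, not the library's implementation: the vocabulary is made up, the splitting rule handles only whole words and punctuation (real tokenizers usually split into subwords), and the special tokens and `attention_mask` stand in for the "additional inputs".

```python
import re

# Hypothetical toy vocabulary; a real tokenizer ships the vocabulary used during pretraining.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "hello": 4, "world": 5, "!": 6}


def tokenize(text):
    """Step 1: split the input into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def encode(text, max_length=8):
    """Steps 2 and 3: map tokens to integers and add the extra model inputs."""
    # Special tokens mark the start and end of the sequence.
    tokens = ["[CLS]"] + tokenize(text) + ["[SEP]"]
    # Unknown tokens fall back to the [UNK] id.
    input_ids = [VOCAB.get(token, VOCAB["[UNK]"]) for token in tokens]
    # The attention mask is 1 for real tokens and 0 for padding.
    attention_mask = [1] * len(input_ids)
    # Pad up to max_length (this sketch does not truncate longer inputs).
    padding = max_length - len(input_ids)
    input_ids += [VOCAB["[PAD]"]] * padding
    attention_mask += [0] * padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


encoded = encode("Hello, world!")
print(encoded)
```

Here the comma is not in the toy vocabulary, so it maps to the `[UNK]` id. This is one reason the vocabulary and splitting rules must match those used when the model was pretrained.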

Transformer head

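A model head takes the high-dimensional hidden states produced by the backbone as input and projects them onto a different dimension, such as the number of labels in a classification task. The head can be customized to achieve different goals. In the class names below, * stands for the architecture name (e.g., Bert):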

  • *Model (retrieve the hidden states)
  • *ForCausalLM
  • *ForMaskedLM
  • *ForMultipleChoice
  • *ForQuestionAnswering
  • *ForSequenceClassification
  • *ForTokenClassification
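To make the "projection onto a different dimension" concrete, here is a small PyTorch sketch. The sizes are arbitrary, the hidden states are random stand-ins for a backbone's output, and each head is reduced to a single linear layer; heads in the library may add further layers such as dropout, so treat this as a shape illustration only.

```python
import torch
from torch import nn

# Illustrative sizes, chosen arbitrarily for this sketch.
batch_size, seq_len, hidden_size = 2, 5, 16
num_labels = 3

# Random stand-in for the hidden states a backbone would return:
# one hidden_size-dimensional vector per token.
hidden_states = torch.randn(batch_size, seq_len, hidden_size)

# Sequence-classification-style head: project the first token's vector
# to one logit per label, giving one prediction per sequence.
sequence_head = nn.Linear(hidden_size, num_labels)
sequence_logits = sequence_head(hidden_states[:, 0])

# Token-classification-style head: project every token's vector
# to one logit per label, giving one prediction per token.
token_head = nn.Linear(hidden_size, num_labels)
token_logits = token_head(hidden_states)

print(sequence_logits.shape)  # one row of logits per sequence
print(token_logits.shape)     # one row of logits per token
```

The backbone output is the same in both cases; only the head decides whether the result is one prediction per sequence or one per token.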

Models on Hugging Face

When you publish a model on Hugging Face, you typically include:

  1. The Pretrained Backbone: This is the core Transformer model (e.g., BERT, GPT, etc.), which contains the self-attention mechanism, feedforward layers, etc.
  2. Task-Specific Heads (if applicable): You can optionally include one or more pretrained heads for specific tasks, like:
    • Token Classification: For Named Entity Recognition (NER) or part-of-speech tagging.
    • Multiple Choice: For question-answering tasks with multiple options.
    • Sequence Classification: For sentiment analysis, spam detection, etc.
    • Question Answering: For span-based extraction tasks.

These heads are task-specific layers that sit on top of the backbone and are trained for their respective purposes. If you publish the model with these heads, users can load them directly (e.g., using AutoModelForSequenceClassification, AutoModelForQuestionAnswering, etc.). Users can also create their own heads by loading only the backbone and adding layers on top, like so:

from transformers import BertModel
from torch import nn

backbone = BertModel.from_pretrained("path-to-backbone")

class MyCustomHead(nn.Module):
    def __init__(self, backbone, num_labels):
        super().__init__()
        self.backbone = backbone
        # Linear layer mapping a hidden vector to one logit per label
        self.classifier = nn.Linear(backbone.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.backbone(input_ids, attention_mask=attention_mask)
        cls_output = outputs.last_hidden_state[:, 0]  # [CLS] token
        return self.classifier(cls_output)

Tip

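Pretrained heads generally perform better than a newly created custom head, because a new head starts from randomly initialized weights and must be fine-tuned before its predictions are useful.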