Multi-Task Learning
Multi-task learning (MTL) trains a single model on multiple related tasks simultaneously, sharing representations across tasks. In local AI, MTL appears in models like T5 and FLAN-T5, which handle summarization, translation, and QA with one set of weights. For operators, MTL means a single downloaded model can serve multiple use cases without separate fine-tuned copies, saving VRAM and disk space. The trade-off: performance on any one task may be slightly lower than a dedicated model's, but the convenience and resource efficiency often outweigh that cost.
Deeper dive
MTL typically pairs a shared backbone (e.g., transformer layers) with task-specific heads (e.g., a classification head, a generation head). During training, gradients from all tasks update the shared weights, pushing the model to learn general features useful across tasks. Common architectures include hard parameter sharing (shared layers, separate heads) and soft parameter sharing (each task has its own parameters, with regularization that encourages them to stay similar). Text-to-text models such as T5 go one step further: they share the output layer too, cast every task as text generation, and select the task through the prompt. In practice, operators encounter MTL in models like FLAN-T5, which is fine-tuned on more than a thousand tasks. When you run such a model in a runtime that supports its architecture (check the supported-model list of vLLM or llama.cpp for T5-family models), you can prompt it for different tasks without reloading: the same weights handle translation, summarization, and reasoning. The main operator consideration is the prompt format: many MTL models expect specific task prefixes or instruction wording (e.g., "Translate English to French: ...") that must be followed to get good results.
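The way gradients from several tasks flow into shared weights under hard parameter sharing can be sketched in plain Python. This is a toy under stated assumptions, not how any real MTL model is implemented: the "backbone" is a single shared scalar weight, each "head" is one scalar per task, and the task names, data, and hyperparameters are all made up for illustration.

```python
# Toy hard parameter sharing: one shared weight, one head weight per task.
# All names, data, and hyperparameters are hypothetical.

def forward(shared_w, head_w, x):
    h = shared_w * x      # shared "backbone" feature
    return head_w * h     # task-specific head


def grads(shared_w, head_w, x, y):
    """Gradients of 0.5 * (pred - y)**2 for a single example."""
    h = shared_w * x
    err = head_w * h - y
    return err * head_w * x, err * h   # (d/d shared_w, d/d head_w)


def train(tasks, steps=1000, lr=0.01):
    shared_w = 0.5
    heads = {name: 0.5 for name in tasks}
    for _ in range(steps):
        shared_grad = 0.0
        head_grads = {name: 0.0 for name in tasks}
        for name, data in tasks.items():
            for x, y in data:
                g_shared, g_head = grads(shared_w, heads[name], x, y)
                shared_grad += g_shared      # every task contributes here
                head_grads[name] += g_head   # only this task's head is touched
        shared_w -= lr * shared_grad
        for name in tasks:
            heads[name] -= lr * head_grads[name]
    return shared_w, heads


# Two related toy tasks: scale the input by 2, and scale it by 3.
tasks = {
    "double": [(1.0, 2.0), (2.0, 4.0)],
    "triple": [(1.0, 3.0), (2.0, 6.0)],
}
shared_w, heads = train(tasks)
print(forward(shared_w, heads["double"], 5.0))
print(forward(shared_w, heads["triple"], 5.0))
```

The shared weight is updated with the sum of both tasks' gradients, while each head only sees its own task. Soft parameter sharing would instead keep a separate backbone weight per task and add a penalty on the difference between them.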
Practical example
FLAN-T5 XL (about 3B parameters) is an MTL model that can translate, summarize, and answer questions. On an RTX 3090 (24 GB VRAM), loading the full FP16 model (about 6 GB of weights at 2 bytes per parameter) leaves ample room for activations and a 4K context. You can prompt it with "Translate to German: Hello world" and then immediately with "Summarize: ..." without switching models. One 3B MTL model in place of three separate fine-tuned 3B models (about 18 GB in FP16) saves ~12 GB of VRAM and disk space.
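The memory figures in this example follow from simple arithmetic, sketched below. It assumes a round 3 billion parameters, 2 bytes per parameter for FP16, and decimal gigabytes; it counts weights only and ignores activations, caches, and framework overhead. The helper name weights_gb is illustrative.

```python
# Back-of-the-envelope weight memory, weights only (no activations or cache).
# Assumes a round 3e9 parameters and decimal gigabytes (1 GB = 1e9 bytes).

def weights_gb(num_params, bytes_per_param):
    return num_params * bytes_per_param / 1e9


PARAMS = 3e9      # approximate FLAN-T5 XL parameter count
FP16_BYTES = 2    # bytes per parameter in FP16

one_model = weights_gb(PARAMS, FP16_BYTES)
three_models = 3 * one_model          # three dedicated fine-tuned copies
saved = three_models - one_model      # what a single shared model avoids

print(f"one model:    {one_model:.0f} GB")
print(f"three models: {three_models:.0f} GB")
print(f"saved:        {saved:.0f} GB")
```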
Workflow example
In Hugging Face Transformers, you load an MTL model with model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl"), along with its tokenizer. You then craft prompts with task prefixes, for example input_text = "Translate English to French: That is good.". The same model instance handles different tasks. If your serving runtime supports the model's architecture (vLLM, for example), you can serve it once and send requests with varying prompts; the runtime doesn't need to reload weights between tasks. This is useful for chatbots that need both Q&A and summarization in one session.
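A minimal sketch of this workflow in Python, assuming the transformers and torch packages are installed and the google/flan-t5-xl checkpoint can be downloaded. The helpers build_prompt and run_tasks and the example prompts are hypothetical names for illustration; the heavy imports sit inside run_tasks so the prompt helper can be used without loading the model.

```python
# Sketch: one model instance, several tasks selected purely by prompt.
# build_prompt, run_tasks, and PROMPTS are illustrative, not library APIs.

def build_prompt(task_prefix, text):
    """Join a task prefix and the input text in the 'Prefix: text' style."""
    return f"{task_prefix}: {text}"


def run_tasks(prompts, model_name="google/flan-t5-xl", max_new_tokens=64):
    # Imported lazily: requires `transformers` and `torch` to be installed.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Loaded once in the library's default precision; every prompt reuses it.
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
    model.eval()

    outputs = []
    with torch.no_grad():
        for prompt in prompts:
            inputs = tokenizer(prompt, return_tensors="pt").to(device)
            ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
            outputs.append(tokenizer.decode(ids[0], skip_special_tokens=True))
    return outputs


PROMPTS = [
    build_prompt("Translate English to French", "That is good."),
    build_prompt("Summarize", "Multi-task models share one set of weights across tasks."),
]
```

Calling run_tasks(PROMPTS) would download the checkpoint on first use and return one generated string per prompt. The weights are loaded once, and only the prompt changes between tasks.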
Reviewed by Fredoline Eruo.