Fine-tuning adapts general-purpose models to domain-specific tasks, significantly improving performance in real-world applications. In particular, fine-tuning can offer you:
Increased accuracy on specific tasks or reasoning in a specific domain.
Better performance and lower costs from using a smaller model.
For example, we have seen fine-tuning be especially helpful for these tasks:
Low-latency query understanding, summarization, and classification
Fireworks supports both Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT). In supervised fine-tuning, you provide a dataset with labeled examples of “good” outputs. In reinforcement fine-tuning, you provide a grader function that scores the model’s outputs; the model is then iteratively trained to produce outputs that maximize this score. To learn more about the differences between SFT and RFT, see when to use Supervised Fine-Tuning (SFT) vs. Reinforcement Fine-Tuning (RFT).
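To make the two inputs concrete, the sketch below shows an illustrative chat-style JSONL training example for SFT and a toy grader function for RFT. The record schema and the grader signature here are assumptions for illustration only; refer to the dataset and RFT guides for the exact formats Fireworks expects.

```python
import json

# Illustrative SFT training example, written as one chat-style JSON object
# per line of a JSONL dataset (schema shown here is an assumption).
sft_example = {
    "messages": [
        {"role": "user", "content": "Classify the sentiment: 'The checkout flow was painless.'"},
        {"role": "assistant", "content": "positive"},
    ]
}
print(json.dumps(sft_example))

# Illustrative RFT grader: any function that maps a model output to a score.
# The actual grader interface is defined by the RFT docs; this is only a sketch.
def grade(expected_label: str, model_output: str) -> float:
    """Return 1.0 if the model's label matches the expected label, else 0.0."""
    return 1.0 if model_output.strip().lower() == expected_label.lower() else 0.0

print(grade("positive", "Positive"))  # 1.0
```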
To fine-tune a model efficiently, Fireworks uses a technique called Low-Rank Adaptation (LoRA), sketched conceptually after the list below. The fine-tuning process produces a LoRA addon that is deployed onto a base model at inference time. The advantages of using LoRA are:
Models are faster and cheaper to train
Models are seamless to deploy on Fireworks
A single deployment can be configured to serve multiple LoRA addons
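Conceptually, a LoRA addon is a pair of small low-rank matrices learned on top of a frozen base model: the base weights stay untouched, which is why training is cheap and why several addons can share one deployment. The sketch below illustrates the idea; the dimensions and rank are arbitrary examples, not Fireworks defaults.

```python
import numpy as np

# Conceptual sketch of how a LoRA addon modifies a base weight matrix:
# instead of updating the full d x d weight W, training learns two small
# low-rank matrices A (r x d) and B (d x r), and the effective weight at
# inference is W + B @ A.
d, r = 4096, 16                           # model dimension and LoRA rank (r << d)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))           # frozen base weight (shared by all addons)
A = rng.standard_normal((r, d)) * 0.01    # trainable LoRA factor
B = np.zeros((d, r))                      # trainable LoRA factor (initialized to zero)

x = rng.standard_normal(d)                # an activation vector
y = W @ x + B @ (A @ x)                   # base output plus the low-rank correction

# The addon stores only A and B: 2*d*r parameters instead of d*d, which is why
# many addons can be kept around and swapped onto a single base model.
print(f"full weight params: {d*d:,}  vs  LoRA addon params: {2*d*r:,}")
```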