What is reinforcement fine-tuning?
In traditional supervised fine-tuning, you provide a dataset of labeled examples showing exactly what the model should output. In reinforcement fine-tuning, you instead provide:
- A dataset: Prompts, with input examples for the model to respond to
- An evaluator: Code that scores the model’s outputs from 0.0 (bad) to 1.0 (good), also known as a reward function
- An agent: An LLM application, with access to tools, APIs, and data needed for your task
Use cases
Reinforcement fine-tuning helps you train models to excel at:
- Code generation and analysis - Writing and debugging functions with verifiable execution results or test outcomes
- Structured output generation - JSON formatting, data extraction, classification, and schema compliance with programmatic validation
- Domain-specific reasoning - Legal analysis, financial modeling, or medical triage with verifiable criteria and compliance checks
- Tool-using agents - Multi-step workflows where agents call external APIs with measurable success criteria
How it works
Design your evaluator
Define how you’ll score model outputs from 0 to 1. For example, award higher scores when your agent calls the right tools, or when an LLM-as-judge rates the output highly.
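A minimal sketch of such an evaluator, written as a plain Python reward function. The function name, signature, and the JSON-validity criteria are illustrative assumptions, not the actual Fireworks evaluator interface; substitute your own task's scoring logic.

```python
import json


def evaluate(prompt: str, output: str) -> float:
    """Score a model output between 0.0 (bad) and 1.0 (good).

    Hypothetical reward function for a structured-output task:
    valid JSON earns partial credit, and including a required
    field earns the rest.
    """
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return 0.0  # not parseable JSON at all

    score = 0.5  # valid JSON earns partial credit
    if isinstance(parsed, dict) and "answer" in parsed:
        score += 0.5  # required field is present
    return score
```

Partial credit like this gives the training process a gradient to climb: outputs that are "almost right" score better than outputs that are completely wrong.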
Prepare your dataset
Create a JSONL file with prompts (system and user messages). These will be used to generate rollouts during training.
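Each line of the JSONL file is a single prompt. A minimal sketch, assuming the common chat-style `messages` schema (the exact field names expected by your RFT job may differ):

```json
{"messages": [{"role": "system", "content": "You are a data-extraction assistant. Respond with JSON."}, {"role": "user", "content": "Extract the invoice total from: 'Total due: $431.50'"}]}
{"messages": [{"role": "system", "content": "You are a data-extraction assistant. Respond with JSON."}, {"role": "user", "content": "Extract the invoice total from: 'Amount payable is $88.00'"}]}
```

Note that no completions are included: during training, the model generates its own rollouts for each prompt, and your evaluator scores them.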
Connect your agent
Train with a local agent, or connect your agent to Fireworks as a remote server via the /init and /status endpoints.
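To make the remote-server shape concrete, here is a minimal sketch using only the Python standard library. The /init and /status paths come from the text above, but the request and response payloads shown are assumptions, not the actual Fireworks protocol; consult the remote-agent reference for the real contract.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class AgentHandler(BaseHTTPRequestHandler):
    """Hypothetical remote-agent server exposing /init and /status."""

    def _send_json(self, payload: dict) -> None:
        body = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        if self.path == "/init":
            # Consume the request body, then set up any per-rollout
            # state your agent needs (tools, API clients, data).
            length = int(self.headers.get("Content-Length", 0))
            self.rfile.read(length)
            self._send_json({"ok": True})  # assumed response shape
        else:
            self.send_error(404)

    def do_GET(self):
        if self.path == "/status":
            # Report whether the agent is ready to serve rollouts.
            self._send_json({"status": "ready"})  # assumed response shape
        else:
            self.send_error(404)


def run(port: int = 8000) -> None:
    """Serve the agent on the given port until interrupted."""
    HTTPServer(("0.0.0.0", port), AgentHandler).serve_forever()
```

In practice you would build this on whatever web framework your agent already uses; the point is simply that the training service polls these endpoints to initialize and monitor your agent.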
Launch training
Create an RFT job via the UI or CLI. Fireworks orchestrates rollouts, evaluates them, and trains the model to maximize reward.
RFT works best when:
- You can determine whether a model’s output is “good” or “bad,” even if only approximately
- You have prompts but lack perfect “golden” completions to learn from
- The task requires multi-step reasoning where evaluating intermediate steps is hard
- You want the model to explore creative solutions beyond your training examples
Next steps
Create an evaluator
Learn how to design effective reward functions
Kick off training
Learn how to launch and configure RFT jobs