Reinforcement Fine-Tuning (RFT) is free for models under 16B parameters. When creating an RFT job in the UI, use the free-tuning filter in the model selector on the fine-tuning creation page. If you kick off jobs from the terminal, you can find the model ID in the Model Library. Note: SFT and DPO jobs are billed per training token for all model sizes; see the pricing page for details.
The Eval Protocol CLI provides the fastest, most reproducible way to launch RFT jobs. This page covers everything you need to know about using eval-protocol create rft.
At a high level, you will:
Upload your evaluator to Fireworks (if you don't have one yet, see Concepts > Evaluators).
Upload your dataset to Fireworks; a minimal dataset row sketch follows this list.
Create and launch the RFT job.
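For reference, RFT datasets are JSONL files with one record per line. The exact schema depends on your evaluator; the chat-style messages field below is an assumption modeled on common fine-tuning formats, not a confirmed requirement:
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is 17 * 24?"}]}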
1. Install Eval Protocol CLI
pip install eval-protocol
Verify installation:
eval-protocol --version
2. Set up authentication
Configure your Fireworks API key:
export FIREWORKS_API_KEY="fw_your_api_key_here"
Or create a .env file:
FIREWORKS_API_KEY=fw_your_api_key_here
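If you used the export route, a quick way to confirm the key is visible in your current shell (plain POSIX shell, nothing Fireworks-specific):
echo "${FIREWORKS_API_KEY:+FIREWORKS_API_KEY is set}"
If you rely on the .env file instead, the variable is loaded from the file when the tooling runs rather than exported into your shell, so this check will print nothing.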
3. Test your evaluator locally
Before training, verify your evaluator works. This command discovers and runs your @evaluation_test with pytest. If a Dockerfile is present, it builds an image and runs the test in Docker; otherwise it runs on your host.
cd evaluator_directory
ep local-test
If using a Dockerfile, it must use a Debian-based image (no Alpine or CentOS), be single-stage (no multi-stage builds), and only use supported instructions: FROM, RUN, COPY, ADD, WORKDIR, USER, ENV, CMD, ENTRYPOINT, ARG. Instructions like EXPOSE and VOLUME are ignored. See the RFT quickstart guide for details.
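A minimal Dockerfile that satisfies these constraints might look like the following sketch; the base image tag, requirements.txt, and CMD are illustrative assumptions, not requirements:
# Single-stage, Debian-based image using only supported instructions
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Illustrative default command; the test harness may override it
CMD ["pytest"]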
4. Create the RFT job
From the directory where your evaluator and dataset (dataset.jsonl) are located, run eval-protocol create rft.
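For example, a minimal invocation that only pins the base model (everything else falls back to the defaults listed below):
eval-protocol create rft --base-model accounts/fireworks/models/llama-v3p1-8b-instruct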
Customize your RFT job with these flags:
Model and output:
--base-model accounts/fireworks/models/llama-v3p1-8b-instruct # Base model to fine-tune
--output-model my-custom-name # Name for fine-tuned model
Training parameters:
--epochs 2 # Number of training epochs (default: 1)
--learning-rate 5e-5 # Learning rate (default: 1e-4)
--lora-rank 16 # LoRA rank (default: 8)
--batch-size 65536 # Batch size in tokens (default: 32768)
--chunk-size 200 # Prompts rolled out per GRPO training step (default: 200); -1 disables chunking
Loss method:
--rl-loss-method dapo # RL loss method: grpo (default), dapo, gspo-token
--rl-kl-beta 0.001 # KL beta override (only for grpo; rejected for dapo/gspo-token)
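Because --rl-kl-beta is accepted only with the default grpo loss, pairing it with dapo or gspo-token is rejected. A valid combination looks like:
eval-protocol create rft --rl-loss-method grpo --rl-kl-beta 0.001 --base-model accounts/fireworks/models/llama-v3p1-8b-instruct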
Rollout (sampling) parameters:
--temperature 0.8 # Sampling temperature (default: 0.7)
--n 8 # Number of rollouts per prompt (default: 4)
--response-candidates-count 8 # Alias for --n in firectl (default: 8, minimum: 2)
--max-tokens 4096 # Max tokens per response (default: 32768)
--top-p 0.95 # Top-p sampling (default: 1.0)
--top-k 50 # Top-k sampling (default: 40)
--max-concurrent-rollouts 64 # Max in-flight rollouts per job (default: 96, or the value set in @evaluation_test); affects throughput only, no training effect
Remote environments:
--remote-server-url https://your-evaluator.example.com # For remote rollout processing
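Putting it together, a fuller example that combines several of the flags above (the values are illustrative, not tuned recommendations):
eval-protocol create rft \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --output-model my-custom-name \
  --epochs 2 \
  --learning-rate 5e-5 \
  --lora-rank 16 \
  --temperature 0.8 \
  --n 8 \
  --max-tokens 4096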