Skip to main content

What this is

The cookbook’s training.recipes.distillation_loop trains one student from its own rollouts while frozen teacher deployments score those same responses. Use it when you want recipe-managed trainer provisioning, student sampling, teacher scoring, checkpointing, and cleanup for distillation experiments.

Modes

ModeUse whenTeacher signalTraining loss
sampled_reverse_klYou want OPD-style sampled-token distillationTeacher logprob on each sampled response tokenimportance_sampling
topk_forward_klYou want sparse SDFT soft labels from teacher top-K tokensTeacher top_logprobs=K per response positioncross_entropy with [N, K] targets
sampled_reverse_kl is the default. The student samples on policy, the teacher scores the sampled tokens, and the recipe trains on the dense per-token gap:
teacher_logprob - sampling_logprob
For topk_forward_kl, set distill_mode=DistillMode.TOPK_FORWARD_KL and sdft_top_k.

Current limits and logprobs

The distillation recipe depends on the public inference logprobs response:
Field or request optionMeaning
top_kRequest-side sampling filter. It limits which next-token logits remain eligible for sampling and redistributes probability mass over that set.
sampling_maskOptional request flag for generated tokens. It can return the count or token IDs still eligible after sampling filters such as top_p and top_k.
logprobModel logprob for the returned token before sampling-temperature and sampling-filter renormalization. In the legacy response, this is token_logprobs.
sampling_logprobGeneration-only logprob of the sampled token after temperature and sampling filters are applied. Use this when comparing against the distribution that actually sampled the token.
top_logprobsResponse option for returning likely alternatives at each position. The public inference API currently caps this at 5, so sdft_top_k must be at most 5.
top_k and top_logprobs are different knobs: top_k changes sampling; top_logprobs only controls how many alternatives are returned in the response.

Minimal example

from training.recipes.distillation_loop import Config, main
from training.utils import DeployConfig, TrainerConfig

cfg = Config(
    log_path="./distillation_logs",
    base_model="accounts/fireworks/models/qwen3-8b",
    teacher_model="accounts/fireworks/models/qwen3-32b",
    dataset="/path/to/prompts.jsonl",
    trainer=TrainerConfig(
        training_shape_id="accounts/fireworks/trainingShapes/qwen3-8b-128k-h200",
    ),
    deployment=DeployConfig(tokenizer_model="Qwen/Qwen3-8B"),
    max_rows=100,
    epochs=1,
)

main(cfg)
If teacher_model is a base model resource, the recipe creates a frozen teacher deployment for scoring. If it is already an inference model or deployment resource, the recipe uses it directly.

Multi-teacher runs

Set multi_teacher=MultiTeacherConfig(...) when you have more than one teacher. With sampled_reverse_kl, multi-teacher OPD is routed: each dataset row is scored by exactly one teacher selected by the configured route key, defaulting to teacher. With topk_forward_kl, every configured teacher can score the sampled response and the recipe blends sparse top-K probability mass using TeacherConfig.blend_weight.

Dataset contract

Rows are JSONL objects. The only required field is messages, the student-visible OpenAI-style chat prompt. Optional fields:
FieldUse
teacherDefault route key for routed sampled reverse-KL MOPD. The value must match a configured TeacherConfig.route_value, or the teacher model when route_value is unset.
teacher_messagesTeacher-side prompt for privileged-context scoring. If omitted, the teacher scores under messages.
expected_answerOptional metadata for eval callbacks and smoke checks.
Student and teacher token IDs must use a compatible tokenizer and vocabulary. Prefer teachers from the same model family, and set TeacherConfig.tokenizer_model when you want the recipe to validate teacher tokenizers against DeployConfig.tokenizer_model.

Examples

The cookbook includes distillation examples under training/examples/distillation:
ExamplePathUse
Privileged-context OPD/SDFTgsm8k_privilegedStudent sees the problem; teacher can see privileged solution context.
Routed MOPD smokerouted_mopd/train_two_teacher_lora.pyTiny generated dataset with two route labels and a LoRA student.
Run from the cookbook repository:
cd training
FIREWORKS_API_KEY=... \
python examples/distillation/routed_mopd/train_two_teacher_lora.py

Next steps