| Technique | Teacher Model | Student Model | Primary Goal |
|---|---|---|---|
| Supervised Fine-Tuning (SFT) | DeepSeek-V3 (685B) | Qwen2.5-7B | Format Learning & Structure |
| Reinforcement Fine-Tuning (RFT) | N/A (Self-improvement) | Supervised Fine-Tuned Qwen2.5-7B | Accuracy Optimization |
The `#### 18` format provides the ground-truth answer we need for automated evaluation. We'll extract this pattern to check model correctness.
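As a minimal sketch of that extraction step, the snippet below pulls the final number out of an answer string that ends in the `#### <number>` pattern. The function name `extract_ground_truth` and the choice to strip thousands separators are assumptions made for illustration, not part of the original walkthrough.

```python
import re
from typing import Optional

# Matches the number that follows "####", e.g. "#### 18" or "#### 1,234".
_ANSWER_RE = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_ground_truth(answer_text: str) -> Optional[str]:
    """Return the number after '####' as a normalized string, or None if absent.

    Hypothetical helper: the name and normalization rules are illustrative.
    """
    matches = _ANSWER_RE.findall(answer_text)
    if not matches:
        return None
    # Use the last match and drop thousands separators ("1,234" -> "1234").
    return matches[-1].replace(",", "")


if __name__ == "__main__":
    sample = "She sells 9 * 2 = 18 dollars of eggs.\n#### 18"
    print(extract_ground_truth(sample))
```

Returning `None` rather than raising keeps malformed records easy to filter out before training or evaluation.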
Process Dataset for Training and Evaluation
Once Qwen2.5 7B produces the `[WORK]` and `[RESULT]` format automatically (without being told), we can apply RFT to improve the accuracy of answers within that structure.
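The scoring logic an RFT evaluator needs can be sketched roughly as follows: find the answer inside the `[RESULT]` tags and compare it numerically with the ground truth. This is an illustrative, framework-independent sketch. It assumes the tags are closed with `[/RESULT]`, which the text here does not confirm, and the names `extract_result` and `score_completion` are hypothetical rather than part of any evaluator API.

```python
import re
from typing import Optional

# Assumption: answers are wrapped as [RESULT]...[/RESULT].
_RESULT_RE = re.compile(r"\[RESULT\](.*?)\[/RESULT\]", re.DOTALL)
_NUMBER_RE = re.compile(r"-?[\d,]*\.?\d+")


def extract_result(completion: str) -> Optional[str]:
    """Return the first number inside the [RESULT] block, or None if missing."""
    block = _RESULT_RE.search(completion)
    if block is None:
        return None
    number = _NUMBER_RE.search(block.group(1))
    if number is None:
        return None
    return number.group(0).replace(",", "")


def score_completion(completion: str, ground_truth: str) -> float:
    """Return 1.0 when the predicted number equals the ground truth, else 0.0."""
    predicted = extract_result(completion)
    if predicted is None:
        return 0.0
    try:
        expected = float(ground_truth.replace(",", ""))
        return 1.0 if float(predicted) == expected else 0.0
    except ValueError:
        # Unparseable ground truth or prediction counts as incorrect.
        return 0.0


if __name__ == "__main__":
    output = "[WORK]16 - 3 - 4 = 9; 9 * 2 = 18[/WORK]\n[RESULT]18[/RESULT]"
    print(score_completion(output, "18"))
```

Comparing as floats means `18`, `18.0`, and `$18.00` all count as the same answer, while a completion with no `[RESULT]` block scores zero and so is penalized for breaking the format.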
- `[RESULT]` tags
- `kd-rft-evaluator`
- `kd-rft-dataset` you uploaded earlier
- `kd-rft-evaluator.py`
- `job.output_model` from your SFT job to obtain the SFT model name (e.g., `accounts/your-account/models/kd-sft-model`)
- `kd-rft-dataset` from the dropdown
- `kd-rft-evaluator` (the one you just created)
- `kd-rft-model`
- `Completed`, you can deploy your model.
- `[WORK]`/`[RESULT]` structure as default behavior