How DPO Works
DPO uses pairwise comparisons to refine the model's behavior. For a given prompt, the model is presented with two responses: one considered the “preferred” or positive example, and another labeled as “non-preferred” or negative. The model is then trained to increase the probability of generating the preferred response and decrease the probability of generating the non-preferred response, which teaches it to replicate the preference patterns observed in the provided comparison data.
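In the standard formulation of DPO (Rafailov et al., 2023), which is the textbook objective rather than a description of Fireworks' internal implementation, this preference signal is captured by a single loss over preference triples:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]
$$

Here $x$ is the prompt, $y_w$ the preferred response, $y_l$ the non-preferred response, $\pi_{\mathrm{ref}}$ a frozen copy of the starting model, and $\beta$ controls how far the tuned model $\pi_\theta$ may drift from the reference while fitting the preferences.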
Step-by-Step Guide to Fine-Tuning with Fireworks AI

1. Prepare dataset

Datasets must adhere strictly to the JSONL format, where each line represents a complete JSON-formatted training example.

Minimum Requirements:

- Minimum examples needed: 3
- Maximum examples: up to 3 million examples per dataset
- File format: JSONL (each line is a valid JSON object)
- Dataset schema: each training sample must include the following fields:
  - An `input` field containing a `messages` array, where each message is an object with two fields:
    - `role`: one of `system`, `user`, or `assistant`
    - `content`: a string representing the message content
  - A `preferred_output` field containing an assistant message with an ideal response
  - A `non_preferred_output` field containing an assistant message with a suboptimal response

We currently only support one-turn conversations for each example, where the preferred and non-preferred messages need to be the last assistant message.

Save this dataset as a JSONL file locally, for example `einstein_dpo.jsonl`.
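For illustration, one training example following this schema could look like the entry below. It is pretty-printed here for readability (in the actual `einstein_dpo.jsonl` each example sits on a single line), the Einstein-themed content is invented, and the exact nesting of `preferred_output` and `non_preferred_output` as single message objects is inferred from the schema above.

```json
{
  "input": {
    "messages": [
      {"role": "system", "content": "You are Albert Einstein, answering questions about physics."},
      {"role": "user", "content": "What does E = mc^2 mean?"}
    ]
  },
  "preferred_output": {
    "role": "assistant",
    "content": "It means mass and energy are equivalent: a body's rest energy equals its mass times the speed of light squared."
  },
  "non_preferred_output": {
    "role": "assistant",
    "content": "It is a famous equation."
  }
}
```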
2. Create and upload the dataset

There are a couple of ways to upload the dataset to the Fireworks platform for fine-tuning: `firectl`, the RESTful API, the builder SDK, or the UI. In the UI, you can simply navigate to the dataset tab, click `Create Dataset`, and follow the wizard. While all of the above approaches should work, the UI is more suitable for smaller datasets (< 500 MB), while `firectl` might work better for bigger datasets.

Ensure the dataset ID conforms to the resource ID restrictions.
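For example, with `firectl` the upload is a single command (the dataset ID `einstein-dpo` is just an illustration; see the firectl references in the appendix for the authoritative syntax):

```bash
# Upload the local JSONL file as a new dataset ("einstein-dpo" is an illustrative ID).
firectl create dataset einstein-dpo einstein_dpo.jsonl

# Confirm the dataset exists and is ready for fine-tuning.
firectl get dataset einstein-dpo
```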
3. Create a DPO Job

Use `firectl` to create a new DPO job. For our example, we might run the following command to fine-tune a Llama 3.1 8B Instruct model with our Einstein dataset:
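As a rough sketch, assuming a `dpoj` subcommand and flags modeled on the supervised fine-tuning workflow (the subcommand, flag names, and IDs below are assumptions; confirm them against the firectl references in the appendix):

```bash
# Sketch only: subcommand, flag names, and IDs are assumptions, not verified syntax.
firectl create dpoj \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --dataset einstein-dpo \
  --output-model einstein-dpo-tuned
```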
4. Monitor the DPO Job

Use `firectl` to monitor progress updates for the DPO fine-tuning job. Once the job is complete, the `STATE` will be set to `JOB_STATE_COMPLETED`, and the fine-tuned model can be deployed.
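Under the same assumptions as in the previous step (the `dpoj` resource name and the job ID are placeholders; see the firectl references for the exact commands), monitoring looks like:

```bash
# Sketch only: resource name and job ID are placeholders.
firectl get dpoj <job-id>
```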
5. Deploy the DPO fine-tuned model
Once training completes, you can create a deployment to interact with the fine-tuned model. Refer to deploying a fine-tuned model for more details.
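As a minimal sketch, assuming the output model was named `einstein-dpo-tuned` as in the earlier steps and that `<account-id>` is replaced with your account ID (the deployment command and model path are assumptions; the deployment guide linked above is authoritative):

```bash
# Sketch only: model path and account ID are placeholders; see "deploying a fine-tuned model".
firectl create deployment "accounts/<account-id>/models/einstein-dpo-tuned"

# Query the deployed model via the OpenAI-compatible chat completions endpoint.
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "accounts/<account-id>/models/einstein-dpo-tuned",
        "messages": [{"role": "user", "content": "Explain mass-energy equivalence in one sentence."}]
      }'
```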
Next Steps
Fireworks AI provides multiple options for fine-tuning models. Explore other fine-tuning methods to improve model output.

Appendix

- Python builder SDK references
- RESTful API references
- firectl references