Deploy one or more LoRA models fine-tuned on Fireworks
After fine-tuning your model on Fireworks, deploy it to make it available for inference.
Fine-tuned LoRA models, whether created on the Fireworks platform or imported, can only be deployed to on-demand (dedicated) deployments. Serverless deployment is not supported for LoRA addons.
You can also upload and deploy LoRA models fine-tuned outside of Fireworks. See importing fine-tuned models for details.
Deploy your LoRA fine-tuned model with a single command, with performance matching the base model. This streamlined approach, called live merge, replaces the previous two-step process and performs better than multi-LoRA deployments.
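As a sketch, a live-merge deployment is a single `firectl` command pointed at your fine-tuned model. The account and model names below are placeholders, and the exact flags may differ depending on your `firectl` version:

```shell
# Create an on-demand deployment directly from your LoRA fine-tune.
# Live merge folds the adapter into the base model at deploy time,
# so no separate merge step is needed.
# "accounts/my-account/models/my-lora-model" is a placeholder model ID.
firectl create deployment accounts/my-account/models/my-lora-model
```

Once the deployment is ready, you can send inference requests to it like any other on-demand deployment.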
If you have multiple fine-tuned versions of the same base model (e.g., you’ve fine-tuned the same model for different use cases, applications, or prototyping), you can share a single base model deployment across these LoRA models to achieve higher utilization.
Multi-LoRA deployment comes with performance tradeoffs. We recommend using it only if you need to serve multiple fine-tunes of the same base model and are willing to trade performance for higher deployment utilization.
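If you accept that tradeoff, the multi-LoRA flow is roughly: create one on-demand deployment of the shared base model, then load each LoRA addon onto it. The commands below are a hedged sketch with placeholder IDs; consult the `firectl` reference for the exact flags on your version:

```shell
# 1) Create a single on-demand deployment of the shared base model
#    (placeholder base model ID shown).
firectl create deployment accounts/fireworks/models/llama-v3p1-8b-instruct

# 2) Load each LoRA addon onto that deployment so they share its capacity
#    (placeholder addon IDs; <deployment_id> comes from step 1).
firectl load-lora accounts/my-account/models/my-lora-a --deployment <deployment_id>
firectl load-lora accounts/my-account/models/my-lora-b --deployment <deployment_id>
```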
Deprecation notice: The deployedModel request key for routing to LoRA addons is deprecated and will not be supported for any new deployments. Please migrate to the model field with the <model_name>#<deployment_name> format shown below.
To send inference requests to a specific LoRA addon on a multi-LoRA deployment, set the model field in your request payload to <model_name>#<deployment_name>. The # separator tells Fireworks to route the request to the specified LoRA addon loaded on the given deployment.
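For example, a chat completions request payload routed to a specific addon might look like the following. The account, model, and deployment names are hypothetical placeholders; substitute your own:

```python
import json

# Hypothetical account, addon, and deployment names -- replace with your own.
ACCOUNT = "my-account"
LORA_MODEL = f"accounts/{ACCOUNT}/models/my-lora-addon"
DEPLOYMENT = f"accounts/{ACCOUNT}/deployments/my-deployment"

# The '#' separator tells Fireworks to route the request to the
# named LoRA addon loaded on the given deployment.
payload = {
    "model": f"{LORA_MODEL}#{DEPLOYMENT}",
    "messages": [{"role": "user", "content": "Hello!"}],
}

# POST this JSON to the chat completions endpoint with your API key
# in the Authorization header.
print(json.dumps(payload, indent=2))
```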