If you have multiple fine-tuned versions of the same base model (e.g. you have fine-tuned the same model for different use cases, applications, or prototyping/experimentation), you can share a single base model deployment across these LoRA models to achieve higher utilization. We call this feature Multi-LoRA. It is an alternative to the pattern used in deploying a fine-tuned model using an on-demand deployment, where a single deployment serves a single LoRA model.

Multi-LoRA comes with performance tradeoffs, so we recommend it only if you need to serve multiple fine-tunes of the same base model and are willing to trade performance for higher deployment utilization.

To use Multi-LoRA, first create a deployment of your base model and pass the --enable-addons flag:
firectl create deployment "accounts/fireworks/models/<MODEL_ID of base model>" --enable-addons
Then, once the deployment is ready, load the LoRA onto it by passing this deployment's ID:
firectl load-lora <FINE_TUNED_MODEL_ID> --deployment <DEPLOYMENT_ID>
You can deploy several LoRA models onto the same deployment this way.
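For example, loading two LoRAs onto the same base deployment might look like this (the model and deployment IDs below are illustrative):

# Load two different fine-tunes onto the same shared base deployment
firectl load-lora my-support-assistant-lora --deployment abc123
firectl load-lora my-summarizer-lora --deployment abc123

Requests to either model are then served by the single abc123 deployment.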

Using Multi-LoRA with the Build SDK

You can also set up Multi-LoRA deployments with the Build SDK:
from fireworks import LLM

# Create a base model deployment with addons enabled
base_model = LLM(
    model="accounts/fireworks/models/base-model-id",
    deployment_type="on-demand",
    id="shared-base-deployment",  # Simple string identifier
    enable_addons=True
)
base_model.apply()

# Deploy multiple fine-tuned models using the same base deployment
fine_tuned_model_1 = LLM(
    model="accounts/your-account/models/fine-tuned-model-1",
    deployment_type="on-demand-lora",
    base_id=base_model.deployment_id
)

fine_tuned_model_2 = LLM(
    model="accounts/your-account/models/fine-tuned-model-2", 
    deployment_type="on-demand-lora",
    base_id=base_model.deployment_id
)

# Apply deployments
fine_tuned_model_1.apply()
fine_tuned_model_2.apply()

# Use the deployed models
response_1 = fine_tuned_model_1.chat.completions.create(
    messages=[{"role": "user", "content": "Hello from model 1!"}]
)

response_2 = fine_tuned_model_2.chat.completions.create(
    messages=[{"role": "user", "content": "Hello from model 2!"}]
)
When using deployment_type="on-demand-lora", you must provide the base_id parameter, which references the deployment ID of your base model deployment.
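If you prefer to call the loaded LoRAs outside the Build SDK, they should also be reachable through Fireworks' OpenAI-compatible inference endpoint by model name. A minimal sketch, assuming the openai Python client and a FIREWORKS_API_KEY environment variable (the model path reuses the illustrative ID from the example above):

import os
from openai import OpenAI

# Point the OpenAI client at the Fireworks inference endpoint
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ["FIREWORKS_API_KEY"],
)

# Query a loaded LoRA by its model name; routing to the shared
# base deployment is handled server-side (model path is illustrative)
response = client.chat.completions.create(
    model="accounts/your-account/models/fine-tuned-model-1",
    messages=[{"role": "user", "content": "Hello from model 1!"}],
)
print(response.choices[0].message.content)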