Does the API support batching and load balancing?
Current capabilities include:

- Load balancing: Yes, supported out of the box
- Continuous batching: Yes, supported
- Batch inference: Yes, supported via the Batch API
- Streaming: Yes, supported

For asynchronous batch processing of large volumes of requests, see our Batch API documentation.
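As a sketch of how requests might be assembled for asynchronous batch processing: batch APIs of this style typically accept a JSONL file with one request per line. The exact schema below (`custom_id`, `method`, `url`, `body` fields), the endpoint path, and the model name are assumptions modeled on OpenAI-style batch input files, not the confirmed Fireworks format; consult the Batch API documentation for the authoritative schema.

```python
import json

def build_batch_jsonl(prompts, model="accounts/fireworks/models/llama-v3p1-8b-instruct"):
    """Serialize prompts into JSONL, one chat-completion request per line.

    The request shape (custom_id / method / url / body) mirrors OpenAI-style
    batch input files and is an assumption here, not a documented schema.
    """
    lines = []
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",    # lets you match results back to inputs
            "method": "POST",
            "url": "/v1/chat/completions",  # assumed endpoint path
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request))
    return "\n".join(lines)

jsonl = build_batch_jsonl(["Hello!", "Summarize continuous batching in one sentence."])
print(len(jsonl.splitlines()))  # → 2 (one request per line)
```

Each line is an independent request, so a large volume of prompts can be submitted in one upload and processed asynchronously rather than one synchronous call at a time.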