What factors affect the number of simultaneous requests that can be handled?
Request handling capacity depends on several factors:
- Model size and type
- Number of GPUs allocated to the deployment
- GPU type (e.g., A100, H100)
- Prompt size
- Generation token length
- Deployment type (serverless vs. on-demand)
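As a rough illustration of how these factors interact, the sketch below estimates how many requests fit in GPU memory at once, assuming concurrency is limited by KV-cache memory after model weights are loaded. Every number and the formula itself are simplified assumptions for illustration only, not Fireworks internals; real serving stacks also account for activation memory, batching strategy, and scheduler behavior.

```python
# Back-of-envelope estimate of concurrent request capacity.
# Assumption: after weights are loaded, each in-flight request holds a
# KV cache proportional to its total context (prompt + generated tokens).
# All model/GPU numbers below are hypothetical.

def kv_cache_bytes_per_request(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    context_tokens: int,       # prompt size + generation token length
    bytes_per_value: int = 2,  # fp16/bf16
) -> int:
    # Factor of 2 covers both key and value tensors at every layer.
    return 2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_value

def estimate_concurrency(
    gpu_mem_gib: float,        # per-GPU memory (e.g., 80 GiB for A100/H100)
    num_gpus: int,             # GPUs allocated to the deployment
    model_weights_gib: float,  # grows with model size
    context_tokens: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
) -> int:
    total_bytes = gpu_mem_gib * num_gpus * 1024**3
    free_bytes = total_bytes - model_weights_gib * 1024**3
    per_request = kv_cache_bytes_per_request(
        num_layers, num_kv_heads, head_dim, context_tokens
    )
    return max(0, int(free_bytes // per_request))

# Example: a hypothetical 7B-class model on one 80 GiB GPU with 4k contexts.
print(estimate_concurrency(
    gpu_mem_gib=80, num_gpus=1, model_weights_gib=14,
    context_tokens=4096, num_layers=32, num_kv_heads=32, head_dim=128,
))  # → 33
```

Doubling `context_tokens` (longer prompts or longer generations) halves the estimate, and adding GPUs or choosing a larger-memory GPU type raises it, which is why each item in the list above moves the concurrency ceiling.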