Models & Inference
Does the API support batching and load balancing?
Current capabilities include:
- Load balancing: Yes, supported out of the box
- Continuous batching: Yes, supported
- Batch inference: Yes, supported via the Batch API
- Streaming: Yes, supported