Current capabilities include:
  • Load balancing: Yes, supported out of the box
  • Continuous batching: Yes, supported
  • Batch inference: Yes, supported via the Batch API
  • Streaming: Yes, supported
For asynchronous batch processing of large volumes of requests, see our Batch API documentation.