asyncio library. It also includes retry logic for handling 429 errors that Fireworks returns when the server is overloaded. We have run benchmarks that demonstrate the performance benefits.
General optimization recommendations
Based on our benchmarks, we recommend the following:
- Use a client library optimized for high concurrency, such as aiohttp in Python or http.Agent in Node.js.
- Keep the connection pool size high (1000+).
- Increase concurrency until performance stops improving or you observe too many 429 errors.
- Use direct routing to avoid the global API load balancer and route requests directly to your deployment.
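The "increase concurrency until performance stops improving" rule can be turned into a small decision helper. The sketch below is illustrative only: the Measurement type, the pick_concurrency function, and the 5% gain and 2% error thresholds are assumptions made for this example, not values taken from the benchmarks. It expects throughput and 429 rates that you have measured yourself at several concurrency levels.

```python
from typing import NamedTuple, Sequence


class Measurement(NamedTuple):
    """One benchmark run at a fixed concurrency level (hypothetical type)."""
    concurrency: int
    requests_per_second: float
    fraction_429: float  # share of responses that were HTTP 429


def pick_concurrency(
    measurements: Sequence[Measurement],
    min_gain: float = 0.05,          # assumed: require at least 5% more throughput
    max_fraction_429: float = 0.02,  # assumed: tolerate at most 2% rate-limited responses
) -> int:
    """Return the highest concurrency level that still paid off.

    Walks the levels in increasing order and stops at the first level that
    either produces too many 429 responses or fails to improve throughput
    by at least `min_gain` over the best level seen so far.
    """
    if not measurements:
        raise ValueError("at least one measurement is required")

    ordered = sorted(measurements, key=lambda m: m.concurrency)
    best = ordered[0]
    for current in ordered[1:]:
        if current.fraction_429 > max_fraction_429:
            break  # the server is pushing back; stay at the previous level
        if current.requests_per_second < best.requests_per_second * (1 + min_gain):
            break  # throughput has plateaued
        best = current
    return best.concurrency
```

Both thresholds are tunable. A latency-sensitive workload may prefer a stricter 429 limit, while a batch job can usually tolerate more retries in exchange for throughput.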
Code example: Optimal concurrent requests (Python)
Here’s how to implement optimal concurrent requests using asyncio and the LLM class:
main.py
- Uses asyncio.Semaphore to control concurrency to avoid overwhelming the server
- Allows configuration of the maximum number of concurrent connections to the Fireworks API
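As a rough illustration of the two points above, the following standard-library sketch caps in-flight requests with asyncio.Semaphore and retries rate-limited calls with exponential backoff. It does not use the LLM class or reproduce main.py: the send callable, RateLimitedError, call_with_retry, run_all, and all default values are hypothetical stand-ins for whatever client call and 429 signal your code actually uses.

```python
import asyncio
import random
from typing import Awaitable, Callable, List, Sequence


class RateLimitedError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the server."""


async def call_with_retry(
    send: Callable[[str], Awaitable[str]],
    prompt: str,
    max_retries: int = 5,
    base_delay: float = 0.5,
) -> str:
    """Call `send`, retrying with exponential backoff when rate limited."""
    for attempt in range(max_retries + 1):
        try:
            return await send(prompt)
        except RateLimitedError:
            if attempt == max_retries:
                raise  # give up after the final attempt
            # Double the delay on each attempt and add jitter so that
            # many clients do not retry at the same instant.
            await asyncio.sleep(base_delay * 2 ** attempt * (1 + random.random()))
    raise AssertionError("unreachable")


async def run_all(
    send: Callable[[str], Awaitable[str]],
    prompts: Sequence[str],
    max_concurrency: int = 100,
    max_retries: int = 5,
    base_delay: float = 0.5,
) -> List[str]:
    """Process all prompts with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(prompt: str) -> str:
        # The slot is held during backoff too, which eases pressure on an
        # overloaded server instead of letting new requests pile in.
        async with semaphore:
            return await call_with_retry(send, prompt, max_retries, base_delay)

    # gather() returns results in the same order as `prompts`.
    return list(await asyncio.gather(*(worker(p) for p in prompts)))
```

Holding the semaphore slot while a request backs off is a deliberate choice in this sketch. Releasing it during the sleep would keep throughput higher but would also keep sending new requests to a server that has just signalled overload.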