On-demand deployments are dedicated GPUs that give you better performance, no rate limits, fast autoscaling, and a wider selection of models than serverless. This quickstart will help you spin up your first on-demand deployment in minutes.
Before you begin, create an API key in the Fireworks dashboard: click Create API key and store it in a safe location. Once you have your API key, export it as an environment variable in your terminal:
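For example, in a POSIX shell (replace the placeholder with the key you just created):

```shell
# Make the API key available to the CLI and to any API calls in this session
export FIREWORKS_API_KEY="<YOUR_API_KEY>"
```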
This command will create a deployment of GPT OSS 120B optimized for speed. It will take a few minutes to complete. The resulting deployment will scale up to 1 replica.
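A sketch of that command using the firectl CLI; the exact gpt-oss-120b model id is an assumption, so substitute the model id shown in the model library if it differs:

```shell
# Create a speed-optimized deployment of GPT OSS 120B
# (model id is an assumption; check the model library for the exact value)
firectl create deployment accounts/fireworks/models/gpt-oss-120b \
  --deployment-shape fast
```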
`fast` is a deployment shape: a pre-configured deployment template, created by the Fireworks team, that sets sensible defaults for most deployment options (such as hardware type). You can also pass `throughput` or `cost` to `--deployment-shape`:
- `throughput` creates a deployment that trades off latency for lower cost-per-token at scale
- `cost` creates a deployment that trades off latency and throughput for the lowest cost-per-token at small scale, usually for early experimentation and prototyping
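For instance, the same create command can be run with one of the other shapes (the model id is again an assumption):

```shell
# Throughput-optimized deployment of the same model
firectl create deployment accounts/fireworks/models/gpt-oss-120b \
  --deployment-shape throughput
```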
While we recommend using a deployment shape, you can also specify your own deployment configuration; see our deployment guide for the full set of options.
Now you can query your on-demand deployment using the same API as serverless models, but backed by your dedicated deployment. Replace <DEPLOYMENT_NAME> in the snippets below with the value from the Name: field in the previous step:
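A minimal sketch with curl, assuming the OpenAI-compatible chat completions endpoint; the model string is the deployment-specific value returned when you created the deployment:

```shell
# Query the dedicated deployment; requires FIREWORKS_API_KEY to be set
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<DEPLOYMENT_NAME>",
    "messages": [{"role": "user", "content": "Say hello!"}]
  }'
```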
The examples from the Serverless quickstart will work with this deployment as well; just replace the model string with the deployment-specific model string from above.