Overview
The Chat Completions API allows synchronous inference for a single request. However, if you need to process a large number of requests, our Batch Inference API is a more efficient alternative. Our API works for all 1000+ models in our Model Library, as well as your own fine-tuned models.

Use Cases
- ETL Pipelines – Construct production pipelines around large-scale inference workloads
- Evaluations – Automate large-scale testing and benchmarking
- Distillation – Teach a smaller model using a larger model
Cost Optimization
Batch API Advantages
- 💸 Volume Discounts
- ⚡ Higher throughput – Process more data in less time.
Step-by-Step Guide to Batch Inference with Fireworks AI
1. Preparing the Dataset
Datasets must adhere strictly to the JSONL format, where each line represents a complete JSON-formatted inference request. Requirements:

- File format: JSONL (each line is a valid JSON object)
- Total size limit: under 500MB
- Format: OpenAI Batch API compatible, with a `custom_id` field (unique ID) and a `body` field

Prepare your input file (for example, `batch_input_data.jsonl`), making sure `custom_id` is unique across rows.
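A minimal sketch of building such a file with only the Python standard library is shown below. The prompts and the request fields inside `body` are illustrative assumptions; check the exact request schema against the references in the Appendix.

```python
import json

# Illustrative prompts -- replace with your own requests.
prompts = [
    "Summarize the plot of Hamlet in one sentence.",
    "List three practical uses of batch inference.",
]

with open("batch_input_data.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        row = {
            # custom_id must be unique across all rows in the file.
            "custom_id": f"request-{i:05d}",
            # body carries the chat-completion-style request; the exact
            # accepted fields shown here are assumptions -- see the API references.
            "body": {
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 256,
            },
        }
        f.write(json.dumps(row) + "\n")
```

Using a zero-padded index (or another descriptive scheme) for `custom_id` makes it easier to match results back to inputs later.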
2. Uploading the Dataset to Fireworks AI
There are a few ways to upload a dataset to the Fireworks platform for batch inference: the UI, `firectl`, or the HTTP API.
In the UI, you can simply navigate to the dataset tab, click Create Dataset, and follow the wizard.
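Whichever upload path you choose, it can help to sanity-check the file locally before uploading. A rough sketch using only the Python standard library, checking JSON validity, required fields, `custom_id` uniqueness, and the 500MB size limit:

```python
import json
import os

MAX_BYTES = 500 * 1024 * 1024  # total size limit: under 500MB

def check_batch_file(path: str) -> None:
    size = os.path.getsize(path)
    if size >= MAX_BYTES:
        raise ValueError(f"{path} is {size} bytes; the dataset must be under 500MB")

    seen_ids = set()
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            row = json.loads(line)  # raises if the line is not valid JSON
            if "custom_id" not in row or "body" not in row:
                raise ValueError(f"line {line_no}: missing custom_id or body")
            if row["custom_id"] in seen_ids:
                raise ValueError(f"line {line_no}: duplicate custom_id {row['custom_id']!r}")
            seen_ids.add(row["custom_id"])

check_batch_file("batch_input_data.jsonl")
```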
3. Creating a Batch Inference Job
Navigate to the Batch Inference tab and click “Create Batch Inference Job”.

Select your input dataset:
Choose your model:
Configure optional settings:



4. Monitoring and Managing Batch Inference Jobs
Batch Job States
Batch Inference Jobs progress through several states during their lifecycle:

| State | Description |
|---|---|
| VALIDATING | The input dataset is being validated to ensure it meets format requirements and constraints |
| PENDING | The job is queued and waiting for available resources to begin processing |
| RUNNING | The batch job is actively processing requests from the input dataset |
| COMPLETED | All requests have been successfully processed and results are available in the output dataset |
| FAILED | The job encountered an unrecoverable error. Check the job status message for details |
| EXPIRED | The job exceeded the 24-hour time limit. Any completed requests up to that point are saved to the output dataset |
View all your batch inference jobs in the dashboard:
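You can also track progress programmatically by polling the job state until it reaches a terminal state. In the sketch below, `get_batch_job_state` is a hypothetical placeholder; replace it with a real lookup via the Python builder SDK, HTTP API, or `firectl` (see the references in the Appendix).

```python
import time

TERMINAL_STATES = {"COMPLETED", "FAILED", "EXPIRED"}

def get_batch_job_state(job_name: str) -> str:
    # Hypothetical placeholder: fetch the job and return its current state
    # using the Python builder SDK or HTTP API (see the Appendix references).
    raise NotImplementedError

def wait_for_job(job_name: str, poll_seconds: float = 60.0) -> str:
    """Poll until the job reaches COMPLETED, FAILED, or EXPIRED."""
    while True:
        state = get_batch_job_state(job_name)
        print(f"{job_name}: {state}")
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
```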

5. Downloading the Results
After the batch inference job is complete, navigate to the output dataset and download the results:

Output Files
The output dataset contains two types of files:

| File Type | Description |
|---|---|
| Results file | Contains successful inference responses in JSONL format, with each line matching the `custom_id` from your input |
| Error file | Contains any error details for requests that failed processing, along with the original `custom_id` for debugging |
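Once downloaded, the two files can be joined back to your inputs by `custom_id`. A small sketch, assuming the files have been saved locally as `results.jsonl` and `errors.jsonl` (the actual file names in your output dataset may differ):

```python
import json

def load_jsonl_by_custom_id(path: str) -> dict:
    """Index a JSONL file by its custom_id field; a missing file yields an empty dict."""
    rows = {}
    try:
        with open(path) as f:
            for line in f:
                row = json.loads(line)
                rows[row["custom_id"]] = row
    except FileNotFoundError:
        pass  # e.g. no error file when every request succeeded
    return rows

results = load_jsonl_by_custom_id("results.jsonl")  # assumed local file name
errors = load_jsonl_by_custom_id("errors.jsonl")    # assumed local file name
print(f"{len(results)} succeeded, {len(errors)} failed")
```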
6. Best Practices and Considerations
- Validate your dataset thoroughly before uploading.
- Use appropriate inference parameters for your use case.
- Monitor job progress for long-running batches.
- Set reasonable `max_tokens` limits to optimize processing time.
- Use descriptive `custom_id` values for easier result tracking.
Model Availability
- Base Models – Any Base Model in our Model Library
- Account Models – Any model you have uploaded/trained, including fine-tuned models
Limits
- Each individual request (row in the dataset) will follow the same constraints as Chat Completion Limits
- The Input Dataset must adhere to Dataset Limits and be under 500MB total.
- The Output Dataset will be capped at 8GB, and the job may expire early if the limit is reached.
Batch Expiration
A Batch Job will expire if it runs for more than 24 hours; any completed rows will be billed for and written to the output dataset.

Appendix
- Python builder SDK references
- HTTP API references
- firectl references