Authorizations
Bearer authentication using your Fireworks API key. Format: Bearer <API_KEY>
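As a minimal sketch, the Authorization header can be assembled like this (the `FIREWORKS_API_KEY` environment-variable name is an assumption; use however you store your key):

```python
import os

# "FIREWORKS_API_KEY" is an assumed env-var name, not mandated by the API.
api_key = os.environ.get("FIREWORKS_API_KEY", "<API_KEY>")

# The Bearer scheme described above: "Bearer <API_KEY>".
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```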
Body
Request model for creating a new response.
This model defines all the parameters needed to create a new model response, including model configuration, input data, tool definitions, and conversation continuation.
The model to use for generating the response. Example: accounts/<ACCOUNT_ID>/models/<MODEL_ID>.
The input to the model. Can be a simple text string or a list of message objects for complex inputs with multiple content types.
The ID of a previous response to continue the conversation from. When provided, the conversation history from that response will be automatically loaded.
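A follow-up request might reference an earlier response like this sketch; the response ID is a placeholder, not a real resource:

```python
# Continue a conversation from an earlier response. Because the history is
# loaded automatically, only the new input needs to be sent.
follow_up = {
    "model": "accounts/<ACCOUNT_ID>/models/<MODEL_ID>",
    "previous_response_id": "resp_example123",  # placeholder ID
    "input": "Now give a concrete example.",
}
```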
System instructions that guide the model's behavior throughout the conversation. Similar to a system message.
The maximum number of tokens that can be generated in the response. Must be at least 1. If not specified, the model will generate up to its maximum context length.
The maximum number of tool calls allowed in a single response. Useful for controlling costs and limiting tool execution. Must be at least 1.
Required range: x >= 1
Set of up to 16 key-value pairs that can be attached to the response. Useful for storing additional information in a structured format.
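A small sketch of the 16-pair constraint on metadata (the key and value strings below are illustrative, not part of any schema):

```python
# Metadata: up to 16 key-value pairs, per the constraint above.
metadata = {
    "run_id": "nightly-42",       # example values; any short strings work
    "team": "search-quality",
}

def validate_metadata(md: dict) -> bool:
    """Return True if the metadata respects the 16-pair limit."""
    return len(md) <= 16

ok = validate_metadata(metadata)
```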
Whether to enable parallel function calling during tool use. When true, the model can call multiple tools simultaneously. Default is True.
Configuration for reasoning output. When enabled, the model will return its reasoning process along with the response.
Whether to store the response. When set to false, the response will not be stored and will not be retrievable via the API. This is useful for ephemeral or sensitive data. See an example in our Controlling Response Storage cookbook. Default is True.
Whether to stream the response back as Server-Sent Events (SSE). When true, tokens are sent incrementally as they are generated. Default is False.
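When stream is True, tokens arrive incrementally as SSE `data:` lines. The sketch below parses a canned fragment; the per-event payload shape (a `delta` field) and the `[DONE]` sentinel are assumptions modeled on common streaming APIs, not the documented schema:

```python
import json

# Canned SSE fragment for illustration only.
sse_text = (
    'data: {"delta": "Hello"}\n\n'
    'data: {"delta": ", world"}\n\n'
    "data: [DONE]\n\n"
)

chunks = []
for line in sse_text.splitlines():
    if not line.startswith("data: "):
        continue
    payload = line[len("data: "):]
    if payload == "[DONE]":        # conventional end-of-stream sentinel (assumed)
        break
    chunks.append(json.loads(payload)["delta"])

text = "".join(chunks)
```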
The sampling temperature to use, between 0 and 2. Higher values like 0.8 make output more random, while lower values like 0.2 make it more focused and deterministic. Default is 1.0.
Required range: 0 <= x <= 2
Text generation configuration parameters. Used for advanced text generation settings.
Controls which (if any) tool the model should use. Can be 'none' (never call tools), 'auto' (model decides), 'required' (must call at least one tool), or an object specifying a particular tool to call. Default is 'auto'.
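The four accepted forms of tool_choice can be sketched as follows; the field names in the object form are an assumption modeled on common tool-calling schemas, so check the API's schema before relying on them:

```python
# Three string forms plus one object form, as described above.
never_call   = "none"      # never call tools
model_decide = "auto"      # model decides (the default)
must_call    = "required"  # must call at least one tool

# Object form forcing a specific tool; shape is assumed, not documented here.
specific = {"type": "function", "function": {"name": "get_weather"}}
```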
A list of MCP tools the model may call. See our cookbooks for examples on basic MCP usage and streaming with MCP.
An alternative to temperature sampling, called nucleus sampling, where the model considers the results of tokens with top_p probability mass. So 0.1 means only tokens comprising the top 10% probability mass are considered. Default is 1.0. We generally recommend altering this or temperature but not both.
Required range: 0 <= x <= 1
The truncation strategy to use for the context when it exceeds the model's maximum length. Can be 'auto' (automatically truncate) or 'disabled' (return an error if the context is too long). Default is 'disabled'.
A unique identifier representing your end-user, which can help Fireworks to monitor and detect abuse. This can be a username, email, or any other unique identifier.
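Putting the parameters above together, a request body might look like this sketch (the account and model IDs are placeholders, and only a subset of parameters is shown):

```python
import json

# A minimal request body covering the main parameters described above.
body = {
    "model": "accounts/<ACCOUNT_ID>/models/<MODEL_ID>",  # placeholder IDs
    "input": "Summarize the release notes.",
    "instructions": "Answer concisely.",
    "max_output_tokens": 256,   # must be >= 1
    "temperature": 0.2,         # 0 <= x <= 2
    "top_p": 1.0,               # 0 <= x <= 1; tune this OR temperature, not both
    "store": True,              # default; set False for ephemeral data
    "stream": False,
}

payload = json.dumps(body)
```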
Response
Successful Response
Represents a response object returned from the API.
A response includes the model output, token usage, configuration parameters, and metadata about the conversation state.
The Unix timestamp (in seconds) when the response was created.
The status of the response. Can be 'completed', 'in_progress', 'incomplete', 'failed', or 'cancelled'.
The model used to generate the response (e.g., accounts/<ACCOUNT_ID>/models/<MODEL_ID>).
An array of output items produced by the model. Can contain messages, tool calls, and tool outputs.
- Message
- ToolCall
- ToolOutput
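Since the output array mixes the three item kinds listed above, consumers typically dispatch on a type discriminator. The array below is fabricated for illustration, and the exact item schemas are assumptions:

```python
# Fabricated output array; item shapes are assumed, not the documented schema.
output = [
    {"type": "message", "content": "Checking the weather."},
    {"type": "tool_call", "name": "get_weather", "arguments": '{"city": "Oslo"}'},
    {"type": "tool_output", "result": '{"temp_c": 4}'},
]

# Split items by kind for downstream handling.
messages   = [item for item in output if item["type"] == "message"]
tool_calls = [item for item in output if item["type"] == "tool_call"]
```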
The unique identifier of the response. Will be None if store=False.
The object type, which is always 'response'.
The ID of the previous response in the conversation, if this response continues a conversation.
Token usage information for the request. Contains 'prompt_tokens', 'completion_tokens', and 'total_tokens'.
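The three usage fields relate by simple addition, which a consumer can sanity-check (the numbers here are made up):

```python
# Token accounting as described: total is the sum of the two parts.
usage = {"prompt_tokens": 120, "completion_tokens": 35, "total_tokens": 155}

consistent = (
    usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
)
```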
Error information if the response failed. Contains 'type', 'code', and 'message' fields.
Details about why the response is incomplete, if status is 'incomplete'. Contains 'reason' field which can be 'max_output_tokens', 'max_tool_calls', or 'content_filter'.
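A sketch of handling the documented statuses, error, and incomplete_details fields; the response dict is fabricated for illustration:

```python
# Fabricated response for illustration.
response = {
    "status": "incomplete",
    "incomplete_details": {"reason": "max_output_tokens"},
}

def describe(resp: dict) -> str:
    """Summarize a response's terminal state using the documented fields."""
    status = resp["status"]
    if status == "incomplete":
        return f"incomplete: {resp['incomplete_details']['reason']}"
    if status == "failed":
        err = resp.get("error", {})
        return f"failed: {err.get('code')} {err.get('message')}"
    return status

summary = describe(response)
```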
System instructions that guide the model's behavior. Similar to a system message.
The maximum number of tokens that can be generated in the response. Must be at least 1.
The maximum number of tool calls allowed in a single response. Must be at least 1.
Required range: x >= 1
Whether to enable parallel function calling during tool use. Default is True.
Reasoning output from the model, if reasoning is enabled. Contains 'content' and 'type' fields.
Whether to store this response for future retrieval. If False, the response will not be persisted and previous_response_id cannot reference it. Default is True.
The sampling temperature to use, between 0 and 2. Higher values like 0.8 make output more random, while lower values like 0.2 make it more focused and deterministic. Default is 1.0.
Required range: 0 <= x <= 2
Text generation configuration parameters, if applicable.
Controls which (if any) tool the model should use. Can be 'none', 'auto', 'required', or an object specifying a particular tool. Default is 'auto'.
A list of tools the model may call. Each tool is defined with a type and function specification following the OpenAI tool format. Supports 'function', 'mcp', 'sse', and 'python' tool types.
An alternative to temperature sampling, called nucleus sampling, where the model considers the results of tokens with top_p probability mass. So 0.1 means only tokens comprising the top 10% probability mass are considered. Default is 1.0.
Required range: 0 <= x <= 1
The truncation strategy to use for the context. Can be 'auto' or 'disabled'. Default is 'disabled'.
A unique identifier representing your end-user, which can help Fireworks to monitor and detect abuse.
Set of up to 16 key-value pairs that can be attached to the response. Useful for storing additional information about the response in a structured format.