Create Response
POST /v1/responses
Example request:

curl --request POST \
  --url https://api.fireworks.ai/inference/v1/responses \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "<string>",
  "input": "<string>",
  "previous_response_id": "<string>",
  "instructions": "<string>",
  "max_output_tokens": 123,
  "max_tool_calls": 2,
  "metadata": {},
  "parallel_tool_calls": true,
  "reasoning": {},
  "store": true,
  "stream": true,
  "temperature": 1,
  "text": {},
  "tool_choice": "<string>",
  "tools": [
    {}
  ],
  "top_p": 0.5,
  "truncation": "<string>",
  "user": "<string>"
}'
Example response:

{
  "id": "<string>",
  "object": "response",
  "created_at": 123,
  "status": "<string>",
  "model": "<string>",
  "output": [
    {
      "id": "<string>",
      "type": "message",
      "role": "<string>",
      "content": [
        {
          "type": "<string>",
          "text": "<string>"
        }
      ],
      "status": "<string>"
    }
  ],
  "previous_response_id": "<string>",
  "usage": {},
  "error": {},
  "incomplete_details": {},
  "instructions": "<string>",
  "max_output_tokens": 123,
  "max_tool_calls": 2,
  "parallel_tool_calls": true,
  "reasoning": {},
  "store": true,
  "temperature": 1,
  "text": {},
  "tool_choice": "<string>",
  "tools": [
    {}
  ],
  "top_p": 1,
  "truncation": "disabled",
  "user": "<string>",
  "metadata": {}
}

Authorizations

Authorization
string
header
required

Bearer authentication using your Fireworks API key. Format: Bearer <API_KEY>

Body

application/json

Request model for creating a new response.

This model defines all the parameters needed to create a new model response, including model configuration, input data, tool definitions, and conversation continuation.

model
string
required

The model to use for generating the response. Example: accounts/<ACCOUNT_ID>/models/<MODEL_ID>.

input
required

The input to the model. Can be a simple text string or a list of message objects for complex inputs with multiple content types.
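
For example, both of the following payloads are valid shapes for input. The message-object form (role/content) shown here follows the common chat format and should be confirmed against the schema; the model ID is a placeholder:

```python
# "input" accepts either a plain string or a list of message objects.
simple_request = {
    "model": "accounts/fireworks/models/my-model",  # hypothetical model ID
    "input": "What is the capital of France?",
}

structured_request = {
    "model": "accounts/fireworks/models/my-model",  # hypothetical model ID
    "input": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
}
```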

previous_response_id
string | null

The ID of a previous response to continue the conversation from. When provided, the conversation history from that response will be automatically loaded.
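
A minimal sketch of chaining two turns with previous_response_id; the response ID value is a placeholder:

```python
import json

# The ID returned by an earlier call (placeholder value).
first_response = {"id": "resp_abc123", "status": "completed"}

# Continue the conversation: the server loads the prior history
# automatically when previous_response_id is provided.
follow_up = {
    "model": "accounts/fireworks/models/my-model",  # hypothetical model ID
    "input": "And what is its population?",
    "previous_response_id": first_response["id"],
}

payload = json.dumps(follow_up)
```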

instructions
string | null

System instructions that guide the model's behavior throughout the conversation. Similar to a system message.

max_output_tokens
integer | null

The maximum number of tokens that can be generated in the response. Must be at least 1. If not specified, the model will generate up to its maximum context length.

max_tool_calls
integer | null

The maximum number of tool calls allowed in a single response. Useful for controlling costs and limiting tool execution. Must be at least 1.

Required range: x >= 1
metadata
object | null

Set of up to 16 key-value pairs that can be attached to the response. Useful for storing additional information in a structured format.

parallel_tool_calls
boolean | null
default:true

Whether to enable parallel function calling during tool use. When true, the model can call multiple tools simultaneously. Default is true.

reasoning
object | null

Configuration for reasoning output. When enabled, the model will return its reasoning process along with the response.

store
boolean | null
default:true

Whether to store the response. When set to false, the response will not be stored and will not be retrievable via the API. This is useful for ephemeral or sensitive data. See an example in our Controlling Response Storage cookbook. Default is true.

stream
boolean | null
default:false

Whether to stream the response back as Server-Sent Events (SSE). When true, tokens are sent incrementally as they are generated. Default is false.
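
A minimal sketch of consuming an SSE stream. The event payloads below are illustrative placeholders; the page does not document the exact event shapes, so only the generic SSE framing (data: lines, a terminal sentinel) is assumed here:

```python
import json

def iter_sse_data(lines):
    """Yield the parsed JSON payload of each 'data:' line, stopping at [DONE]."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip comments, keep-alives, and blank separators
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)

# Illustrative stream content, not real event shapes:
sample = [
    'data: {"delta": "Hel"}',
    'data: {"delta": "lo"}',
    "data: [DONE]",
]
chunks = [event["delta"] for event in iter_sse_data(sample)]
# chunks == ["Hel", "lo"]
```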

temperature
number | null
default:1

The sampling temperature to use, between 0 and 2. Higher values like 0.8 make output more random, while lower values like 0.2 make it more focused and deterministic. Default is 1.0.

Required range: 0 <= x <= 2
text
object | null

Text generation configuration parameters. Used for advanced text generation settings.

tool_choice
string | object
default:auto

Controls which (if any) tool the model should use. Can be 'none' (never call tools), 'auto' (model decides), 'required' (must call at least one tool), or an object specifying a particular tool to call. Default is 'auto'.
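
The three string values come straight from the description above; the object form for naming a specific tool follows the OpenAI-style format and is an assumption, as is the tool name:

```python
# Valid values for tool_choice:
choices = [
    "none",      # never call tools
    "auto",      # model decides (default)
    "required",  # must call at least one tool
    # Object form targeting one specific tool (assumed OpenAI-style shape):
    {"type": "function", "function": {"name": "get_weather"}},  # hypothetical tool
]
```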

tools
Tools · object[] | null

A list of MCP tools the model may call. See our cookbooks for examples on basic MCP usage and streaming with MCP.
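
An illustrative shape for an MCP tool entry. The exact schema is not shown on this page, so the field names below are assumptions modeled on common MCP tool formats; the server label and URL are placeholders:

```python
mcp_request = {
    "model": "accounts/fireworks/models/my-model",  # hypothetical model ID
    "input": "Summarize the repo docs.",
    "tools": [
        {
            "type": "mcp",
            "server_label": "docs",                    # hypothetical label
            "server_url": "https://example.com/mcp",   # hypothetical server
        }
    ],
    "tool_choice": "auto",
}
```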

top_p
number | null
default:1

An alternative to temperature sampling, called nucleus sampling, where the model considers the results of tokens with top_p probability mass. So 0.1 means only tokens comprising the top 10% probability mass are considered. Default is 1.0. We generally recommend altering this or temperature but not both.

Required range: 0 <= x <= 1
truncation
string | null
default:disabled

The truncation strategy to use for the context when it exceeds the model's maximum length. Can be 'auto' (automatically truncate) or 'disabled' (return error if context too long). Default is 'disabled'.

user
string | null

A unique identifier representing your end-user, which can help Fireworks to monitor and detect abuse. This can be a username, email, or any other unique identifier.
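
Putting the request parameters together, a minimal stdlib-only sketch mirroring the curl example at the top of this page. The model ID is a placeholder and the network call itself is left commented out:

```python
import json
import os
import urllib.request

API_URL = "https://api.fireworks.ai/inference/v1/responses"

payload = {
    "model": "accounts/fireworks/models/my-model",  # hypothetical model ID
    "input": "Write a haiku about the sea.",
    "max_output_tokens": 256,
    "temperature": 0.7,
    "user": "user-1234",  # placeholder end-user identifier
}

req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('FIREWORKS_API_KEY', '')}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# with urllib.request.urlopen(req, timeout=30) as resp:
#     body = json.load(resp)
#     print(body["output"])
```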

Response

Successful Response

Represents a response object returned from the API.

A response includes the model output, token usage, configuration parameters, and metadata about the conversation state.

created_at
integer
required

The Unix timestamp (in seconds) when the response was created.

status
string
required

The status of the response. Can be 'completed', 'in_progress', 'incomplete', 'failed', or 'cancelled'.

model
string
required

The model used to generate the response (e.g., accounts/<ACCOUNT_ID>/models/<MODEL_ID>).

output
Output · array
required

An array of output items produced by the model. Can contain messages, tool calls, and tool outputs.

  • Message
  • ToolCall
  • ToolOutput
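
A small helper sketch for pulling the generated text out of the output array. It handles only the Message item shape documented above (tool calls and tool outputs are skipped); the content-block type value in the example is a placeholder:

```python
def output_text(response):
    """Concatenate the text of every content block in message output items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip tool calls and tool outputs
        for block in item.get("content", []):
            if "text" in block:
                parts.append(block["text"])
    return "".join(parts)

example = {
    "output": [
        {
            "id": "msg_1",
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "Hello!"}],
            "status": "completed",
        }
    ]
}
# output_text(example) == "Hello!"
```
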
id
string | null

The unique identifier of the response. Will be null if store is false.

object
string
default:response

The object type, which is always 'response'.

previous_response_id
string | null

The ID of the previous response in the conversation, if this response continues a conversation.

usage
object | null

Token usage information for the request. Contains 'prompt_tokens', 'completion_tokens', and 'total_tokens'.

error
object | null

Error information if the response failed. Contains 'type', 'code', and 'message' fields.

incomplete_details
object | null

Details about why the response is incomplete, if status is 'incomplete'. Contains 'reason' field which can be 'max_output_tokens', 'max_tool_calls', or 'content_filter'.
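
A small sketch of branching on these terminal states, using only the status, reason, and error fields documented above:

```python
def summarize(response):
    """Map a response's terminal state to a short human-readable summary."""
    status = response["status"]
    if status == "completed":
        return "ok"
    if status == "incomplete":
        # Reason is one of: max_output_tokens, max_tool_calls, content_filter.
        reason = (response.get("incomplete_details") or {}).get("reason")
        return f"incomplete: {reason}"
    if status == "failed":
        err = response.get("error") or {}
        return f"failed: {err.get('code')}"
    return status  # in_progress or cancelled
```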

instructions
string | null

System instructions that guide the model's behavior. Similar to a system message.

max_output_tokens
integer | null

The maximum number of tokens that can be generated in the response. Must be at least 1.

max_tool_calls
integer | null

The maximum number of tool calls allowed in a single response. Must be at least 1.

Required range: x >= 1
parallel_tool_calls
boolean
default:true

Whether to enable parallel function calling during tool use. Default is true.

reasoning
object | null

Reasoning output from the model, if reasoning is enabled. Contains 'content' and 'type' fields.

store
boolean | null
default:true

Whether to store this response for future retrieval. If false, the response will not be persisted and previous_response_id cannot reference it. Default is true.

temperature
number
default:1

The sampling temperature to use, between 0 and 2. Higher values like 0.8 make output more random, while lower values like 0.2 make it more focused and deterministic. Default is 1.0.

Required range: 0 <= x <= 2
text
object | null

Text generation configuration parameters, if applicable.

tool_choice
string | object
default:auto

Controls which (if any) tool the model should use. Can be 'none', 'auto', 'required', or an object specifying a particular tool. Default is 'auto'.

tools
Tools · object[]

A list of tools the model may call. Each tool is defined with a type and function specification following the OpenAI tool format. Supports 'function', 'mcp', 'sse', and 'python' tool types.

top_p
number
default:1

An alternative to temperature sampling, called nucleus sampling, where the model considers the results of tokens with top_p probability mass. So 0.1 means only tokens comprising the top 10% probability mass are considered. Default is 1.0.

Required range: 0 <= x <= 1
truncation
string
default:disabled

The truncation strategy to use for the context. Can be 'auto' or 'disabled'. Default is 'disabled'.

user
string | null

A unique identifier representing your end-user, which can help Fireworks to monitor and detect abuse.

metadata
object | null

Set of up to 16 key-value pairs that can be attached to the response. Useful for storing additional information about the response in a structured format.