Create Response
POST /v1/responses
Example request:

curl --request POST \
  --url https://api.fireworks.ai/inference/v1/responses \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "<string>",
  "input": "<string>",
  "previous_response_id": "<string>",
  "instructions": "<string>",
  "max_output_tokens": 123,
  "max_tool_calls": 2,
  "metadata": {},
  "parallel_tool_calls": true,
  "reasoning": {},
  "store": true,
  "stream": true,
  "temperature": 1,
  "text": {},
  "tool_choice": "<string>",
  "tools": [
    {}
  ],
  "top_p": 0.5,
  "truncation": "<string>",
  "user": "<string>"
}'
Example response:

{
  "id": "<string>",
  "object": "response",
  "created_at": 123,
  "status": "<string>",
  "model": "<string>",
  "output": [
    {
      "id": "<string>",
      "type": "message",
      "role": "<string>",
      "content": [
        {
          "type": "<string>",
          "text": "<string>"
        }
      ],
      "status": "<string>"
    }
  ],
  "previous_response_id": "<string>",
  "usage": {},
  "error": {},
  "incomplete_details": {},
  "instructions": "<string>",
  "max_output_tokens": 123,
  "max_tool_calls": 2,
  "parallel_tool_calls": true,
  "reasoning": {},
  "store": true,
  "temperature": 1,
  "text": {},
  "tool_choice": "<string>",
  "tools": [
    {}
  ],
  "top_p": 1,
  "truncation": "disabled",
  "user": "<string>",
  "metadata": {}
}

Authorizations

Authorization
string
header
required

Bearer authentication using your Fireworks API key. Format: Bearer <API_KEY>

Body

application/json

Request model for creating a new response.

This model defines all the parameters needed to create a new model response, including model configuration, input data, tool definitions, and conversation continuation.

model
string
required

The model to use for generating the response. Example: accounts/<ACCOUNT_ID>/models/<MODEL_ID>.

input
required

The input to the model. Can be a simple text string or a list of message objects for complex inputs with multiple content types.
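
For example, both of the following payloads are valid shapes for input. The message-object form (role/content) shown here follows the common chat format and should be confirmed against the schema; the model ID is a placeholder:

```python
# "input" accepts either a plain string or a list of message objects.
simple_request = {
    "model": "accounts/fireworks/models/my-model",  # hypothetical model ID
    "input": "What is the capital of France?",
}

structured_request = {
    "model": "accounts/fireworks/models/my-model",  # hypothetical model ID
    "input": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
}
```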

previous_response_id
string | null

The ID of a previous response to continue the conversation from. When provided, the conversation history from that response will be automatically loaded.
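
A minimal sketch of chaining two turns with previous_response_id; the response ID value is a placeholder:

```python
import json

# The ID returned by an earlier call (placeholder value).
first_response = {"id": "resp_abc123", "status": "completed"}

# Continue the conversation: the server loads the prior history
# automatically when previous_response_id is provided.
follow_up = {
    "model": "accounts/fireworks/models/my-model",  # hypothetical model ID
    "input": "And what is its population?",
    "previous_response_id": first_response["id"],
}

payload = json.dumps(follow_up)
```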

instructions
string | null

System instructions that guide the model's behavior throughout the conversation. Similar to a system message.

max_output_tokens
integer | null

The maximum number of tokens that can be generated in the response. Must be at least 1. If not specified, the model will generate up to its maximum context length.

max_tool_calls
integer | null

The maximum number of tool calls allowed in a single response. Useful for controlling costs and limiting tool execution. Must be at least 1.

Required range: x >= 1
metadata
object | null

Set of up to 16 key-value pairs that can be attached to the response. Useful for storing additional information in a structured format.

parallel_tool_calls
boolean | null
default:true

Whether to enable parallel function calling during tool use. When true, the model can call multiple tools simultaneously. Default is true.

reasoning
object | null

Configuration for reasoning output. When enabled, the model will return its reasoning process along with the response.

store
boolean | null
default:true

Whether to store the response. When set to false, the response will not be stored and will not be retrievable via the API. This is useful for ephemeral or sensitive data. See an example in our Controlling Response Storage cookbook. Default is true.

stream
boolean | null
default:false

Whether to stream the response back as Server-Sent Events (SSE). When true, tokens are sent incrementally as they are generated. Default is false.
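
A minimal sketch of consuming an SSE stream. The event payloads below are illustrative placeholders; the page does not document the exact event shapes, so only the generic SSE framing (data: lines, a terminal sentinel) is assumed here:

```python
import json

def iter_sse_data(lines):
    """Yield the parsed JSON payload of each 'data:' line, stopping at [DONE]."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip comments, keep-alives, and blank separators
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)

# Illustrative stream content, not real event shapes:
sample = [
    'data: {"delta": "Hel"}',
    'data: {"delta": "lo"}',
    "data: [DONE]",
]
chunks = [event["delta"] for event in iter_sse_data(sample)]
# chunks == ["Hel", "lo"]
```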

temperature
number | null
default:1

The sampling temperature to use, between 0 and 2. Higher values like 0.8 make output more random, while lower values like 0.2 make it more focused and deterministic. Default is 1.0.

Required range: 0 <= x <= 2
text
object | null

Text generation configuration parameters. Used for advanced text generation settings.

tool_choice
string | object
default:auto

Controls which (if any) tool the model should use. Can be 'none' (never call tools), 'auto' (model decides), 'required' (must call at least one tool), or an object specifying a particular tool to call. Default is 'auto'.
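
The three string values come straight from the description above; the object form for naming a specific tool follows the OpenAI-style format and is an assumption, as is the tool name:

```python
# Valid values for tool_choice:
choices = [
    "none",      # never call tools
    "auto",      # model decides (default)
    "required",  # must call at least one tool
    # Object form targeting one specific tool (assumed OpenAI-style shape):
    {"type": "function", "function": {"name": "get_weather"}},  # hypothetical tool
]
```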

tools
Tools · object[] | null

A list of MCP tools the model may call. See our cookbooks for examples on basic MCP usage and streaming with MCP.
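
An illustrative shape for an MCP tool entry. The exact schema is not shown on this page, so the field names below are assumptions modeled on common MCP tool formats; the server label and URL are placeholders:

```python
mcp_request = {
    "model": "accounts/fireworks/models/my-model",  # hypothetical model ID
    "input": "Summarize the repo docs.",
    "tools": [
        {
            "type": "mcp",
            "server_label": "docs",                    # hypothetical label
            "server_url": "https://example.com/mcp",   # hypothetical server
        }
    ],
    "tool_choice": "auto",
}
```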

top_p
number | null
default:1

An alternative to temperature sampling, called nucleus sampling, where the model considers the results of tokens with top_p probability mass. So 0.1 means only tokens comprising the top 10% probability mass are considered. Default is 1.0. We generally recommend altering this or temperature but not both.

Required range: 0 <= x <= 1
truncation
string | null
default:disabled

The truncation strategy to use for the context when it exceeds the model's maximum length. Can be 'auto' (automatically truncate) or 'disabled' (return error if context too long). Default is 'disabled'.

user
string | null

A unique identifier representing your end-user, which can help Fireworks to monitor and detect abuse. This can be a username, email, or any other unique identifier.
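
Putting the request parameters together, a minimal stdlib-only sketch mirroring the curl example at the top of this page. The model ID is a placeholder and the network call itself is left commented out:

```python
import json
import os
import urllib.request

API_URL = "https://api.fireworks.ai/inference/v1/responses"

payload = {
    "model": "accounts/fireworks/models/my-model",  # hypothetical model ID
    "input": "Write a haiku about the sea.",
    "max_output_tokens": 256,
    "temperature": 0.7,
    "user": "user-1234",  # placeholder end-user identifier
}

req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('FIREWORKS_API_KEY', '')}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# with urllib.request.urlopen(req, timeout=30) as resp:
#     body = json.load(resp)
#     print(body["output"])
```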

Response

Successful Response

Represents a response object returned from the API.

A response includes the model output, token usage, configuration parameters, and metadata about the conversation state.

created_at
integer
required

The Unix timestamp (in seconds) when the response was created.

status
string
required

The status of the response. Can be 'completed', 'in_progress', 'incomplete', 'failed', or 'cancelled'.

model
string
required

The model used to generate the response (e.g., accounts/<ACCOUNT_ID>/models/<MODEL_ID>).

output
Output · array
required

An array of output items produced by the model. Can contain messages, tool calls, and tool outputs.

  • Message
  • ToolCall
  • ToolOutput
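
A small helper sketch for pulling the generated text out of the output array. It handles only the Message item shape documented above (tool calls and tool outputs are skipped); the content-block type value in the example is a placeholder:

```python
def output_text(response):
    """Concatenate the text of every content block in message output items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip tool calls and tool outputs
        for block in item.get("content", []):
            if "text" in block:
                parts.append(block["text"])
    return "".join(parts)

example = {
    "output": [
        {
            "id": "msg_1",
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "Hello!"}],
            "status": "completed",
        }
    ]
}
# output_text(example) == "Hello!"
```
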
id
string | null

The unique identifier of the response. Will be null if store is false.

object
string
default:response

The object type, which is always 'response'.

previous_response_id
string | null

The ID of the previous response in the conversation, if this response continues a conversation.

usage
object | null

Token usage information for the request. Contains 'prompt_tokens', 'completion_tokens', and 'total_tokens'.

error
object | null

Error information if the response failed. Contains 'type', 'code', and 'message' fields.

incomplete_details
object | null

Details about why the response is incomplete, if status is 'incomplete'. Contains 'reason' field which can be 'max_output_tokens', 'max_tool_calls', or 'content_filter'.
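
A small sketch of branching on these terminal states, using only the status, reason, and error fields documented above:

```python
def summarize(response):
    """Map a response's terminal state to a short human-readable summary."""
    status = response["status"]
    if status == "completed":
        return "ok"
    if status == "incomplete":
        # Reason is one of: max_output_tokens, max_tool_calls, content_filter.
        reason = (response.get("incomplete_details") or {}).get("reason")
        return f"incomplete: {reason}"
    if status == "failed":
        err = response.get("error") or {}
        return f"failed: {err.get('code')}"
    return status  # in_progress or cancelled
```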

instructions
string | null

System instructions that guide the model's behavior. Similar to a system message.

max_output_tokens
integer | null

The maximum number of tokens that can be generated in the response. Must be at least 1.

max_tool_calls
integer | null

The maximum number of tool calls allowed in a single response. Must be at least 1.

Required range: x >= 1
parallel_tool_calls
boolean
default:true

Whether to enable parallel function calling during tool use. Default is true.

reasoning
object | null

Reasoning output from the model, if reasoning is enabled. Contains 'content' and 'type' fields.

store
boolean | null
default:true

Whether to store this response for future retrieval. If false, the response will not be persisted and previous_response_id cannot reference it. Default is true.

temperature
number
default:1

The sampling temperature to use, between 0 and 2. Higher values like 0.8 make output more random, while lower values like 0.2 make it more focused and deterministic. Default is 1.0.

Required range: 0 <= x <= 2
text
object | null

Text generation configuration parameters, if applicable.

tool_choice
string | object
default:auto

Controls which (if any) tool the model should use. Can be 'none', 'auto', 'required', or an object specifying a particular tool. Default is 'auto'.

tools
Tools · object[]

A list of tools the model may call. Each tool is defined with a type and function specification following the OpenAI tool format. Supports 'function', 'mcp', 'sse', and 'python' tool types.

top_p
number
default:1

An alternative to temperature sampling, called nucleus sampling, where the model considers the results of tokens with top_p probability mass. So 0.1 means only tokens comprising the top 10% probability mass are considered. Default is 1.0.

Required range: 0 <= x <= 1
truncation
string
default:disabled

The truncation strategy to use for the context. Can be 'auto' or 'disabled'. Default is 'disabled'.

user
string | null

A unique identifier representing your end-user, which can help Fireworks to monitor and detect abuse.

metadata
object | null

Set of up to 16 key-value pairs that can be attached to the response. Useful for storing additional information about the response in a structured format.