Anthropic compatibility

You can use the Anthropic Python SDK or Anthropic TypeScript SDK to interact with Fireworks, making it easy to migrate applications that already use Anthropic’s Messages API. Fireworks exposes an Anthropic-compatible endpoint at POST /v1/messages.

Quickstart

Install the Anthropic SDK for your language:

Python:

pip install anthropic

JavaScript / TypeScript:

npm install @anthropic-ai/sdk

Then make your first request (Python shown):

import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ["FIREWORKS_API_KEY"],
    base_url="https://api.fireworks.ai/inference",
)

response = client.messages.create(
    model="accounts/fireworks/models/kimi-k2p5",
    max_tokens=256,
    messages=[
        {"role": "user", "content": "Say hello in Spanish. Reply in one word."}
    ],
)

print(response.content[0].text)

The base URL for the Anthropic SDK is https://api.fireworks.ai/inference (without the /v1 suffix). The SDK appends /v1/messages automatically.
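To make the path handling concrete, the sketch below mimics how a base URL and the /v1/messages path combine. The helper name messages_url is hypothetical and is not part of the Anthropic SDK; it only illustrates why the base URL should omit /v1.

```python
def messages_url(base_url: str) -> str:
    """Join a base URL with the Messages API path (illustrative helper only)."""
    # Strip any trailing slash so the result never contains "//v1".
    return base_url.rstrip("/") + "/v1/messages"


# Base URL as documented: no /v1 suffix.
correct = messages_url("https://api.fireworks.ai/inference")

# Adding /v1 yourself would duplicate the path segment.
doubled = messages_url("https://api.fireworks.ai/inference/v1")

print(correct)
print(doubled)
```

If requests fail with a not-found error after migrating, a duplicated /v1 segment like the second URL is one thing worth checking.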

Usage

Use the Anthropic SDK as you normally would. Set model to a Fireworks model resource name, such as accounts/fireworks/models/kimi-k2p5. The Serverless Quickstart includes Anthropic SDK examples for common use cases.

API compatibility

Supported endpoint

Fireworks supports the Anthropic /v1/messages endpoint, including non-streaming and streaming (SSE) responses.
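As a rough illustration of consuming a streaming response, the sketch below concatenates text from content_block_delta events, following the Anthropic Messages streaming event format. It assumes Fireworks emits the same event shapes on its compatible endpoint. The collect_text helper and the simulated events are hypothetical, and no network call is made.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_text(events: Iterable) -> str:
    """Concatenate text deltas from a Messages API event stream."""
    parts = []
    for event in events:
        # Text arrives in content_block_delta events whose delta is a text_delta.
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            parts.append(event.delta.text)
    return "".join(parts)


# Simulated events standing in for a real stream.
fake_stream = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(type="text_delta", text="Ho")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(type="text_delta", text="la")),
    SimpleNamespace(type="message_stop"),
]

print(collect_text(fake_stream))
```

With a real client, the iterator returned by client.messages.create(..., stream=True) could be passed to collect_text in place of fake_stream.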

Deployment support

Anthropic compatibility is supported for serverless and on-demand deployments. Requests must go through api.fireworks.ai/inference (direct route endpoints are not supported for this surface).

Differences from Anthropic

The following parameters and fields are handled differently or are not supported:

model: Must be a Fireworks model identifier (for example, accounts/fireworks/models/deepseek-v3p2) instead of an Anthropic model name. See the Fireworks Model Library for available models.
max_tokens: Optional on Fireworks (required on Anthropic).
anthropic-version header: Not required. Fireworks ignores this header.
usage field: Included in both non-streaming and streaming responses. See Token usage for details.
service_tier: Supported. Set service_tier: "priority" to opt into Priority serverless.
inference_geo: Not supported.
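The differences above can be sketched as a small migration helper. This is a minimal sketch under stated assumptions: adapt_request is a hypothetical name, the "accounts/" prefix check is inferred from the example model names on this page rather than a documented rule, and the placeholder model name and inference_geo value are illustrative only.

```python
FIREWORKS_MODEL_PREFIX = "accounts/"  # assumption based on the examples above


def adapt_request(params: dict, fireworks_model: str, priority: bool = False) -> dict:
    """Return a copy of Anthropic-style request params adjusted for Fireworks."""
    if not fireworks_model.startswith(FIREWORKS_MODEL_PREFIX):
        raise ValueError(f"Expected a Fireworks model resource name, got {fireworks_model!r}")
    adapted = dict(params)  # shallow copy so the caller's dict is not mutated
    adapted["model"] = fireworks_model
    # inference_geo is not supported on Fireworks, so drop it if present.
    adapted.pop("inference_geo", None)
    if priority:
        # Opt into Priority serverless.
        adapted["service_tier"] = "priority"
    return adapted


original = {
    "model": "some-anthropic-model",  # placeholder, not a real model name
    "max_tokens": 256,
    "inference_geo": "us",  # placeholder value
    "messages": [{"role": "user", "content": "Say hello"}],
}
adapted = adapt_request(original, "accounts/fireworks/models/deepseek-v3p2", priority=True)
print(adapted["model"], adapted["service_tier"])
```

max_tokens is left untouched because it is optional on Fireworks but harmless to keep.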

Reasoning effort mapping

When you use the thinking parameter with output_config.effort, Anthropic effort values map to Fireworks reasoning_effort:

Anthropic effort	Fireworks mapping
`low`	`low`
`medium`	`medium`
`high`	`high`
`max`	`high`

The adaptive thinking type is not supported yet.
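The table above can be expressed as a lookup. The mapping is applied by Fireworks on the server side; this hypothetical map_effort helper only mirrors the table so client code can predict the effective value.

```python
# Mirrors the effort mapping table: Anthropic effort -> Fireworks reasoning_effort.
EFFORT_TO_REASONING_EFFORT = {
    "low": "low",
    "medium": "medium",
    "high": "high",
    "max": "high",  # "max" maps to "high" on Fireworks
}


def map_effort(effort: str) -> str:
    """Return the Fireworks reasoning_effort for an Anthropic effort value."""
    try:
        return EFFORT_TO_REASONING_EFFORT[effort]
    except KeyError:
        raise ValueError(f"Unknown effort value: {effort!r}") from None


print(map_effort("max"))
```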

For more details on reasoning, including interleaved thinking with tool use, see the Reasoning guide.

Unsupported features

The following Anthropic features are not available on Fireworks:

Server tools: Server-side tool families (for example, code execution, memory, web fetch, tool search, and web search) are not supported.
Server-tool metadata: Fields such as caller and container are not supported.
Tool schema fields: eager_input_streaming, cache_control, allowed_callers, defer_loading, and input_examples are not supported.
server_tool_use: Not included in usage tracking.
speed: The output_config.speed option is not supported yet.
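When migrating existing tool definitions, one option is to remove the unsupported schema fields before sending a request. The sketch below assumes that approach. The helper name strip_unsupported_tool_fields is hypothetical, and this page does not say whether Fireworks rejects or silently ignores these fields, so treat the stripping as a precaution.

```python
# Tool schema fields listed above as unsupported on Fireworks.
UNSUPPORTED_TOOL_FIELDS = (
    "eager_input_streaming",
    "cache_control",
    "allowed_callers",
    "defer_loading",
    "input_examples",
)


def strip_unsupported_tool_fields(tools: list) -> list:
    """Return copies of the tool definitions without unsupported fields."""
    return [
        {key: value for key, value in tool.items() if key not in UNSUPPORTED_TOOL_FIELDS}
        for tool in tools
    ]


tools = [
    {
        "name": "get_weather",  # illustrative tool
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        "cache_control": {"type": "ephemeral"},
    }
]
cleaned = strip_unsupported_tool_fields(tools)
print(sorted(cleaned[0]))
```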

Fireworks extensions

The following Fireworks-specific extension is available on the Anthropic-compatible endpoint:

raw_output: A request parameter (boolean) that returns low-level details of what the model sees, including formatted prompts and function call data.
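Because raw_output is not part of Anthropic's request schema, one plausible way to send it from the Anthropic Python SDK is the extra_body option that the SDK accepts on requests. This is an assumption about how to pass the field, not a documented Fireworks recipe, and the helper name build_raw_output_request is hypothetical.

```python
def build_raw_output_request(prompt: str) -> dict:
    """Build keyword arguments for client.messages.create with raw_output enabled."""
    return {
        "model": "accounts/fireworks/models/kimi-k2p5",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": prompt}],
        # extra_body merges additional JSON fields into the request body.
        "extra_body": {"raw_output": True},
    }


request_kwargs = build_raw_output_request("Say hello")
print(request_kwargs["extra_body"])
```

With the client from the Quickstart, the request would be sent as client.messages.create(**request_kwargs). The shape of the returned raw output is not described on this page, so inspect the response before relying on specific fields.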

Token usage

Token usage (input_tokens and output_tokens) is included in both non-streaming and streaming responses.

Non-streaming

For non-streaming requests, usage is returned on the response object:

# Reuses the client created in the Quickstart.
response = client.messages.create(
    model="accounts/fireworks/models/kimi-k2p5",
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello"}],
)

# Token counts are attached to the completed response.
print(f"Input tokens:  {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")

Streaming

For streaming requests, token usage is included in the final message_delta event:

# Reuses the client created in the Quickstart.
stream = client.messages.create(
    model="accounts/fireworks/models/kimi-k2p5",
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello"}],
    stream=True,
)

for event in stream:
    # Only the message_delta event carries the actual token counts.
    if event.type == "message_delta":
        print(f"Input tokens:  {event.usage.input_tokens}")
        print(f"Output tokens: {event.usage.output_tokens}")

There is only one message_delta event per stream (the last event before message_stop), and it always contains the actual token counts. The message_start event also includes a usage field, but its values are always 0 and should be ignored for metering purposes.
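For metering, the rule above can be wrapped in a helper that reads usage only from the message_delta event and never from message_start. This is a minimal sketch: usage_from_stream is a hypothetical name, and the simulated events (including the token counts) stand in for a real stream so that no network call is made.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def usage_from_stream(events: Iterable) -> Tuple[int, int]:
    """Return (input_tokens, output_tokens) from the stream's message_delta event."""
    usage = None
    for event in events:
        # Ignore message_start: its usage values are always 0.
        if event.type == "message_delta":
            usage = event.usage
    if usage is None:
        raise ValueError("Stream ended without a message_delta event")
    return usage.input_tokens, usage.output_tokens


# Simulated stream with made-up token counts.
zero_usage = SimpleNamespace(input_tokens=0, output_tokens=0)
fake_stream = [
    SimpleNamespace(type="message_start", message=SimpleNamespace(usage=zero_usage)),
    SimpleNamespace(type="message_delta",
                    usage=SimpleNamespace(input_tokens=12, output_tokens=3)),
    SimpleNamespace(type="message_stop"),
]

input_tokens, output_tokens = usage_from_stream(fake_stream)
print(input_tokens, output_tokens)
```

Raising on a missing message_delta event makes truncated streams visible instead of silently recording zero usage.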

Next steps

Quickstart: Get started with your first API call
Reasoning: Use reasoning with thinking models
API reference: Full Anthropic Messages API reference
