
Serverless is the fastest way to get started with open models. This quickstart walks you through making your first API call in minutes.

Step 1: Create and export an API key

Before you begin, create an API key in the Fireworks dashboard. Click Create API key and store it in a safe location. Once you have your API key, export it as an environment variable in your terminal:
export FIREWORKS_API_KEY="your_api_key_here"
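Under the hood, Serverless requests authenticate with a standard Bearer token, which is why the SDK only needs the environment variable. A minimal stdlib sketch of how the exported key ends up in the request's Authorization header (the exact SDK internals may differ; the fallback string is a placeholder, not a real key):

```python
import os

# Assumes FIREWORKS_API_KEY was exported as shown above;
# the fallback here is a placeholder, not a real key.
api_key = os.environ.get("FIREWORKS_API_KEY", "your_api_key_here")

# Serverless requests authenticate with a standard Bearer token header.
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
print(headers["Authorization"][:7])
```
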

Step 2: Make your first Serverless API call

Install the Fireworks Python SDK:
The SDK is currently in alpha; pass the --pre flag to pip to install the latest pre-release version.
pip install --pre fireworks-ai
Then make your first Serverless API call:
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
  model="accounts/fireworks/models/deepseek-v3p1",
  messages=[{
    "role": "user",
    "content": "Say hello in Spanish",
  }],
)

print(response.choices[0].message.content)
You should see a response like: "¡Hola!"
For Priority Tier (service_tier: "priority") and Fast mode, see Serverless Priority and Fast.

Common use cases

Streaming responses

Stream responses token-by-token for a better user experience:
from fireworks import Fireworks

client = Fireworks()

stream = client.chat.completions.create(
  model="accounts/fireworks/models/deepseek-v3p1",
  messages=[{"role": "user", "content": "Tell me a short story"}],
  stream=True
)

for chunk in stream:
  if chunk.choices[0].delta.content:
    print(chunk.choices[0].delta.content, end="")
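If you also need the full text afterwards (for logging or caching), accumulate the deltas as you print. A standalone sketch of that accumulation logic using stand-in chunk objects, so it runs without a live stream (`SimpleNamespace` is a stand-in for the SDK's chunk type):

```python
from types import SimpleNamespace

def make_chunk(text):
    # Stand-in for a streamed chunk: mirrors chunk.choices[0].delta.content.
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

# In real use, `stream` is the iterator returned by
# client.chat.completions.create(..., stream=True).
stream = [make_chunk("Once "), make_chunk("upon "), make_chunk(None), make_chunk("a time.")]

parts = []
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks (e.g. the final one) may carry no content
        parts.append(delta)

full_text = "".join(parts)
print(full_text)  # → Once upon a time.
```
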

Function calling

Connect your models to external tools and APIs:
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/kimi-k2-instruct-0905",
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City name, e.g. San Francisco",
                        }
                    },
                    "required": ["location"],
                },
            },
        },
    ],
)

print(response.choices[0].message.tool_calls)
Learn more about function calling →
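Once the model returns `tool_calls`, your code runs the named function and sends the result back in a `tool`-role message so a follow-up `create()` call can produce the final answer. A sketch of that round trip with a simulated tool call (the `get_weather` stub and canned arguments are placeholders; the field names follow the OpenAI-compatible tool-call shape):

```python
import json

def get_weather(location):
    # Placeholder implementation; a real app would call a weather API here.
    return {"location": location, "temp_c": 18, "conditions": "cloudy"}

# Simulated tool call, shaped like response.choices[0].message.tool_calls[0].
tool_call = {
    "id": "call_123",
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": '{"location": "Paris"}',  # arguments arrive as a JSON string
    },
}

# Parse the arguments and dispatch to the local function.
args = json.loads(tool_call["function"]["arguments"])
result = get_weather(**args)

# Append this to the messages list before the follow-up create() call.
tool_message = {
    "role": "tool",
    "tool_call_id": tool_call["id"],
    "content": json.dumps(result),
}
print(tool_message["content"])
```
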

Structured outputs (JSON mode)

Get reliable JSON responses that match your schema:
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
  model="accounts/fireworks/models/deepseek-v3p1",
  messages=[
    {
      "role": "user",
      "content": "Extract the name and age from: John is 30 years old",
    }
  ],
  response_format={
    "type": "json_schema",
    "json_schema": {
      "name": "person",
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "age": { "type": "number" }
        },
        "required": ["name", "age"],
      },
    },
  },
)

print(response.choices[0].message.content)
Learn more about structured outputs →
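Even with a schema enforced server-side, it's good practice to parse and sanity-check the JSON before using it. A stdlib-only sketch validating a sample response against the schema above (manual checks stand in for a schema-validation library; the `content` string is a stand-in for `response.choices[0].message.content`):

```python
import json

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "number"}},
    "required": ["name", "age"],
}

# Stand-in for response.choices[0].message.content.
content = '{"name": "John", "age": 30}'

person = json.loads(content)

# Minimal manual validation for this specific schema.
type_map = {"string": str, "number": (int, float)}
for key in schema["required"]:
    assert key in person, f"missing required key: {key}"
    expected = type_map[schema["properties"][key]["type"]]
    assert isinstance(person[key], expected), f"wrong type for {key}"

print(person["name"], person["age"])  # → John 30
```
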

Reasoning

Some models support reasoning, where the model shows its thought process before giving the final answer:
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-v3p2",
    messages=[
        {"role": "user", "content": "What is 25 * 37? Show your work."}
    ],
    reasoning_effort="medium",
)

msg = response.choices[0].message
if msg.reasoning_content:
    print("Reasoning:", msg.reasoning_content)
print("Answer:", msg.content)
Learn more about reasoning →

Vision models

Analyze images with vision-language models:
from fireworks import Fireworks

client = Fireworks()

response = client.chat.completions.create(
  model="accounts/fireworks/models/qwen2p5-vl-32b-instruct",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://storage.googleapis.com/fireworks-public/image_assets/fireworks-ai-wordmark-color-dark.png"
          },
        },
      ],
    }
  ],
)

print(response.choices[0].message.content)
Learn more about vision models →
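The example above passes a public URL, but OpenAI-style vision requests also commonly accept images inline as base64 `data:` URLs, which is handy for local files. A sketch of building one (the image bytes here are a placeholder for a real file read):

```python
import base64

# Placeholder bytes; in practice: image_bytes = open("photo.png", "rb").read()
image_bytes = b"\x89PNG\r\n\x1a\nfakeimagedata"

b64 = base64.b64encode(image_bytes).decode("utf-8")
data_url = f"data:image/png;base64,{b64}"

# Drop-in replacement for the "url" value in the image_url content part above.
image_part = {"type": "image_url", "image_url": {"url": data_url}}
print(data_url[:22])
```
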

Learn more about Serverless

For the model lifecycle policy, billing details, and serverless-specific request/response behavior, see the Serverless overview.

Next steps

Ready to scale to production, explore other modalities, or customize your models?

Deploy and autoscale on Dedicated GPUs

Serve models on dedicated GPUs with high performance, fast autoscaling, and minimal cold starts

Fine-tune Models

Improve model quality with supervised and reinforcement learning

Embeddings & Reranking

Use embeddings & reranking in search & context retrieval

Batch Inference

Run async inference jobs at scale, faster and cheaper

Browse 100+ Models

Explore all available models across modalities

API Reference

Complete API documentation