Overview

Fireworks exposes the same usage-and-cost data through two equivalent surfaces: the firectl CLI (firectl billing get-usage) and the HTTP API (the billingUsage endpoint). Both return the same response shape and accept the same dimensions, so pick whichever fits your workflow. The examples in the next sections use the CLI form; the cookbook recipes further down use the equivalent cURL calls. The output has two parts:
  • Account costs — rated dollar totals for the range (CLI: prints by default; API: companion GetBillingSummary endpoint).
  • Usage — metered quantities (tokens, accelerator-seconds, audio input seconds) grouped by your chosen dimensions.
This page complements Exporting Billing Metrics: use export-metrics for a raw per-event CSV dump, and the workflows on this page for grouped, rated views.
CLI examples require firectl 1.7.21 or later. Run firectl version, then firectl upgrade if needed.

Authentication

For the API, send your Fireworks API key as a bearer token. Any key on the target account works.
export ACCOUNT_ID="<your-account-slug>"
export FIREWORKS_API_KEY="fw_..."
For the CLI, run firectl login once; firectl then reads credentials from ~/.fireworks/auth.ini.

Basic usage

Get an account-wide breakdown for one month, here May 2026 (defaults to all usage types, grouped by model for serverless and by deployment + accelerator for dedicated):
firectl billing get-usage \
  --start-time 2026-05-01 \
  --end-time   2026-06-01
Add -o json for machine-readable output.

Examples

Serverless usage by model

firectl billing get-usage \
  --start-time 2026-05-01 --end-time 2026-06-01 \
  --usage-type serverless \
  --group-by model_name

Serverless usage by API key

Breaks out serverless token consumption per API key. Pass both api_key_id (stable internal ID) and api_key_name (human-readable label from the console / firectl api-key create --name) so the response carries both.
firectl billing get-usage \
  --start-time 2026-05-01 --end-time 2026-06-01 \
  --usage-type serverless \
  --group-by api_key_id \
  --group-by api_key_name \
  --group-by model_name
Sample row from the API response:
{
  "startTime": "2026-05-28T00:00:00Z",
  "endTime":   "2026-05-29T00:00:00Z",
  "promptTokens":     "1842301",
  "completionTokens": "412980",
  "audioInputSeconds": 0,
  "usageType": "TEXT_COMPLETION_INFERENCE_USAGE",
  "group": {
    "api_key_id":   "key_4nMFyHCSZP4CRKqa",
    "api_key_name": "prod-eng",
    "model_name":   "accounts/fireworks/models/kimi-k2.6"
  }
}
Token counts come back as JSON strings (int64 over JSON). Cast them with tonumber in jq or the equivalent in your client before doing arithmetic. The deprecated top-level apiKeyId field is only populated when groupBy=api_key_id is requested — always read API-key values from the group map.

Filter to a specific API key

Repeat --filter (CLI) or filter[<dim>][values]= (API) to OR multiple values for the same dimension.
firectl billing get-usage \
  --start-time 2026-05-01 --end-time 2026-06-01 \
  --usage-type serverless \
  --group-by model_name \
  --filter api_key_name=prod-eng

Dedicated deployment usage by deployment and GPU type

firectl billing get-usage \
  --start-time 2026-05-01 --end-time 2026-06-01 \
  --usage-type dedicated-deployment \
  --group-by deployment_name \
  --group-by accelerator_type

Filter to a single deployment

firectl billing get-usage \
  --start-time 2026-05-01 --end-time 2026-06-01 \
  --filter deployment_name=accounts/my-account/deployments/my-deployment

Account-level cost totals only

firectl billing get-usage \
  --start-time 2026-05-01 --end-time 2026-06-01 \
  --account-costs-only

Reference

CLI flags

Flag | Description
--- | ---
--start-time | Start time (inclusive), as YYYY-MM-DD or 'YYYY-MM-DD hh:mm:ss'.
--end-time | End time (exclusive), same formats.
--usage-type | all, serverless, or dedicated-deployment. Defaults to all.
--group-by | Dimension to group by. Repeatable.
--filter | key=value filter. Repeatable; repeated values for the same key are OR’ed.
--timezone | IANA timezone for daily aggregation (e.g. America/Los_Angeles). Defaults to UTC.
--account-costs-only | Print only account-level cumulative costs for the range.
-o, --output | text (default) or json.
Run firectl billing get-usage --help for the full list.

API parameters

The same dimensions are passed as groupBy=<dim> (repeat for multiple) and filter[<dim>][values]=<value> (repeat for OR). usageType takes SERVERLESS or DEDICATED_DEPLOYMENT; omit it to include all usage types. timezone and startTime/endTime mirror the CLI flags. See the full API reference for parameter schemas and response types.
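As a rough illustration of that mapping, the sketch below turns CLI-style grouping and filter choices into query parameters in the form described above. The billing_usage_params helper and its group: / filter: argument syntax are hypothetical conveniences, not part of firectl or the API, and the staging key name is made up for the example.

```shell
# Hypothetical helper: prints one query parameter per line.
# Arguments take the form "group:<dim>" or "filter:<dim>=<value>".
billing_usage_params() {
  for arg in "$@"; do
    case "$arg" in
      group:*)
        # group:model_name -> groupBy=model_name
        printf 'groupBy=%s\n' "${arg#group:}"
        ;;
      filter:*=*)
        # filter:api_key_name=prod-eng -> filter[api_key_name][values]=prod-eng
        pair="${arg#filter:}"
        printf 'filter[%s][values]=%s\n' "${pair%%=*}" "${pair#*=}"
        ;;
      *)
        printf 'unrecognized argument: %s\n' "$arg" >&2
        return 1
        ;;
    esac
  done
}

# Repeating a filter for the same dimension ORs the values together.
billing_usage_params \
  group:model_name \
  filter:api_key_name=prod-eng \
  filter:api_key_name=staging
```

Each printed line is meant to be passed to curl as one --data-urlencode argument, alongside startTime and endTime, in the same way the cookbook recipes pass groupBy.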

Grouping dimensions

Valid --group-by / groupBy and --filter / filter dimensions depend on the usage type:
  • Serverless: model_name, api_key_id, api_key_name, annotations.team, annotations.project, annotations.environment
  • Dedicated deployment: deployment_name, accelerator_type, annotations.team, annotations.project, annotations.environment
Dedicated-deployment rows also include the deployment’s region (placement, e.g. US, EUROPE, GLOBAL) and metered accelerator_seconds.

Custom tags (team / project / environment)

Group by annotations.team, annotations.project, or annotations.environment to split usage by your own labels. The tag source depends on usage type:
  • Dedicated deployments: set an annotations map on the deployment, e.g. {"team": "search", "project": "x", "environment": "prod"}.
  • Serverless: send a per-request header on inference calls:
    POST /inference/v1/chat/completions HTTP/1.1
    Host: api.fireworks.ai
    Authorization: Bearer fw_...
    Fireworks-Annotations: team=search,project=ranker,environment=prod
    Content-Type: application/json
    
    Annotation values are validated server-side; unrecognized keys are dropped silently.
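Because unrecognized keys are dropped silently, a small client-side check can catch typos before a request is sent. The sketch below builds the header from key=value pairs. annotations_header is a hypothetical helper, and it assumes that team, project, and environment are the only recognized keys, since those are the annotation dimensions listed on this page.

```shell
# annotations_header key=value [key=value ...]
# Prints a Fireworks-Annotations header line. Rejects any pair whose key is
# not team, project, or environment (assumed here to be the recognized set).
annotations_header() {
  value=""
  for pair in "$@"; do
    case "$pair" in
      team=*|project=*|environment=*) ;;
      *)
        printf 'unsupported annotation: %s\n' "$pair" >&2
        return 1
        ;;
    esac
    # Join pairs with commas, without a leading comma on the first one.
    value="${value:+$value,}$pair"
  done
  printf 'Fireworks-Annotations: %s\n' "$value"
}

annotations_header team=search project=ranker environment=prod
```

The output can be passed straight to curl, for example -H "$(annotations_header team=search project=ranker environment=prod)".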

Cookbook: per-API-key reporting recipes

These recipes target the HTTP API, where downstream aggregation in jq (or any client) is easiest.

Aggregate per key, across models

Sums prompt and completion tokens for each API key across every model it called, sorted by prompt-token volume, highest first.
curl -sG "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/billingUsage" \
  -H "Authorization: Bearer ${FIREWORKS_API_KEY}" \
  --data-urlencode "startTime=2026-05-01T00:00:00Z" \
  --data-urlencode "endTime=2026-06-01T00:00:00Z" \
  --data-urlencode "usageType=SERVERLESS" \
  --data-urlencode "groupBy=api_key_id" \
  --data-urlencode "groupBy=api_key_name" \
  --data-urlencode "groupBy=model_name" \
  | jq '.serverlessCosts
        | group_by(.group.api_key_id)
        | map({
            api_key_id:        .[0].group.api_key_id,
            api_key_name:      .[0].group.api_key_name,
            models:            (map(.group.model_name) | unique),
            prompt_tokens:     ([.[].promptTokens     | tonumber] | add),
            completion_tokens: ([.[].completionTokens | tonumber] | add)
          })
        | sort_by(-.prompt_tokens)'

Group by model, then by key (cost-by-tool view)

If reporting starts from “how much did each model cost me, and which keys drove that”, flip the nesting:
curl -sG "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/billingUsage" \
  -H "Authorization: Bearer ${FIREWORKS_API_KEY}" \
  --data-urlencode "startTime=2026-05-01T00:00:00Z" \
  --data-urlencode "endTime=2026-06-01T00:00:00Z" \
  --data-urlencode "usageType=SERVERLESS" \
  --data-urlencode "groupBy=api_key_id" \
  --data-urlencode "groupBy=api_key_name" \
  --data-urlencode "groupBy=model_name" \
  | jq '.serverlessCosts
        | group_by(.group.model_name)
        | map({
            model: .[0].group.model_name,
            api_keys: (
              group_by(.group.api_key_id)
              | map({
                  api_key_id:        .[0].group.api_key_id,
                  api_key_name:      .[0].group.api_key_name,
                  prompt_tokens:     ([.[].promptTokens     | tonumber] | add),
                  completion_tokens: ([.[].completionTokens | tonumber] | add)
                })
              | sort_by(-.prompt_tokens)
            )
          })
        | sort_by(.model)'
Multiply the token totals by the published serverless prices to convert to dollars for chargeback.
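That multiplication can be sketched as follows. The prices are placeholders rather than Fireworks rates, and the sketch assumes prices are quoted per million tokens with separate prompt and completion rates. estimate_cost is a hypothetical helper.

```shell
# Placeholder prices in dollars per 1M tokens. Replace them with the
# published serverless rates for the model you are reporting on.
PROMPT_PRICE_PER_M="0.60"
COMPLETION_PRICE_PER_M="2.50"

# estimate_cost <prompt_tokens> <completion_tokens>
# Prints the estimated dollar cost rounded to four decimal places.
estimate_cost() {
  awk -v p="$1" -v c="$2" \
      -v pp="$PROMPT_PRICE_PER_M" -v cp="$COMPLETION_PRICE_PER_M" \
      'BEGIN { printf "%.4f\n", (p * pp + c * cp) / 1000000 }'
}

# Token totals taken from the sample row earlier on this page.
estimate_cost 1842301 412980
```

Prices differ per model, so for chargeback the calculation would be applied to each model's totals separately before summing per API key.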

Backfill more than 31 days

The endpoint caps each request at a 31-day window. To pull a longer history, loop over consecutive windows that stay under the cap. The script below steps 30 days at a time and relies on GNU date (the -d option):
start_date="2026-01-01"
end_date="2026-06-01"
current="$start_date"

# Walk from start_date to end_date in windows of at most 30 days.
while [ "$(date -u -d "$current" '+%s')" -lt "$(date -u -d "$end_date" '+%s')" ]; do
  next="$(date -u -d "$current +30 days" '+%Y-%m-%d')"
  # Clamp the final window so it never runs past end_date.
  if [ "$(date -u -d "$next" '+%s')" -gt "$(date -u -d "$end_date" '+%s')" ]; then
    next="$end_date"
  fi

  # One output file per window. The end time is exclusive, so windows do not overlap.
  curl -sG "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/billingUsage" \
    -H "Authorization: Bearer ${FIREWORKS_API_KEY}" \
    --data-urlencode "startTime=${current}T00:00:00Z" \
    --data-urlencode "endTime=${next}T00:00:00Z" \
    --data-urlencode "usageType=SERVERLESS" \
    --data-urlencode "groupBy=api_key_id" \
    --data-urlencode "groupBy=api_key_name" \
    > "usage_${current}_to_${next}.json"

  current="$next"
done
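Before running a long backfill, it can help to preview the windows the loop will request and confirm that each stays within the 31-day cap. The sketch below prints the windows without calling the API. print_windows is a hypothetical helper and, like the loop above, it assumes GNU date.

```shell
# print_windows <start> <end> [days]
# Prints "start end" pairs covering [start, end) in steps of at most
# <days> days (default 30). Requires GNU date for the -d option.
print_windows() {
  current="$1"
  end_date="$2"
  step="${3:-30}"
  end_s="$(date -u -d "$end_date" '+%s')"
  while [ "$(date -u -d "$current" '+%s')" -lt "$end_s" ]; do
    next="$(date -u -d "$current +$step days" '+%Y-%m-%d')"
    # Clamp the last window to the requested end date.
    if [ "$(date -u -d "$next" '+%s')" -gt "$end_s" ]; then
      next="$end_date"
    fi
    printf '%s %s\n' "$current" "$next"
    current="$next"
  done
}

print_windows 2026-01-01 2026-06-01
```

For the range used above this yields six windows, the first running from 2026-01-01 to 2026-01-31 and the last from 2026-05-31 to 2026-06-01.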

Granularity and freshness

  • Usage is aggregated into daily buckets (--timezone / timezone= sets the day boundary). There are no sub-daily buckets.
  • Responses are cached for several minutes — fine for cron jobs and dashboards, not for real-time.

Coverage caveats

  • Tokens, not dollars. The endpoint returns metered quantities (promptTokens, completionTokens, accelerator_seconds, audioInputSeconds). Multiply by the serverless prices for cost, or use --account-costs-only for account-level dollar totals.
  • Inference types covered today: text completion / chat completion and audio inference. Embeddings and image generation aren’t yet reflected in billingUsage responses; coverage will expand in subsequent releases.
  • Dedicated deployments are attributed at the deployment level, not by API key. Use usageType=DEDICATED_DEPLOYMENT with groupBy=deployment_name for that breakdown.
Run firectl billing get-usage --help to see all available CLI flags and options.
