Overview
Fireworks exposes the same usage-and-cost data through two equivalent surfaces:
- CLI: `firectl billing get-usage`, best for ad-hoc queries, shell scripting, and one-off cost reviews.
- HTTP API: `GET /v1/accounts/{account_id}/billingUsage`, best for cron jobs, dashboards, and downstream cost-attribution pipelines.
Both surfaces provide two kinds of data:
- Account costs: rated dollar totals for the range (the CLI prints them by default; the API serves them from the companion `GetBillingSummary` endpoint).
- Usage: metered quantities (tokens, accelerator-seconds, audio input seconds) grouped by your chosen dimensions.
Use `export-metrics` for a raw per-event CSV dump, and the workflows on this page for grouped, rated views.
CLI examples require `firectl` 1.7.21 or later. Run `firectl version`, then `firectl upgrade` if needed.
Authentication
For the API, send your Fireworks API key as a bearer token; any key on the target account works. For the CLI, run `firectl login` once; `firectl` then reads credentials from `~/.fireworks/auth.ini`.
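A minimal sketch of the API side of authentication. It assumes the key is kept in a `FIREWORKS_API_KEY` environment variable and that the API is served from `https://api.fireworks.ai`; both are this sketch's assumptions, so check the API reference. The hypothetical helper `fw_auth_header` refuses to build a header from an empty key rather than sending an empty bearer token.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the bearer-token header for the billing API.
# Assumes the key is in FIREWORKS_API_KEY; ${VAR:?msg} aborts with an error
# instead of sending an empty token.
fw_auth_header() {
  printf 'Authorization: Bearer %s\n' "${FIREWORKS_API_KEY:?set FIREWORKS_API_KEY to a Fireworks API key}"
}

# Example calls (not run here; host and ACCOUNT_ID are placeholders):
#   firectl login    # CLI: one-time; credentials are stored in ~/.fireworks/auth.ini
#   curl -sS -H "$(fw_auth_header)" \
#     "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/billingUsage"
```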
Basic usage
Get a 30-day account-wide breakdown. Without a usage type or grouping, the query covers all usage types, grouped by model for serverless and by deployment plus accelerator for dedicated deployments.
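A sketch of both calls over an explicit 30-day window. It assumes GNU `date` (for `-d`) and that the API accepts the same `YYYY-MM-DD` values for `startTime`/`endTime` as the CLI flags, which the API parameters section implies but does not spell out. `usage_window_30d`, `ACCOUNT_ID`, and `FW_API_BASE` (defaulting to `https://api.fireworks.ai`, also an assumption) are hypothetical names.

```shell
#!/usr/bin/env bash
# Prints "START END" for the 30 days ending at END (END is exclusive).
# Requires GNU date for -d and the @epoch input form.
usage_window_30d() {
  local end=$1 end_s
  end_s=$(date -u -d "$end" +%s) || return 1
  printf '%s %s\n' "$(date -u -d "@$((end_s - 30 * 86400))" +%F)" "$end"
}

# CLI: account-wide usage with the default usage type and grouping.
basic_usage_cli() {
  local start end
  read -r start end <<< "$(usage_window_30d "$1")"
  firectl billing get-usage --start-time "$start" --end-time "$end" -o json
}

# API: the same query over HTTP. FW_API_BASE and ACCOUNT_ID are placeholders.
basic_usage_api() {
  local start end
  read -r start end <<< "$(usage_window_30d "$1")"
  curl -sS -H "Authorization: Bearer ${FIREWORKS_API_KEY:?}" \
    "${FW_API_BASE:-https://api.fireworks.ai}/v1/accounts/${ACCOUNT_ID:?}/billingUsage?startTime=${start}&endTime=${end}"
}
```

For example, `basic_usage_cli 2025-03-01` covers 2025-01-30 up to, but not including, 2025-03-01.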
Add `-o json` to the CLI command for machine-readable output.
Examples
Serverless usage by model
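A sketch of the serverless-by-model query on both surfaces. The CLI's `--usage-type` values and the API's `usageType` values are spelled differently, so the hypothetical `api_usage_type` helper maps one to the other (an empty result means "omit `usageType`"). Dates are passed in the CLI's `YYYY-MM-DD` form on the assumption that the API accepts them; `FW_API_BASE` (default `https://api.fireworks.ai`, assumed) and `ACCOUNT_ID` are placeholders.

```shell
#!/usr/bin/env bash
# Maps a CLI --usage-type value to the API's usageType value.
# "all" maps to an empty string, meaning "omit usageType".
api_usage_type() {
  case "$1" in
    serverless) echo SERVERLESS ;;
    dedicated-deployment) echo DEDICATED_DEPLOYMENT ;;
    all) echo "" ;;
    *) echo "unknown usage type: $1" >&2; return 1 ;;
  esac
}

# CLI: serverless tokens per model for [START, END).
serverless_by_model_cli() {
  firectl billing get-usage --start-time "$1" --end-time "$2" \
    --usage-type serverless --group-by model_name -o json
}

# API: the same query; host and ACCOUNT_ID are placeholders.
serverless_by_model_api() {
  curl -sS -H "Authorization: Bearer ${FIREWORKS_API_KEY:?}" \
    "${FW_API_BASE:-https://api.fireworks.ai}/v1/accounts/${ACCOUNT_ID:?}/billingUsage?startTime=$1&endTime=$2&usageType=$(api_usage_type serverless)&groupBy=model_name"
}
```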
Serverless usage by API key
Breaks out serverless token consumption per API key. Group by both `api_key_id` (stable internal ID) and `api_key_name` (human-readable label from the console or `firectl api-key create --name`) so the response carries both.
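A sketch of the per-key breakdown. The CLI takes one `--group-by` per dimension, while the API takes repeated `groupBy=` parameters; the hypothetical `group_by_params` helper builds the latter. Host, `ACCOUNT_ID`, and the `YYYY-MM-DD` date form for the API are assumptions, as in the other sketches on this page.

```shell
#!/usr/bin/env bash
# Joins dimensions into repeated groupBy=<dim> query parameters.
group_by_params() {
  local out="" dim
  for dim in "$@"; do
    out="${out}${out:+&}groupBy=${dim}"
  done
  printf '%s\n' "$out"
}

# CLI: one --group-by per dimension.
serverless_by_key_cli() {
  firectl billing get-usage --start-time "$1" --end-time "$2" --usage-type serverless \
    --group-by api_key_id --group-by api_key_name -o json
}

# API: the same dimensions as repeated groupBy parameters (host/ACCOUNT_ID are placeholders).
serverless_by_key_api() {
  curl -sS -H "Authorization: Bearer ${FIREWORKS_API_KEY:?}" \
    "${FW_API_BASE:-https://api.fireworks.ai}/v1/accounts/${ACCOUNT_ID:?}/billingUsage?startTime=$1&endTime=$2&usageType=SERVERLESS&$(group_by_params api_key_id api_key_name)"
}
```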
Token counts come back as JSON strings (int64 values are serialized as strings). Cast them with `tonumber` in jq, or the equivalent in your client, before doing arithmetic. The deprecated top-level `apiKeyId` field is only populated when `groupBy=api_key_id` is requested, so always read API-key values from the group map.
Filter to a specific API key
Repeat `--filter` (CLI) or `filter[<dim>][values]=` (API) to OR multiple values for the same dimension.
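A sketch of an OR filter over two API keys. The hypothetical `filter_params` helper repeats `filter[<dim>][values]=` once per value. Because the query contains literal square brackets, the curl call uses `-g` (`--globoff`) so curl does not treat them as a URL glob range. Key values, host, and `ACCOUNT_ID` are placeholders, and values are assumed to need no URL encoding.

```shell
#!/usr/bin/env bash
# Builds filter[<dim>][values]=<value> once per value; the API ORs repeated
# values for one dimension. Values are inserted as-is, so URL-encode anything
# outside [A-Za-z0-9._~-] first.
filter_params() {
  local dim=$1 out="" v
  shift
  for v in "$@"; do
    out="${out}${out:+&}filter[${dim}][values]=${v}"
  done
  printf '%s\n' "$out"
}

# CLI: usage for either of two keys ($3 and $4 are placeholder key IDs).
usage_for_keys_cli() {
  firectl billing get-usage --start-time "$1" --end-time "$2" --usage-type serverless \
    --group-by api_key_id --filter "api_key_id=$3" --filter "api_key_id=$4" -o json
}

# API: -g (--globoff) stops curl from treating [ ] in the query as a glob range.
usage_for_keys_api() {
  curl -sS -g -H "Authorization: Bearer ${FIREWORKS_API_KEY:?}" \
    "${FW_API_BASE:-https://api.fireworks.ai}/v1/accounts/${ACCOUNT_ID:?}/billingUsage?startTime=$1&endTime=$2&usageType=SERVERLESS&groupBy=api_key_id&$(filter_params api_key_id "$3" "$4")"
}
```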
Dedicated deployment usage by deployment and GPU type
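A sketch of the dedicated-deployment breakdown that builds the `firectl` arguments in a bash array, so any number of repeatable `--group-by` dimensions can be appended safely. `dedicated_usage_cli` is a hypothetical wrapper; the API form in the comment uses a placeholder host and `ACCOUNT_ID`.

```shell
#!/usr/bin/env bash
# Runs a dedicated-deployment breakdown with one --group-by per dimension
# passed after START and END.
dedicated_usage_cli() {
  local start=$1 end=$2 dim
  shift 2
  local -a args=(billing get-usage --start-time "$start" --end-time "$end" --usage-type dedicated-deployment)
  for dim in "$@"; do
    args+=(--group-by "$dim")
  done
  firectl "${args[@]}" -o json
}

# API equivalent (host and ACCOUNT_ID are placeholders):
#   curl -sS -H "Authorization: Bearer $FIREWORKS_API_KEY" \
#     "https://api.fireworks.ai/v1/accounts/$ACCOUNT_ID/billingUsage?startTime=2025-01-01&endTime=2025-01-31&usageType=DEDICATED_DEPLOYMENT&groupBy=deployment_name&groupBy=accelerator_type"
```

For example: `dedicated_usage_cli 2025-01-01 2025-01-31 deployment_name accelerator_type`.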
Filter to a single deployment
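A sketch of a single-deployment filter. The hypothetical `cli_filter_to_param` helper turns the CLI's `key=value` filter syntax into the API's `filter[<key>][values]=<value>` form, so one filter spec can drive both surfaces. The deployment name is a placeholder; use the value exactly as it appears in grouped output. Host, `ACCOUNT_ID`, and the API date form are assumptions.

```shell
#!/usr/bin/env bash
# Converts a CLI-style key=value filter into filter[<key>][values]=<value>.
# Splits on the first "=", so values may themselves contain "=".
cli_filter_to_param() {
  case "$1" in
    *=*) printf 'filter[%s][values]=%s\n' "${1%%=*}" "${1#*=}" ;;
    *) echo "expected key=value, got: $1" >&2; return 1 ;;
  esac
}

# CLI: usage for one deployment ($3 is a placeholder deployment name).
single_deployment_cli() {
  firectl billing get-usage --start-time "$1" --end-time "$2" \
    --usage-type dedicated-deployment --group-by deployment_name \
    --filter "deployment_name=$3" -o json
}

# API: the same filter; -g keeps curl from globbing the brackets.
single_deployment_api() {
  curl -sS -g -H "Authorization: Bearer ${FIREWORKS_API_KEY:?}" \
    "${FW_API_BASE:-https://api.fireworks.ai}/v1/accounts/${ACCOUNT_ID:?}/billingUsage?startTime=$1&endTime=$2&usageType=DEDICATED_DEPLOYMENT&groupBy=deployment_name&$(cli_filter_to_param "deployment_name=$3")"
}
```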
Account-level cost totals only
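A CLI-only sketch for account-level totals. On the API side these totals come from the companion `GetBillingSummary` endpoint, whose path this page does not give, so it is not sketched here. The hypothetical `account_costs_cli` wrapper also rejects an empty or inverted range, since `--end-time` is exclusive.

```shell
#!/usr/bin/env bash
# Prints only account-level cumulative costs for [START, END).
# YYYY-MM-DD dates compare correctly as strings, so END must sort after START.
account_costs_cli() {
  if [[ ! "$2" > "$1" ]]; then
    echo "end ($2) must be after start ($1)" >&2
    return 1
  fi
  firectl billing get-usage --start-time "$1" --end-time "$2" --account-costs-only -o json
}
```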
Reference
CLI flags
| Flag | Description |
|---|---|
| `--start-time` | Start time (inclusive), as `YYYY-MM-DD` or `'YYYY-MM-DD hh:mm:ss'`. |
| `--end-time` | End time (exclusive), same formats. |
| `--usage-type` | `all`, `serverless`, or `dedicated-deployment`. Defaults to `all`. |
| `--group-by` | Dimension to group by. Repeatable. |
| `--filter` | `key=value` filter. Repeatable; repeated values for the same key are OR'ed. |
| `--timezone` | IANA timezone for daily aggregation (e.g. `America/Los_Angeles`). Defaults to UTC. |
| `--account-costs-only` | Print only account-level cumulative costs for the range. |
| `-o`, `--output` | `text` (default) or `json`. |
Run `firectl billing get-usage --help` for the full list.
API parameters
The same dimensions are passed as `groupBy=<dim>` (repeat for multiple) and `filter[<dim>][values]=<value>` (repeat for OR). `usageType` takes `SERVERLESS` or `DEDICATED_DEPLOYMENT`, or is omitted for all usage. `timezone` and `startTime`/`endTime` mirror the CLI flags. See the full API reference for parameter schemas and response types.
Grouping dimensions
Valid `--group-by` / `groupBy` and `--filter` / `filter` dimensions depend on the usage type:
- Serverless: `model_name`, `api_key_id`, `api_key_name`, `annotations.team`, `annotations.project`, `annotations.environment`
- Dedicated deployment: `deployment_name`, `accelerator_type`, `annotations.team`, `annotations.project`, `annotations.environment`
placement, e.g. US, EUROPE, GLOBAL) and metered accelerator_seconds.
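A client-side check can catch a dimension that does not apply to the chosen usage type before the request is sent. This sketch hard-codes only the two dimension lists given for serverless and dedicated deployments; `valid_dims` and `check_dim` are hypothetical helpers, and `all` is deliberately left unhandled because the page does not list its dimensions.

```shell
#!/usr/bin/env bash
# Valid grouping/filter dimensions per usage type, taken from the lists above.
valid_dims() {
  case "$1" in
    serverless)
      echo "model_name api_key_id api_key_name annotations.team annotations.project annotations.environment" ;;
    dedicated-deployment)
      echo "deployment_name accelerator_type annotations.team annotations.project annotations.environment" ;;
    *) return 1 ;;
  esac
}

# Returns 0 if DIM is valid for USAGE_TYPE; otherwise prints an error and returns 1.
check_dim() {
  local usage_type=$1 dim=$2 d
  for d in $(valid_dims "$usage_type"); do
    [ "$d" = "$dim" ] && return 0
  done
  echo "'$dim' is not a valid dimension for $usage_type" >&2
  return 1
}
```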
Custom tags (team / project / environment)
Group by `annotations.team`, `annotations.project`, or `annotations.environment` to split usage by your own labels. The tag source depends on the usage type:
- Dedicated deployments: set an `annotations` map on the deployment, e.g. `{"team": "search", "project": "x", "environment": "prod"}`.
- Serverless: send the tags in a per-request header on inference calls.
Annotation values are validated server-side; unrecognized keys are dropped silently.
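Because unrecognized keys are dropped without an error, it can help to build the deployment's `annotations` map client-side and surface drops explicitly. This hypothetical `annotations_json` helper assumes `team`, `project`, and `environment` are the only recognized keys (the three exposed as `annotations.*` dimensions) and that values are plain labels that need no JSON escaping.

```shell
#!/usr/bin/env bash
# Builds an annotations JSON map from key=value pairs, keeping only
# team/project/environment (assumed to be the recognized keys) and
# warning on stderr about anything it drops.
# Values are not JSON-escaped, so keep them to plain labels.
annotations_json() {
  local out="" kv key
  for kv in "$@"; do
    key=${kv%%=*}
    case "$key" in
      team|project|environment)
        out="${out}${out:+,}\"${key}\":\"${kv#*=}\"" ;;
      *)
        echo "warning: dropping unrecognized annotation key '$key'" >&2 ;;
    esac
  done
  printf '{%s}\n' "$out"
}
```

`annotations_json team=search project=x environment=prod` reproduces the example map above; usage tagged this way can then be split with `--group-by annotations.team` or `groupBy=annotations.team`.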
Cookbook: per-API-key reporting recipes
These recipes target the HTTP API, where downstream aggregation in jq (or any client) is easiest.
Aggregate per key, across models
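A minimal jq sketch of this recipe. The response layout is an assumption: it treats the body as `{"usage": [...]}`, where each row carries a `group` map (with `api_key_id` and `api_key_name`) plus string-valued `promptTokens` and `completionTokens`. Check the API reference for the actual field names and adjust the paths. Token strings are cast with `tonumber` before summing, and missing counts are treated as zero.

```shell
#!/usr/bin/env bash
# Reads a billingUsage response on stdin and prints one object per API key,
# summing tokens across models, sorted by prompt tokens (largest first).
# Assumed layout: .usage[] rows with .group.{api_key_id,api_key_name}
# and string-valued .promptTokens / .completionTokens.
per_key_totals() {
  jq -c '
    [ .usage[]
      | { id: .group.api_key_id,
          name: .group.api_key_name,
          prompt: ((.promptTokens // "0") | tonumber),
          completion: ((.completionTokens // "0") | tonumber) } ]
    | group_by(.id)
    | map({ api_key_id: .[0].id,
            api_key_name: .[0].name,
            prompt_tokens: (map(.prompt) | add),
            completion_tokens: (map(.completion) | add) })
    | sort_by(.prompt_tokens) | reverse'
}

# Usage: curl ... | per_key_totals
```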
Sums prompt and completion tokens for each API key across every model it called, sorted by prompt volume.
Group by model, then by key (cost-by-tool view)
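A jq sketch of the model-first nesting, assuming the query grouped by `model_name` and `api_key_name` and that rows look like `{"group": {...}, "promptTokens": "...", "completionTokens": "..."}` under a top-level `usage` array. That layout is an assumption, so adjust the paths to the documented response. Keys fall back to `api_key_id` when no name is present.

```shell
#!/usr/bin/env bash
# Reads a billingUsage response on stdin and nests token totals as
# model -> API key. Assumed layout: .usage[] rows with
# .group.{model_name,api_key_name,api_key_id} and string token counts.
per_model_then_key() {
  jq -c '
    [ .usage[]
      | { model: .group.model_name,
          key: (.group.api_key_name // .group.api_key_id),
          prompt: ((.promptTokens // "0") | tonumber),
          completion: ((.completionTokens // "0") | tonumber) } ]
    | group_by(.model)
    | map({ model_name: .[0].model,
            keys: (group_by(.key)
                   | map({ api_key: .[0].key,
                           prompt_tokens: (map(.prompt) | add),
                           completion_tokens: (map(.completion) | add) })) })'
}
```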
If reporting starts from “how much did each model cost me, and which keys drove that”, flip the nesting: group by model first, then by API key.
Backfill more than 31 days
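A sketch of a month-by-month backfill. Each window runs from the first of a month to the first of the next, so it never exceeds 31 days. This assumes `endTime` is exclusive on the API as it is for the CLI's `--end-time`, and that `YYYY-MM-DD` dates are accepted. `month_windows` and `backfill` are hypothetical; host and `ACCOUNT_ID` are placeholders.

```shell
#!/usr/bin/env bash
# Prints one "START END" line per calendar month from FIRST to LAST
# (both YYYY-MM, inclusive). END is the first of the next month.
month_windows() {
  local y=${1%-*} m=${1#*-} last_y=${2%-*} last_m=${2#*-} ny nm
  m=${m#0}; last_m=${last_m#0}   # drop a leading zero so "08" is not read as octal
  while [ $((y * 12 + m)) -le $((last_y * 12 + last_m)) ]; do
    if [ "$m" -eq 12 ]; then ny=$((y + 1)); nm=1; else ny=$y; nm=$((m + 1)); fi
    printf '%04d-%02d-01 %04d-%02d-01\n' "$y" "$m" "$ny" "$nm"
    y=$ny; m=$nm
  done
}

# Hypothetical backfill: one request per month, saved as usage-<START>.json.
backfill() {
  local start end
  month_windows "$1" "$2" | while read -r start end; do
    curl -sS -H "Authorization: Bearer ${FIREWORKS_API_KEY:?}" \
      -o "usage-${start}.json" \
      "${FW_API_BASE:-https://api.fireworks.ai}/v1/accounts/${ACCOUNT_ID:?}/billingUsage?startTime=${start}&endTime=${end}"
  done
}
```

For example, `backfill 2024-11 2025-02` issues four requests, starting at 2024-11-01 and ending with the window 2025-02-01 to 2025-03-01.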
The endpoint caps each request at a 31-day window, so pull a longer history one calendar month at a time.
Granularity and freshness
- Usage is aggregated into daily buckets (`--timezone` / `timezone=` sets the day boundary). There are no sub-daily buckets.
- Responses are cached for several minutes: fine for cron jobs and dashboards, not for real-time monitoring.
Coverage caveats
- Tokens, not dollars. The endpoint returns metered quantities (`promptTokens`, `completionTokens`, `accelerator_seconds`, `audioInputSeconds`). Multiply by the serverless prices for cost, or use `--account-costs-only` for account-level dollar totals.
- Inference types covered today: text completion / chat completion and audio inference. Embeddings and image generation aren't yet reflected in `billingUsage` responses; coverage will expand in subsequent releases.
- Dedicated deployments are attributed at the deployment level, not by API key. Use `usageType=DEDICATED_DEPLOYMENT` with `groupBy=deployment_name` for that breakdown.
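The token-to-dollar step is plain arithmetic. This sketch assumes prices are quoted per 1M tokens with separate input and output rates; a model with a single rate passes the same value twice. The prices are inputs you look up on the pricing page, and the values in the example are made up. `token_cost` is a hypothetical helper; awk handles the floating-point math that bash lacks.

```shell
#!/usr/bin/env bash
# Estimates serverless cost from token counts and per-1M-token prices:
#   cost = (prompt_tokens * input_price + completion_tokens * output_price) / 1,000,000
# Arguments: PROMPT_TOKENS COMPLETION_TOKENS INPUT_PRICE OUTPUT_PRICE
token_cost() {
  awk -v p="$1" -v c="$2" -v pin="$3" -v pout="$4" \
    'BEGIN { printf "%.6f\n", (p * pin + c * pout) / 1000000 }'
}
```

With made-up prices of 0.20 (input) and 0.80 (output) per 1M tokens, `token_cost 2000000 1000000 0.20 0.80` prints `1.200000`.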
See also
- `firectl billing get-usage`: CLI command reference
- `GET /v1/accounts/{account_id}/billingUsage`: HTTP API reference
- Exporting Billing Metrics: raw per-event billing CSV export
- Account quotas: spending tiers and budget controls