Reference
Resource types
The SDK currently supports three types of resources: LLM
, Dataset
, and SupervisedFineTuningJob
.
LLM
Properties:
deployment_name
str - The full name of the deployment (e.g.,accounts/my-account/deployments/my-custom-deployment
)deployment_display_name
str - The display name of the deployment, defaults to the filename where the LLM was instantiated unless otherwise specifieddeployment_url
str - The URL to view the deployment in the Fireworks dashboardtemperature
float - The temperature for generationmodel
str - The model associated with this LLM (e.g.,accounts/fireworks/models/llama-v3p2-3b-instruct
)base_deployment_name
str - If a LoRA addon, the deployment name of the base model deploymentpeft_base_model
str - If this is a LoRA addon, the base model identifier (e.g.,accounts/fireworks/models/llama-v3p2-3b-instruct
)addons_enabled
bool - Whether LoRA addons are enabled for this LLMmodel_id
str - The identifier used under the hood to query this model (e.g.,accounts/my-account/deployedModels/my-deployed-model-abcdefg
)
Instantiation
The LLM(*args, **kwargs)
class constructor initializes a new LLM instance.
Required Arguments
model
str - The model identifier to use (e.g.,accounts/fireworks/models/llama-v3p2-3b-instruct
)deployment_type
str - The type of deployment to use. Must be one of:"serverless"
: Uses Fireworks’ shared serverless infrastructure"on-demand"
: Uses dedicated resources for your deployment"auto"
: Automatically selects the most cost-effective option (recommended for experimentation)
Optional Arguments
Deployment Configuration
deployment_name
str, optional - Name to identify the deployment. If not provided, Fireworks will auto-generate one. If a deployment with the same name already exists, the SDK will try and re-use it.deployment_display_name
str, optional - Display name for the deployment. Defaults to the filename where the LLM was instantiated. If a deployment with the same display name and model already exists, the SDK will try and re-use it.base_deployment_name
str, optional - Base deployment name for LoRA addons. If not provided, will try to find a base model deployment that can be reused.
Authentication & API
api_key
str, optional - Your Fireworks API keybase_url
str, optional - Base URL for API calls. Defaults to “https://api.fireworks.ai/inference/v1”max_retries
int, optional - Maximum number of retry attempts. Defaults to 3
Scaling Configuration
scale_up_window
timedelta, optional - Time to wait before scaling up after increased load. Defaults to 1 secondscale_down_window
timedelta, optional - Time to wait before scaling down after decreased load. Defaults to 1 minutescale_to_zero_window
timedelta, optional - Time of inactivity before scaling to zero. Defaults to 5 minutes
Hardware & Performance
accelerator_type
str, optional - Type of GPU accelerator to useregion
str, optional - Region for deploymentmin_replica_count
int, optional - Minimum number of replicasmax_replica_count
int, optional - Maximum number of replicasreplica_count
int, optional - Fixed number of replicasaccelerator_count
int, optional - Number of accelerators per replicaprecision
str, optional - Model precision (e.g., “FP16”, “FP8”)max_batch_size
int, optional - Maximum batch size for inference
Advanced Features
enable_addons
bool, optional - Enable LoRA addons supportdraft_token_count
int, optional - Number of tokens to generate per step for speculative decodingdraft_model
str, optional - Model to use for speculative decodingngram_speculation_length
int, optional - Length of previous input sequence for N-gram speculationlong_prompt_optimized
bool, optional - Optimize for long promptstemperature
float, optional - Sampling temperature for generation
Monitoring & Metrics
enable_metrics
bool, optional - Enable metrics collection. Currently supports time to last token for non-streaming requests.
Additional Configuration
description
str, optional - Description of the deploymentcluster
str, optional - Cluster identifierenable_session_affinity
bool, optional - Enable session affinitydirect_route_api_keys
list[str], optional - List of API keys for direct routingdirect_route_type
str, optional - Type of direct routing
create_supervised_fine_tuning_job()
Creates a new supervised fine-tuning job and blocks until it is ready. See the SupervisedFineTuningJob section for details on the parameters.
Returns:
- An instance of
SupervisedFineTuningJob
.
delete_deployment()
Deletes the deployment associated with this LLM instance if one exists.
Arguments:
ignore_checks
bool, optional - Whether to ignore safety checks. Defaults to False.
get_time_to_last_token_mean()
Returns the mean time to last token for non-streaming requests. If no metrics are available, returns None.
Returns:
- A float representing the mean time to last token, or None if no metrics are available.
with_deployment_type()
Returns a new LLM instance with the specified deployment type.
Arguments:
deployment_type
str - The deployment type to use (“serverless”, “on-demand”, or “auto”)
Returns:
- A new
LLM
instance with the specified deployment type
with_temperature()
Returns a new LLM instance with the specified temperature.
Arguments:
temperature
float - The temperature for generation
Returns:
- A new
LLM
instance with the specified temperature
chat.completions.create()
and chat.completions.acreate()
Creates a chat completion using the LLM. These methods are OpenAI compatible and follow the same interface as described in the OpenAI Chat Completions API. Use create()
for synchronous calls and acreate()
for asynchronous calls.
Note: The Fireworks chat completions API includes additional request and response fields beyond the standard OpenAI API. See the Fireworks Chat Completions API reference for the complete set of available parameters and response fields.
Arguments:
messages
list - A list of messages comprising the conversation so farstream
bool, optional - Whether to stream the response. Defaults to Falseresponse_format
dict, optional - An object specifying the format that the model must outputreasoning_effort
str, optional - How much effort the model should put into reasoningmax_tokens
int, optional - The maximum number of tokens to generatetemperature
float, optional - Sampling temperature between 0 and 2. If not provided, uses the LLM’s default temperature. Note that temperature can also be set once during LLM instantiation if preferredtools
list, optional - A list of tools the model may callextra_headers
dict, optional - Additional headers to include in the request**kwargs
- Additional parameters supported by the OpenAI API
Returns:
ChatCompletion
whenstream=False
(default)Generator[ChatCompletionChunk, None, None]
whenstream=True
(sync version)AsyncGenerator[ChatCompletionChunk, None]
whenstream=True
(async version)
For details on the ChatCompletion
object structure, see the OpenAI Chat Completion Object documentation. For the ChatCompletionChunk
object structure used in streaming, see the OpenAI Chat Streaming documentation.
Dataset
The Dataset
class provides a convenient way to manage datasets for fine-tuning on Fireworks. It offers smart features like automatic naming and uploading of datasets. You do not instantiate a Dataset
object directly. Instead, you create a Dataset
object by using one of the class methods below.
Properties:
name
str - The name of the dataset
from_list()
Creates a Dataset from a list of training examples. Each example should be compatible with OpenAI’s chat completion format.
from_file()
Creates a Dataset from a local JSONL file. The file should contain training examples in OpenAI’s chat completion format.
from_string()
Creates a Dataset from a string containing JSONL-formatted training examples.
sync()
Uploads the dataset to Fireworks if it doesn’t already exist. This method automatically:
- Checks if a dataset with the same content hash already exists
- If it exists, skips the upload to avoid duplicates
- If it doesn’t exist, creates and uploads the dataset to Fireworks
- Validates the dataset after upload
delete()
Deletes the dataset from Fireworks.
Data Format
The Dataset class expects data in OpenAI’s chat completion format. Each training example should be a JSON object with a messages
array containing message objects. Each message object should have:
role
: One of"system"
,"user"
, or"assistant"
content
: The message content as a string
Example format:
SupervisedFineTuningJob
The SupervisedFineTuningJob
class manages fine-tuning jobs on Fireworks. It provides a convenient interface for creating, monitoring, and managing fine-tuning jobs.
Properties:
output_model
str - The identifier of the output model (e.g.,accounts/my-account/models/my-finetuned-model
)output_llm
LLM - An LLM instance associated with the output model
Instantiation
You do not need to directly instantiate a SupervisedFineTuningJob
object. Instead, you should use the .create_supervised_fine_tuning_job()
method on the LLM
object and pass in the following required and optional arguments.
Required Arguments
name
str - A unique name for the fine-tuning jobllm
LLM - The LLM instance to fine-tunedataset_or_id
Union[Dataset, str] - The dataset to use for fine-tuning, either as a Dataset object or dataset ID
Optional Arguments
Training Configuration
epochs
int, optional - Number of training epochslearning_rate
float, optional - Learning rate for traininglora_rank
int, optional - Rank for LoRA fine-tuningjinja_template
str, optional - Template for formatting training examplesearly_stop
bool, optional - Whether to enable early stoppingmax_context_length
int, optional - Maximum context length for the modelbase_model_weight_precision
str, optional - Precision for base model weightsbatch_size
int, optional - Batch size for training
Hardware Configuration
accelerator_type
str, optional - Type of GPU accelerator to useaccelerator_count
int, optional - Number of accelerators to useis_turbo
bool, optional - Whether to use turbo mode for faster trainingregion
str, optional - Region for deploymentnodes
int, optional - Number of nodes to use
Evaluation & Monitoring
evaluation_dataset
str, optional - Dataset ID to use for evaluationeval_auto_carveout
bool, optional - Whether to automatically carve out evaluation datawandb_config
WandbConfig, optional - Configuration for Weights & Biases integration
Job Management
id
str, optional - Job ID (auto-generated if not provided)api_key
str, optional - API key for authenticationstate
JobState, optional - Current state of the jobcreate_time
datetime, optional - Time when the job was createdupdate_time
datetime, optional - Time when the job was last updatedcreated_by
str, optional - User who created the joboutput_model
str, optional - ID of the output model
wait_for_completion()
Polls the job status until it is complete and returns the job object.
delete()
Deletes the job.