What this is

This is the default lifecycle for research loops: bootstrap a trainer and deployment, run iterative updates, export checkpoints, sync weights to the deployment, then sample through it for realistic evaluation.

Workflow

  1. Request resources: create a service-mode trainer (TrainerJobManager) first and capture its job_id/job_name, then create or attach a deployment (DeploymentManager) linked to that trainer’s weight-sync bucket.
  2. Connect a training client from your Python loop.
  3. Run train steps: forward_backward_custom + optim_step in a loop.
  4. Save checkpoints at regular intervals using base/delta pattern.
  5. Weight-sync the checkpoint to your serving deployment.
  6. Sample and evaluate through the deployment endpoint.
  7. Record metrics and decide whether to continue or branch experiments.

End-to-end example

The only training-shape input you choose below is the shape ID. The API resolves the versioned reference for you before launch.

1. Bootstrap

import os
import tinker
from concurrent.futures import ThreadPoolExecutor
from fireworks.training.sdk import (
    FiretitanServiceClient,
    TrainerJobManager,
    TrainerJobConfig,
    DeploymentManager,
    DeploymentConfig,
    WeightSyncer,
)

api_key = os.environ["FIREWORKS_API_KEY"]
base_url = os.environ.get("FIREWORKS_BASE_URL", "https://api.fireworks.ai")
shape_id = "accounts/fireworks/trainingShapes/qwen3-8b-128k-h200"

rlor_mgr = TrainerJobManager(api_key=api_key, base_url=base_url)
deploy_mgr = DeploymentManager(api_key=api_key, base_url=base_url)

# This is the only shape-specific value you choose
profile = rlor_mgr.resolve_training_profile(shape_id)

# Request the trainer first, then wait separately.
created = rlor_mgr.create(TrainerJobConfig(
    base_model="accounts/fireworks/models/qwen3-8b",
    training_shape_ref=profile.training_shape_version,
    lora_rank=0,
    learning_rate=1e-5,
    gradient_accumulation_steps=4,
))
print(f"Trainer requested: {created.job_id}")

# Create deployment linked to the trainer.
deploy_info = deploy_mgr.create_or_get(DeploymentConfig(
    deployment_id="research-serving",
    base_model="accounts/fireworks/models/qwen3-8b",
    hot_load_trainer_job=created.job_name,
    min_replica_count=0,
    max_replica_count=1,
))

# Wait for trainer and deployment readiness in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    trainer_future = pool.submit(rlor_mgr.wait_for_ready, created.job_id)
    deploy_future = pool.submit(deploy_mgr.wait_for_ready, deploy_info.deployment_id)
    endpoint = trainer_future.result()
    deploy_info = deploy_future.result()

# Connect client (FiretitanServiceClient provides checkpoint_type + session ID)
service = FiretitanServiceClient(base_url=endpoint.base_url, api_key=api_key)
training_client = service.create_training_client(
    base_model="accounts/fireworks/models/qwen3-8b", lora_rank=0,
)

2. Train step with custom objective

def objective(data, logprobs_list):
    loss = compute_objective(data=data, logprobs_list=logprobs_list)
    return loss, {"loss": float(loss.item())}

for step in range(total_steps):
    batch = build_batch(step)
    training_client.forward_backward_custom(batch, objective).result()
    training_client.optim_step(
        tinker.AdamParams(learning_rate=1e-5, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01)
    ).result()
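
compute_objective, build_batch, and total_steps above are experiment-specific placeholders, not SDK functions. As one illustration only, a plain negative-log-likelihood objective might look like the following, assuming logprobs_list arrives as a list of per-sequence torch tensors of token logprobs:

import torch

def compute_objective(data, logprobs_list):
    # Hypothetical NLL loss: mean negative token logprob per sequence,
    # averaged over the batch. Substitute your task's real objective.
    per_sequence = [(-logprobs).mean() for logprobs in logprobs_list]
    return torch.stack(per_sequence).mean()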

3. Checkpoint, weight sync, and evaluate

import asyncio

from transformers import AutoTokenizer
from fireworks.training.sdk import DeploymentSampler, AdaptiveConcurrencyController

# Set up WeightSyncer for automatic delta-chain management
tracker = WeightSyncer(
    policy_client=training_client,
    deploy_mgr=deploy_mgr,
    deployment_id="research-serving",
    base_model="accounts/fireworks/models/qwen3-8b",
    hotload_timeout=600,
    first_checkpoint_type="base",
)

# Create the tokenizer and sampler once so every evaluation sweep reuses them.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B", trust_remote_code=True)
sampler = DeploymentSampler(
    inference_url=deploy_mgr.inference_url,
    model=f"accounts/{deploy_mgr.account_id}/deployments/research-serving",
    api_key=api_key,
    tokenizer=tokenizer,
    concurrency_controller=AdaptiveConcurrencyController(initial_window=16),
)

# Inside the training loop from step 2:
if step % eval_interval == 0:
    # WeightSyncer auto-selects base (first) or delta (subsequent)
    tracker.save_and_hotload(f"step_{step:05d}")

    # Sample via the deployment for evaluation
    completions = asyncio.run(
        sampler.sample_with_tokens(messages=eval_prompts, n=1)
    )
    score = evaluate_responses(completions)
    print({"step": step, "eval_score": score})

Concurrency control

sample_with_tokens(n=K) fans out K concurrent requests. A concurrency controller prevents overloading the deployment:
  • AdaptiveConcurrencyController (recommended) — automatically adjusts the concurrency window based on the server’s prefill queue latency. Starts at initial_window and grows or shrinks between steps using AIMD (additive increase, multiplicative decrease); a toy sketch of this behavior follows below.
  • FixedConcurrencyController — a static semaphore with a fixed maximum. Use when you already know the right concurrency for your deployment.
See DeploymentSampler — Concurrency Control for full details and configuration options.
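
To make the adaptive behavior concrete, here is a toy AIMD window, for illustration only: it is not the SDK implementation, and the real AdaptiveConcurrencyController derives its signal from the deployment's prefill queue latency rather than a caller-supplied number.

class ToyAIMDWindow:
    """Illustrative AIMD concurrency window (not the SDK class)."""

    def __init__(self, initial_window=16, min_window=1, max_window=256,
                 latency_threshold_s=1.0):
        self.window = initial_window
        self.min_window = min_window
        self.max_window = max_window
        self.latency_threshold_s = latency_threshold_s

    def update(self, queue_latency_s):
        if queue_latency_s > self.latency_threshold_s:
            # Multiplicative decrease: back off quickly under pressure.
            self.window = max(self.min_window, self.window // 2)
        else:
            # Additive increase: probe for spare capacity one slot at a time.
            self.window = min(self.max_window, self.window + 1)
        return self.window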

Reconnecting to a running trainer

If your client disconnects (script crash, notebook restart, network interruption), the trainer job keeps running on the server. Reconnect without restarting:
# Reconnect to existing job (handles preemption, transitional states)
endpoint = rlor_mgr.reconnect_and_wait(job_id, timeout_s=300)

# Create a new client on the same trainer
service = FiretitanServiceClient(base_url=endpoint.base_url, api_key=api_key)
training_client = service.create_training_client(
    base_model="accounts/fireworks/models/qwen3-8b", lora_rank=0,
)

# Continue training — step_id and checkpoints are preserved
training_client.forward_backward_custom(batch, objective).result()
training_client.optim_step(adam_params).result()
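
If drops are frequent, you can fold reconnection into the step itself. A minimal sketch under two assumptions: ConnectionError stands in for whatever exception your client actually raises on transport failure, and rlor_mgr, job_id, api_key, objective, and adam_params are in scope from the snippets above.

import time

def step_with_reconnect(training_client, batch, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            training_client.forward_backward_custom(batch, objective).result()
            training_client.optim_step(adam_params).result()
            return training_client  # may be a fresh client after a reconnect
        except ConnectionError:
            # Back off, then rebuild the client against the same trainer.
            time.sleep(2 ** attempt)
            endpoint = rlor_mgr.reconnect_and_wait(job_id, timeout_s=300)
            service = FiretitanServiceClient(base_url=endpoint.base_url, api_key=api_key)
            training_client = service.create_training_client(
                base_model="accounts/fireworks/models/qwen3-8b", lora_rank=0,
            )
    raise RuntimeError("train step failed after reconnect attempts")

Calling it as training_client = step_with_reconnect(training_client, batch) keeps the caller holding the live client.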

Operational guidance

  • Service mode supports both full-parameter and LoRA tuning. Set lora_rank=0 for full-parameter or a positive integer (e.g. 16, 64) for LoRA, and match create_training_client(lora_rank=...) accordingly.
  • Use checkpoint_type="base" for the first checkpoint, then "delta" for subsequent ones to reduce save/transfer time. Note: on full-parameter training, only base checkpoints are promotable — see Checkpoint kinds.
  • DeploymentSampler.sample_with_tokens() is async — use await in async code or asyncio.run(...) from synchronous scripts; see the example after this list.
  • Keep checkpoint intervals predictable so evaluation comparisons are stable.
  • Store the exact prompt set used for each evaluation sweep for reproducibility.
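
For the async bullet above, this is the same evaluation call from step 3, awaited directly inside an async function instead of wrapped in asyncio.run(...):

async def evaluate(sampler, eval_prompts):
    # Identical call to step 3, awaited directly in async code.
    completions = await sampler.sample_with_tokens(messages=eval_prompts, n=1)
    return evaluate_responses(completions)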

Common pitfalls

  • Sampling from trainer internals instead of deployment endpoints can skew results — always evaluate through the serving path.
  • Missing checkpoint-to-deployment traceability makes rollback risky — log checkpoint names alongside metrics, as in the sketch after this list.
  • Stale deployments: always verify the weight-synced checkpoint identity matches what you expect before sampling.
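
One lightweight way to keep that traceability is to reuse the names already passed to save_and_hotload in step 3; nothing here is new API, it only threads the checkpoint name into the metrics record:

checkpoint_name = f"step_{step:05d}"
tracker.save_and_hotload(checkpoint_name)
completions = asyncio.run(sampler.sample_with_tokens(messages=eval_prompts, n=1))
score = evaluate_responses(completions)
# The metrics record now identifies exactly which checkpoint was evaluated.
print({"step": step, "checkpoint": checkpoint_name, "eval_score": score})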