Dataset Storage (BYOB)
Point Fireworks to your own cloud storage for training datasets. This applies to both Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT) jobs.

GCS Bucket Integration
Use external Google Cloud Storage (GCS) buckets for fine-tuning while keeping your data private. Fireworks creates proxy datasets that reference your external buckets; data is only accessed during fine-tuning within a secure, isolated cluster.

Your data never leaves your GCS bucket except during fine-tuning, ensuring maximum privacy and security.
Required Permissions
You need to grant access to three service accounts (a sketch of granting these roles follows the list):

Fireworks Control Plane
- Account: fireworks-control-plane@fw-ai-cp-prod.iam.gserviceaccount.com
- Required role: Custom role with the storage.buckets.getIamPolicy permission

Fireworks Inference
- Account: inference@fw-ai-cp-prod.iam.gserviceaccount.com
- Required role: Storage Object Viewer (roles/storage.objectViewer)

Your Fireworks Account
- Account: Your company's Fireworks account email (get it with firectl account get)
- Required role: Storage Object Viewer (roles/storage.objectViewer)
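For illustration, here is a minimal sketch of granting these bindings with the google-cloud-storage Python client. The bucket name, the custom role name (which must already exist and contain storage.buckets.getIamPolicy), and the member prefix used for your Fireworks account email are placeholders; you can grant the same roles from the Cloud Console or gcloud instead.

```python
# pip install google-cloud-storage
from google.cloud import storage

# Placeholders: your bucket, a pre-created custom role containing the
# storage.buckets.getIamPolicy permission, and your Fireworks account email
# as reported by `firectl account get`.
BUCKET_NAME = "my-training-datasets"
CONTROL_PLANE_ROLE = "projects/your-project/roles/fireworksControlPlane"
FIREWORKS_ACCOUNT_EMAIL = "your-fireworks-account@example.com"

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

# Fetch the bucket-level IAM policy and append the required bindings.
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": CONTROL_PLANE_ROLE,
    "members": {
        "serviceAccount:fireworks-control-plane@fw-ai-cp-prod.iam.gserviceaccount.com",
    },
})
policy.bindings.append({
    "role": "roles/storage.objectViewer",
    "members": {
        "serviceAccount:inference@fw-ai-cp-prod.iam.gserviceaccount.com",
        # Use the member prefix that matches your account type (user: or serviceAccount:).
        f"serviceAccount:{FIREWORKS_ACCOUNT_EMAIL}",
    },
})
bucket.set_iam_policy(policy)
print("Updated IAM policy for", BUCKET_NAME)
```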
Usage
AWS S3 Bucket Integration
Use external AWS S3 buckets for fine-tuning while keeping your data private. Fireworks accesses your S3 data using GCP-to-AWS OIDC federation, so no long-lived credentials are stored.

S3 bucket integration is currently supported for training datasets only (SFT and RFT jobs). Evaluation datasets are not yet supported.
IAM Role Setup
Create an IAM role with a trust policy that allows Fireworks to assume it via web identity federation:

- Federated Principal: accounts.google.com
- Action: sts:AssumeRoleWithWebIdentity
- Condition: accounts.google.com:aud equals 117388763667264115668

The role's permissions policy must grant s3:GetObject and s3:ListBucket on your bucket.
See the AWS documentation for detailed steps on creating roles for OIDC federation.
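As a sketch of what that setup can look like in code rather than the AWS console, the snippet below uses boto3 to create a role with the trust policy described above and a read-only inline policy. The role name, policy name, and bucket name are placeholders.

```python
import json
import boto3

BUCKET = "my-training-datasets"          # placeholder: your S3 bucket
ROLE_NAME = "fireworks-dataset-access"   # placeholder: any role name you choose

# Trust policy: let Fireworks assume this role via web identity (OIDC)
# federation, scoped by the audience condition listed above.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Federated": "accounts.google.com"},
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
            "StringEquals": {"accounts.google.com:aud": "117388763667264115668"}
        },
    }],
}

# Permissions policy: read-only access to the dataset bucket.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            f"arn:aws:s3:::{BUCKET}",
            f"arn:aws:s3:::{BUCKET}/*",
        ],
    }],
}

iam = boto3.client("iam")
role = iam.create_role(
    RoleName=ROLE_NAME,
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
iam.put_role_policy(
    RoleName=ROLE_NAME,
    PolicyName="fireworks-dataset-read",
    PolicyDocument=json.dumps(permissions_policy),
)
print("Role ARN:", role["Role"]["Arn"])  # pass this ARN via the --aws-iam-role flag
```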
Usage
For RFT jobs, use firectl rftj create and pass the role ARN with the --aws-iam-role flag.

Alternative: Credentials Secret
Instead of IAM role federation, you can use static AWS access keys stored in a Fireworks secret.

Secure Reinforcement Fine-Tuning (RFT)
Use reinforcement fine-tuning while keeping sensitive components and data under your control. Follow these steps to run secure RFT end to end using your own storage and reward pipeline.

Configure storage (BYOB)
Set up your dataset storage using GCS or AWS S3 as described above. For models, you can optionally use External AWS S3 Bucket Integration.
Prepare your reward pipeline and rollouts
Keep your reward functions, rollout servers, and training metrics under your control. Generate rewards from your environment and write them to examples in your dataset (or export a dataset that contains per-example rewards); a minimal sketch follows the list below.
- Reward functions and reward models remain proprietary and never need to be shared
- Rollouts and evaluation infrastructure run in your environment
- Model checkpoints can be registered to your storage registry if desired
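As a purely illustrative example of the first point, a reward function is just code that runs in your environment and scores each rollout; none of it is uploaded to Fireworks. The Rollout type and the scoring logic below are hypothetical placeholders for your proprietary reward model or verifier.

```python
from dataclasses import dataclass

@dataclass
class Rollout:
    prompt: str
    output: str  # completion or trajectory collected from your rollout server

def reward_fn(rollout: Rollout) -> float:
    """Proprietary reward logic: runs only in your environment.

    Placeholder scoring: reward outputs that contain an answer tag and
    lightly penalize very long outputs. Replace with your real reward model.
    """
    score = 1.0 if "<answer>" in rollout.output else 0.0
    score -= min(len(rollout.output) / 10_000, 0.5)  # mild length penalty
    return score

# Score rollouts collected by your own evaluation infrastructure.
rollouts = [Rollout(prompt="2+2=?", output="<answer>4</answer>")]
rewards = [reward_fn(r) for r in rollouts]
print(rewards)
```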
Create a dataset that includes rewards
Create or point a Dataset at your BYOB storage. Ensure each example contains the information required by your reward pipeline (for example, prompts, outputs/trajectories, and numeric rewards).

You can reuse existing supervised data by attaching reward signals produced by your pipeline, or export a fresh dataset into your bucket for consumption by RFT.
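A minimal sketch of exporting such a dataset into a BYOB GCS bucket as JSONL. The field names (prompt, output, reward) and the bucket/object names are illustrative assumptions; check the RFT dataset format documentation for the exact schema your job expects.

```python
import json
from google.cloud import storage  # pip install google-cloud-storage

# Illustrative field names; confirm the exact schema in the RFT dataset docs.
examples = [
    {"prompt": "2+2=?", "output": "<answer>4</answer>", "reward": 1.0},
    {"prompt": "Capital of France?", "output": "<answer>Paris</answer>", "reward": 1.0},
]

# Write one JSON object per line (JSONL), a common fine-tuning dataset layout.
local_path = "rft_dataset.jsonl"
with open(local_path, "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload into the BYOB bucket that the proxy dataset points at (placeholder names).
client = storage.Client()
bucket = client.bucket("my-training-datasets")
bucket.blob("rft/rft_dataset.jsonl").upload_from_filename(local_path)
```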
Run a reinforcement fine-tuning step from Python
Use the Python SDK to create a reinforcement fine-tuning step that reads from your BYOB dataset and produces a new checkpoint. See the Create Reinforcement Fine-tuning Step API reference for full parameters and options.
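For orientation only, here is a hypothetical sketch of that call. The client import, method name, and resource names below are assumptions, not the documented SDK surface; only the training parameters echo the note that follows. Use the Create Reinforcement Fine-tuning Step API reference for the real signature.

```python
# Placeholder sketch: the import path and method name are assumptions,
# not the documented SDK surface; see the API reference for the real call.
from fireworks.client import Fireworks  # assumption: SDK client entry point

client = Fireworks()  # assumption: reads FIREWORKS_API_KEY from the environment

step = client.create_reinforcement_fine_tuning_step(  # placeholder method name
    dataset="accounts/your-account/datasets/rft-with-rewards",   # BYOB proxy dataset
    base_model="accounts/your-account/models/your-lora-checkpoint",  # placeholder
    # Placeholder values; when continuing from a LoRA checkpoint, these must
    # match the original LoRA training (see the note below).
    lora_rank=16,
    learning_rate=1e-4,
    max_context_length=8192,
    batch_size=32,
)
print(step)  # the step produces a new checkpoint when it completes
```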
When continuing from a LoRA checkpoint, training parameters such as lora_rank, learning_rate, max_context_length, and batch_size must match the original LoRA training.

You now have an end-to-end secure RFT workflow with BYOB datasets, proprietary reward pipelines, and isolated training jobs that generate new checkpoints.