On-demand deployments come with automatic cost optimization features:
Default autoscaling: Automatically scales to 0 replicas when not in use
Pay for what you use: Charged only for GPU time when replicas are active
Flexible configuration: Customize autoscaling behavior to match your needs
Best practices for cost management:
Leverage default autoscaling: Unless you need always-on capacity, keep the default so idle deployments scale down to 0 replicas and stop incurring GPU charges
Customize carefully: You can modify autoscaling behavior using our configuration options, but preventing scale-to-zero results in continuous GPU charges, even while the deployment is idle
Consider your use case: For intermittent or low-frequency usage, serverless deployments might be more cost-effective
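The trade-off described above can be made concrete with a little arithmetic. The sketch below compares a month of GPU charges for a deployment that scales to zero against one that is kept always on. The hourly rate, the traffic pattern, and the helper name `gpu_cost` are hypothetical values chosen for illustration, not actual pricing or a platform API, and the model is simplified to one replica on one GPU.

```python
def gpu_cost(active_hours: float, hourly_rate: float, gpus: int = 1) -> float:
    """Return the charge for GPU time billed only while replicas are active."""
    return active_hours * hourly_rate * gpus


# Hypothetical figures for illustration only, not real pricing.
HOURLY_RATE = 3.00          # assumed USD per GPU-hour
DAYS_IN_MONTH = 30
ACTIVE_HOURS_PER_DAY = 4    # assumed hours per day with live traffic

# Default behavior: replicas scale to 0 when idle, so only active hours are billed.
scale_to_zero = gpu_cost(ACTIVE_HOURS_PER_DAY * DAYS_IN_MONTH, HOURLY_RATE)

# Scale-to-zero prevented: one replica stays up, so every hour is billed.
always_on = gpu_cost(24 * DAYS_IN_MONTH, HOURLY_RATE)

print(f"scale-to-zero: ${scale_to_zero:.2f}")
print(f"always-on:     ${always_on:.2f}")
```

Under these assumed numbers the always-on deployment costs six times as much for the same traffic, which is why preventing scale-to-zero is worth doing only when the workload needs it.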