The Llama 3.1 405B model uses the FP8 quantization format.

Note: BF16 precision will be available soon for on-demand deployments.
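To make the format concrete, here is a minimal, illustrative sketch of FP8-style (E4M3) per-tensor quantization in plain Python. It is an assumption-laden toy, not the model's actual quantization pipeline: it rounds each weight to a 3-bit-mantissa grid and ignores E4M3 subnormals and NaN encodings; the weight values are invented for the example.

```python
import math

E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def round_to_e4m3_grid(x: float) -> float:
    # Round to the nearest value with a 3-bit mantissa (E4M3-style),
    # ignoring subnormal/NaN details for simplicity.
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)        # x = m * 2**e, with 0.5 <= |m| < 1
    m = round(m * 16) / 16      # keep 1 implicit + 3 explicit mantissa bits
    return max(-E4M3_MAX, min(E4M3_MAX, m * 2 ** e))


def quantize(weights):
    # Per-tensor scaling: map the largest magnitude onto E4M3_MAX,
    # then snap each scaled weight to the FP8 grid.
    scale = max(abs(w) for w in weights) / E4M3_MAX
    q = [round_to_e4m3_grid(w / scale) for w in weights]
    return q, scale


def dequantize(q, scale):
    # Recover approximate full-precision weights.
    return [v * scale for v in q]


w = [0.013, -0.42, 0.0071, 0.25]  # hypothetical weight values
q, s = quantize(w)
w_hat = dequantize(q, s)
```

With 3 mantissa bits the worst-case relative rounding error is about 1/16 (6.25%), which is why FP8 inference typically pairs the low-precision storage with per-tensor (or finer-grained) scales kept in higher precision.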