By default, models on dedicated deployments are served using 16-bit floating-point (FP16) precision. Quantization reduces the number of bits
used to serve the model, improving performance and reducing serving costs. However, quantization can change model numerics,
which may introduce small differences in output. Take a look at our blog post for a detailed treatment of how
quantization affects model quality.
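To build intuition for why quantization changes numerics, here is a minimal, self-contained sketch (not the deployment API) that maps FP16 weights onto a symmetric INT8 grid and measures the rounding error introduced by the lower precision:

```python
# Illustrative sketch: symmetric INT8 quantization of FP16 weights.
# This is not how the serving stack quantizes models; it only shows
# why lower precision perturbs values slightly.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal(1024).astype(np.float16)  # stand-in FP16 weights

# Map the range [-max|w|, +max|w|] onto the INT8 grid [-127, 127].
scale = float(np.abs(weights).max()) / 127.0
quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize and measure the numerical drift relative to the originals.
dequantized = (quantized.astype(np.float32) * scale).astype(np.float16)
error = np.abs(weights.astype(np.float32) - dequantized.astype(np.float32))
print("max abs error:", error.max())
```

Each weight is snapped to the nearest representable point on a coarser grid, so dequantized values differ slightly from the originals; accumulated over many layers, these small perturbations are what can shift model outputs.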