In cases where large parts of the LLM output are known in advance, e.g. when editing or rewriting a document or code snippet, you can improve output generation speed with Predicted Outputs. Predicted Outputs lets you provide a strong “guess” of what the output may look like. To use Predicted Outputs, set the prediction field in the Fireworks API request to the predicted output. For example, you may want to edit a survey and add an option to contact users by text message:
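The original code sample did not survive extraction; the following is an illustrative sketch of such a request. The model name, the survey text, and the exact shape of the prediction object ({"type": "content", "content": ...}, mirroring the OpenAI-compatible format) are assumptions of this sketch, not confirmed by the text above.

```python
import json
import urllib.request

# The existing survey we want the model to edit (illustrative content).
survey = """Survey: How should we contact you?
1. Email
2. Phone call"""

# The edit instruction. Since most of the survey is expected to appear
# verbatim in the response, the survey itself is a strong prediction.
prompt = f"Add an option to contact users by text message:\n\n{survey}"

def build_request(prompt: str, prediction: str) -> dict:
    """Build a chat-completions payload that uses Predicted Outputs."""
    return {
        # Illustrative model name; substitute your own.
        "model": "accounts/fireworks/models/llama-v3p1-8b-instruct",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,    # recommended for Predicted Outputs
        "max_tokens": 2048,  # must be large enough to cover the prediction
        # Assumed field shape (OpenAI-compatible prediction object).
        "prediction": {"type": "content", "content": prediction},
    }

payload = build_request(prompt, survey)

# To send the request (requires a Fireworks API key):
# req = urllib.request.Request(
#     "https://api.fireworks.ai/inference/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Authorization": "Bearer <FIREWORKS_API_KEY>",
#              "Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```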
Additional information on Predicted Outputs:
- Using Predicted Outputs is free at this time.
- We recommend setting temperature=0 for best results for most intended use cases of Predicted Outputs. In these cases, using Predicted Outputs does not impact the quality of the generated output.
- If the prediction is substantially different from the generated output, output generation speed may decrease.
- The max length of the prediction field is set by max_tokens, which is 2048 by default and needs to be increased if you have a longer input and prediction.
- If you are using an on-demand deployment, you can set rewrite_speculation=True and potentially get even faster output generation. We are working on rolling this out to Serverless soon.
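Putting the settings above together, a request for an on-demand deployment might look like the following sketch. The field name rewrite_speculation comes from the text above; treating it as a top-level request field, along with the model name and prediction object shape, are assumptions here.

```python
# Hypothetical payload combining the recommended settings:
# temperature=0, a max_tokens large enough for the prediction,
# and rewrite_speculation enabled (on-demand deployments only).
draft = "We will contact you by email, phone call, or text message."

payload = {
    "model": "accounts/fireworks/models/llama-v3p1-8b-instruct",  # illustrative
    "messages": [
        {"role": "user", "content": f"Fix any grammar errors:\n\n{draft}"}
    ],
    "temperature": 0,
    "max_tokens": 2048,
    "prediction": {"type": "content", "content": draft},
    "rewrite_speculation": True,  # assumed top-level field; on-demand only
}
```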