What is grammar-based structured output?
Grammar mode is the ability to specify a forced output schema for any Fireworks model via an extended BNF formal grammar (GBNF format). This method is popularly used to constrain model outputs in llama.cpp. What is a formal grammar? It’s a way to define rules to declare strings to be valid or invalid. See the “Syntax for Describing Grammars” below for more info. Similar to our JSON mode format, you provideresponse_format
field in the request like {"type": "grammar", "grammar": <your BNF grammar string> }
.
For best results, we still recommend that you do some prompt engineering and describe the desired output to the model to guide decision-making.
Why grammar-based structured output?
- Relying solely on system prompt engineering is finicky and time-consuming. It can be difficult to coerce the model to do certain things, for example
- Behave like a classifier, only output from a predefined list
- Output only Japanese, Chinese, a specified programming language, or otherwise prevent the model from generating a large set of of tokens
- Sometimes JSON is not what you need (e.g. it may be finicky with string escaping) and you need some other structured output
- Small models may have difficulty following instructions
End-to-end examples
This guide provides a step-by-step example of creating a structured output response with grammar using the Fireworks API. The example uses Python and the Fireworks Build SDK to define the schema for the output.Prerequisites
Before you begin, ensure you have the following:- Python installed on your system.
-
Build SDK installed. You can install it using pip:
llama-v3p1-405b-instruct
, but all fireworks models support this feature.
Step 1: Configure the Fireworks Build SDK
Step 2: Define the output grammar
Let’s say you have a classifier model that sorts patient requests into a few predefined classes. Then, you can ask the model to only respond within these classes.Step 3: Specify your output grammar in your chat completions request
Advanced examples
Japanese and Chinese
Given the below configurationC code generation
Programming languages like C can also be expressed as a grammar.Syntax
Background
Bakus-Naur Form (BNF) is a notation for describing the syntax of formal languages like programming languages, file formats, and protocols. Fireworks API uses an extension of BNF with a few modern regex-like features, inspired by Llama.cpp’s implementation.Basics
In BNF, we define production rules that specify how a non-terminal (rule name) can be replaced with sequences of terminals (characters, specifically Unicode code points) and other non-terminals. The basic format of a production rule isnonterminal ::= sequence...
.
Consider an example of a small chess notation grammar:
Non-terminals and terminals
Non-terminal symbols (rule names) stand for a pattern of terminals and other non-terminals. They are required to be a dashed lowercase word, likemove
, castle
, or check-mate
.
Terminals are actual characters (code points). They can be specified as a sequence like "1"
or "O-O"
or as ranges like [1-9]
or [NBKQR]
.
Characters and character ranges
Terminals support the full range of Unicode. Unicode characters can be specified directly in the grammar, for examplehiragana ::= [ぁ-ゟ]
, or with escapes: 8-bit (\xXX
), 16-bit (\uXXXX
) or 32-bit (\UXXXXXXXX
).
Character ranges can be negated with ^
:
.
symbol matches any character:
Sequences and alternatives
The order of symbols in a sequence matter. For example, in"1. " move " " move "\n"
, the "1. "
must come before the first move
, etc.
Alternatives, denoted by |
, give different sequences that are acceptable. For example, in move ::= pawn | nonpawn | castle
, move
can be a pawn
move, a nonpawn
move, or a castle
.
Parentheses ()
can be used to group sequences, which allows for embedding alternatives in a larger rule or applying repetition and optional symbols (below) to a sequence.
Repetition and optional symbols
*
after a symbol or sequence means that it can be repeated zero or more times.+
denotes that the symbol or sequence should appear one or more times.?
makes the preceding symbol or sequence optional.
Comments and newlines
Comments can be specified with#
:
|
will continue the current rule, even outside of parentheses.
The root rule
In a full grammar, theroot
rule always defines the starting point of the grammar. In other words, it specifies what the entire output must match.