Tool Calling Example
This guide explains how to use the examples in examples/tool_calling_example/ for evaluating and training models on tool/function calling. These examples primarily use Hydra for configuration.
Overview
The examples/tool_calling_example/ directory contains scripts for:
- Local Evaluation (local_eval.py): Evaluating a model’s ability to make tool calls against a dataset.
- TRL GRPO Integration (trl_grpo_integration.py): Fine-tuning a model for tool calling using TRL (Transformer Reinforcement Learning) with Group Relative Policy Optimization (GRPO).
A dataset file, dataset.jsonl, is provided in the example directory. For tool calling tasks, each entry in the dataset typically includes:
- messages: A list of conversation messages.
- tools: A list of tool definitions available to the model.
- ground_truth: The expected assistant response, which might include tool calls (e.g., {"role": "assistant", "tool_calls": [...]}) or a direct content response.
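To make the entry layout above concrete, here is a minimal sketch of one dataset line. The tool name get_weather, its parameters, and the message text are illustrative assumptions and are not copied from the shipped dataset.jsonl; the exact shape of the tool definitions in the real file may differ.

```python
import json

# Hypothetical example entry: the tool name, arguments, and message text are
# illustrative assumptions, not taken from the shipped dataset.jsonl.
entry = {
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "ground_truth": {
        "role": "assistant",
        "tool_calls": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"city": "Paris"}),
                },
            }
        ],
    },
}

# JSONL stores one JSON object per line; json.dumps emits no newlines by default.
line = json.dumps(entry)

# Reading a JSONL file back amounts to parsing each non-empty line on its own.
parsed = json.loads(line)
print(sorted(parsed.keys()))
```

Keeping each entry on a single line lets a loader stream the file without reading it all into memory.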
Setup
- Environment: Ensure your Python environment has reward-kit and its development dependencies installed.
- TRL Extras (for trl_grpo_integration.py): Install the TRL extras as well if you plan to run the training script.
- API Keys: If using models that require API keys (e.g., Fireworks AI models for local_eval.py when not using a local model, or for downloading a base model for TRL), ensure the necessary keys, such as FIREWORKS_API_KEY, are set.
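A quick pre-flight check can catch a missing key before a long run starts. The sketch below uses only the standard library; the helper name has_api_key is hypothetical and is not part of reward-kit.

```python
import os


def has_api_key(name: str = "FIREWORKS_API_KEY") -> bool:
    """Return True when the named environment variable is set and non-empty."""
    return bool(os.environ.get(name, "").strip())


if not has_api_key():
    # Only a warning: runs that use a local or mock model do not need the key.
    print("FIREWORKS_API_KEY is not set; runs against hosted models will likely fail.")
```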
1. Local Evaluation (local_eval.py)
This script performs local evaluation of a model’s tool calling.
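As an illustration of what evaluating tool calling can involve, the sketch below scores a generated assistant message against a ground_truth message by exact match on tool names and arguments. This is an assumption about one plausible scoring rule, not the reward function that local_eval.py actually uses; the function names are hypothetical.

```python
import json
from typing import Any, Dict, Tuple


def _normalize(call: Dict[str, Any]) -> Tuple[str, str]:
    """Reduce a tool call to (name, canonical JSON text of its arguments)."""
    fn = call.get("function", {})
    name = fn.get("name", "")
    args = fn.get("arguments", {})
    if isinstance(args, str):
        try:
            args = json.loads(args)
        except json.JSONDecodeError:
            # Unparseable arguments are compared as raw text.
            return name, args
    # Sorting keys makes the comparison independent of argument order.
    return name, json.dumps(args, sort_keys=True)


def exact_match_score(generated: Dict[str, Any], ground_truth: Dict[str, Any]) -> float:
    """Return 1.0 when tool calls (or plain content) match exactly, else 0.0."""
    gen_calls = sorted(_normalize(c) for c in generated.get("tool_calls") or [])
    ref_calls = sorted(_normalize(c) for c in ground_truth.get("tool_calls") or [])
    if gen_calls or ref_calls:
        return 1.0 if gen_calls == ref_calls else 0.0
    # No tool calls on either side: fall back to comparing the text content.
    return 1.0 if generated.get("content") == ground_truth.get("content") else 0.0
```

Exact match is strict by design; a softer rule (for example, partial credit for a correct tool name) would be a different scoring choice.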
Configuration
- Uses Hydra and is configured by examples/tool_calling_example/conf/local_eval_config.yaml.
- The default configuration points to examples/tool_calling_example/dataset.jsonl.
- The script itself likely contains defaults for the model and reward function, or expects them as CLI overrides.
How to Run
- Activate your virtual environment.
- Run examples/tool_calling_example/local_eval.py from the repository root.
Overriding Parameters
- Change the dataset path with a Hydra command-line override.
- Other parameters (e.g., model name, reward function parameters) would typically be added to local_eval_config.yaml or passed as CLI overrides if local_eval.py is structured to accept them via Hydra.
Outputs are saved to the Hydra output directory (configured in local_eval_config.yaml as ./outputs/local_eval_tool_calling/${now:%Y-%m-%d}/${now:%H-%M-%S}).
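The ${now:...} placeholders in the output path are filled in with the run's start time using strftime-style format codes. The sketch below reproduces that expansion with the standard library so you can predict where a run will write; the helper name expand_output_dir is hypothetical.

```python
from datetime import datetime


def expand_output_dir(base: str, now: datetime) -> str:
    """Mimic how a <base>/${now:%Y-%m-%d}/${now:%H-%M-%S} pattern expands."""
    date_part = now.strftime("%Y-%m-%d")  # e.g. 2024-05-17
    time_part = now.strftime("%H-%M-%S")  # e.g. 09-30-05
    return f"{base}/{date_part}/{time_part}"


print(expand_output_dir("./outputs/local_eval_tool_calling", datetime(2024, 5, 17, 9, 30, 5)))
# prints ./outputs/local_eval_tool_calling/2024-05-17/09-30-05
```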
2. TRL GRPO Integration (trl_grpo_integration.py)
This script provides a scaffold for fine-tuning a model for tool calling using TRL GRPO.
Note: The script defaults to using a MOCK model and tokenizer. Using a real model requires code modifications in trl_grpo_integration.py and potentially conf/trl_grpo_config.yaml.
Configuration
- Uses Hydra and is configured by examples/tool_calling_example/conf/trl_grpo_config.yaml.
- Default dataset_file_path: dataset.jsonl (assumed to be in examples/tool_calling_example/).
- Default model_name: Qwen/Qwen2-0.5B-Instruct.
- Includes various grpo training parameters.
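For background on what GRPO optimizes, its core idea is to sample a group of completions per prompt and score each one relative to the rest of its group. The sketch below shows that normalization in isolation. It uses the population standard deviation and a small epsilon, which are assumptions; TRL's own implementation may differ in these details, and this function is not part of the example script.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two made the correct tool call, two did not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that beat their group's average receive positive advantages and are reinforced; when every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.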
How to Run (with Mock Model by default)
- Activate your virtual environment.
- Run examples/tool_calling_example/trl_grpo_integration.py from the repository root.
Overriding Parameters
- Change the dataset path or the number of training epochs with Hydra command-line overrides.
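Conceptually, a command-line override is a key=value pair that replaces one entry in the loaded configuration, with dots addressing nested keys. The sketch below is a deliberately simplified stand-in for that mechanism, not Hydra's implementation. The key dataset_file_path comes from the config described above, while grpo.num_train_epochs and the file name my_data.jsonl are hypothetical.

```python
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply dotted key=value overrides to a nested dict, in place."""
    for item in overrides:
        key, _, raw = item.partition("=")
        parts = key.split(".")
        node = config
        # Walk (or create) the intermediate dicts named by the dotted path.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # Simplification: only non-negative integers are converted from text.
        node[parts[-1]] = int(raw) if raw.isdigit() else raw
    return config


# "grpo.num_train_epochs" is a hypothetical key used only for illustration.
config = {"dataset_file_path": "dataset.jsonl", "grpo": {"num_train_epochs": 1}}
apply_overrides(config, ["dataset_file_path=my_data.jsonl", "grpo.num_train_epochs=3"])
print(config)
```

Check conf/trl_grpo_config.yaml for the actual key names before passing overrides to the script.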
Using a Real Model (Requires Code Changes)
- Modify examples/tool_calling_example/trl_grpo_integration.py to load your desired Hugging Face model and tokenizer (remove or conditionalize the mock model parts).
- Ensure the prompt formatting in the script is suitable for your chosen model.
- Update conf/trl_grpo_config.yaml with the correct model_name and adjust training parameters.
- Run the script. If you added a flag such as use_mock_model_tokenizer to the script or config, set it through a Hydra command-line override so that the real model is loaded.
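One way to conditionalize the mock parts is to route model loading through a single function that switches on the flag. The sketch below is an assumption about how that could look; MockModel, MockTokenizer, and load_model_and_tokenizer are hypothetical names, and the script's actual mock classes may differ. The real-model branch uses the standard AutoModelForCausalLM and AutoTokenizer loaders from transformers and downloads weights on first use.

```python
from typing import Any, Tuple


class MockTokenizer:
    """Hypothetical stand-in; the script's actual mock tokenizer may differ."""

    def __call__(self, text: str) -> dict:
        # Toy encoding: one id per character, enough to exercise the pipeline.
        return {"input_ids": [ord(ch) % 256 for ch in text]}


class MockModel:
    """Hypothetical stand-in that carries no weights."""

    name = "mock"


def load_model_and_tokenizer(
    model_name: str, use_mock_model_tokenizer: bool = True
) -> Tuple[Any, Any]:
    """Return (model, tokenizer), using mocks unless the flag is turned off."""
    if use_mock_model_tokenizer:
        return MockModel(), MockTokenizer()
    # Imported lazily so the mock path works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    return model, tokenizer


model, tokenizer = load_model_and_tokenizer(
    "Qwen/Qwen2-0.5B-Instruct", use_mock_model_tokenizer=True
)
```

Keeping the mock path as the default preserves the script's current behavior, so quick smoke tests still run without downloading a model.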
Outputs are saved to the Hydra output directory (configured in trl_grpo_config.yaml as ./outputs/trl_grpo_tool_calling/${now:%Y-%m-%d}/${now:%H-%M-%S}).
For more general information on Hydra, see the Hydra Configuration for Examples guide.