Agent Evaluation Framework
The Agent Evaluation Framework allows you to evaluate agent models with tool-augmented reasoning using “Task Bundles”: self-contained directories that include all the components necessary for testing and evaluation.
Task Bundle Structure
A task bundle is a self-contained directory with all the components needed to evaluate an agent.
CLI Usage
The Agent Evaluation Framework is integrated with the Reward Kit CLI through the agent-eval command.
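Before running agent-eval, it can help to confirm that a task bundle directory actually contains the files you expect and that the CLI is installed. The following is a minimal Python sketch of such a pre-flight check, not part of Reward Kit itself. It rests on several assumptions: the executable is named reward-kit, the subcommand accepts --help, and the component file names (reward.py, tools.py) are hypothetical placeholders, because the real bundle layout is defined by the framework's own documentation. Rather than guessing at agent-eval's options, the sketch asks the CLI to print them.

```python
from __future__ import annotations

import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Iterable


def missing_components(bundle_dir: str | Path, required: Iterable[str]) -> list[str]:
    """Return the names in `required` that are not files inside bundle_dir.

    An empty list means every listed component is present.
    """
    root = Path(bundle_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"task bundle not found: {root}")
    return [name for name in required if not (root / name).is_file()]


def agent_eval_help(executable: str = "reward-kit") -> str | None:
    """Return the agent-eval help text, or None if `executable` is not on PATH.

    Assumption: the CLI is called reward-kit and agent-eval supports --help.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "agent-eval", "--help"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some CLIs print usage to stderr, so fall back to it.
    return result.stdout or result.stderr


def preflight(
    bundle_dir: str | Path,
    required: Iterable[str],
    executable: str = "reward-kit",
) -> list[str]:
    """Collect human-readable problems found before running agent-eval."""
    problems = [
        f"missing component: {name}"
        for name in missing_components(bundle_dir, required)
    ]
    if agent_eval_help(executable) is None:
        problems.append(f"{executable} not found on PATH")
    return problems


if __name__ == "__main__":
    # Hypothetical component names, used only to exercise the check.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "reward.py").write_text("# placeholder\n")
        for problem in preflight(tmp, ("reward.py", "tools.py")):
            print(problem)
```

An empty list from preflight only means the listed files exist and the executable was found; whether the bundle's contents are valid can only be established by an actual agent-eval run.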