Code Execution with E2B
This guide demonstrates how to use the E2B code execution reward function to evaluate code by running it in the E2B cloud sandbox.
Overview
The e2b_code_execution_reward function allows you to:
- Extract code blocks from LLM responses
- Execute the code securely in E2B’s cloud sandbox
- Compare the output with expected results
- Generate a score and detailed metrics
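The four steps above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the helper names (extract_code_blocks, run_locally, score_response) are hypothetical, the code runs in a local child process rather than in E2B's sandbox (so it is not secure), and the all-or-nothing exact-match scoring is an assumption made for simplicity.

```python
import re
import subprocess
import sys
from typing import List

# Built programmatically so the example can itself live inside a fenced block.
FENCE = "`" * 3


def extract_code_blocks(text: str, language: str = "python") -> List[str]:
    """Return bodies of fenced code blocks tagged with `language` (or untagged)."""
    pattern = re.compile(FENCE + r"([\w+-]*)[ \t]*\n(.*?)" + FENCE, re.DOTALL)
    return [body.strip() for tag, body in pattern.findall(text) if tag in ("", language)]


def run_locally(code: str, timeout: int = 10) -> str:
    """Run code in a child Python process and return stdout.

    Stand-in for the sandbox step; this offers NO isolation.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.stdout


def score_response(response: str, expected_output: str) -> float:
    """Score 1.0 if the first extracted block prints the expected output, else 0.0."""
    blocks = extract_code_blocks(response)
    if not blocks:
        return 0.0
    actual = run_locally(blocks[0])
    return 1.0 if actual.strip() == expected_output.strip() else 0.0


# A hypothetical LLM response containing one Python code block.
response = (
    "Here is a factorial function:\n"
    + FENCE + "python\n"
    "def factorial(n):\n"
    "    return 1 if n <= 1 else n * factorial(n - 1)\n"
    "\n"
    "print(factorial(5))\n"
    + FENCE
)

print(score_response(response, "120"))
```

The real reward function may score partial matches and report per-step metrics; this sketch only shows how extraction, execution, and comparison chain together.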
Prerequisites
To use the E2B code execution reward function, you need:
- An E2B API key from the E2B Dashboard
- The e2b_code_interpreter Python package installed:

    pip install e2b_code_interpreter

The e2b package can also be used, but e2b_code_interpreter is recommended because it provides a more stable interface designed specifically for code execution.
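Before running an evaluation, it can help to confirm that both prerequisites are in place. The following is a small sketch using only the standard library; the function name missing_prerequisites is hypothetical, and it assumes the key is supplied through the E2B_API_KEY environment variable rather than passed directly.

```python
import importlib.util
import os
from typing import List, Mapping, Optional


def missing_prerequisites(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a human-readable list of prerequisites that are not satisfied."""
    env = os.environ if env is None else env
    problems: List[str] = []
    # The API key is looked up in the environment (assumption: not passed explicitly).
    if not env.get("E2B_API_KEY"):
        problems.append("E2B_API_KEY is not set")
    # find_spec returns None when a top-level package cannot be imported.
    if importlib.util.find_spec("e2b_code_interpreter") is None:
        problems.append("e2b_code_interpreter is not installed")
    return problems


for problem in missing_prerequisites():
    print(f"Missing prerequisite: {problem}")
```

An empty list means both the key and the package were found.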
Basic Usage
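Here’s a simple example of how to use the reward function. It assumes that e2b_code_execution_reward has been imported and that messages already holds the conversation whose assistant reply contains the code to evaluate:

    # Define expected output
    expected_output = "120"

    # Evaluate the code using E2B
    result = e2b_code_execution_reward(
        messages=messages,
        expected_output=expected_output,
        language="python",
        api_key="your_e2b_api_key",
        timeout=10,
    )

    # Use the results
    print(f"Score: {result.score}")
    for metric_name, metric in result.metrics.items():
        print(f"\n{metric_name}: {metric.reason}")

Fallback to Local Execution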
You can gracefully fall back to local execution when an E2B API key is not available.
Parameters
The e2b_code_execution_reward function accepts the following parameters:
Parameter | Type | Description |
---|---|---|
messages | List[Dict[str, str]] | Generated conversation messages (required) |
original_messages | List[Dict[str, str]] | Original conversation context (optional) |
expected_output | str | Expected output from code execution (optional) |
language | str | Programming language of the code (default: "python") |
timeout | int | Maximum execution time in seconds (default: 30) |
api_key | str | E2B API key (default: None, uses E2B_API_KEY environment variable) |
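As a sketch of how the api_key default and the local-execution fallback described earlier might fit together, the helper below resolves the key (explicit argument first, then the E2B_API_KEY environment variable) and picks a backend accordingly. The helper name evaluate_with_fallback is hypothetical. The two reward functions are passed in as callables so that the sketch does not depend on any particular import path, and it is an assumption that the local-execution function accepts the same keyword arguments minus api_key.

```python
import os
from typing import Any, Callable, Dict, List, Optional

Messages = List[Dict[str, str]]
RewardFn = Callable[..., Any]


def evaluate_with_fallback(
    messages: Messages,
    expected_output: str,
    e2b_reward: RewardFn,
    local_reward: RewardFn,
    api_key: Optional[str] = None,
    language: str = "python",
    timeout: int = 30,
) -> Any:
    """Use the E2B-backed reward when a key is available, else the local one."""
    # An explicit key wins; otherwise fall back to the environment variable.
    key = api_key or os.environ.get("E2B_API_KEY")
    if key:
        return e2b_reward(
            messages=messages,
            expected_output=expected_output,
            language=language,
            timeout=timeout,
            api_key=key,
        )
    # Assumption: the local reward takes the same arguments except api_key.
    return local_reward(
        messages=messages,
        expected_output=expected_output,
        language=language,
        timeout=timeout,
    )
```

In practice, e2b_reward would be e2b_code_execution_reward and local_reward would be whichever local-execution reward function your installation provides.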
Return Value
The reward function returns an EvaluateResult object with:
- score: A float between 0.0 and 1.0 indicating how well the code performed.
- reason: An overall explanation for the evaluation.
- metrics: A dictionary of MetricResult objects with detailed information about the execution.
- error (optional): A string describing any error during evaluation.
The metrics dictionary includes:
- extracted_code: The code that was extracted and executed
- expected_output: The expected output (if provided or extracted)
- execution_result: Details about the execution (success or failure)
- output_match: Comparison between actual and expected outputs
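The fields listed above can be read back as in the following sketch. The two dataclasses are hypothetical stand-ins that mirror only the fields named in this section, so the example runs without the library installed; the assumption that each MetricResult exposes score and reason attributes is not confirmed by this guide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MetricResult:  # hypothetical stand-in, not the library class
    score: float
    reason: str


@dataclass
class EvaluateResult:  # hypothetical stand-in mirroring the fields listed above
    score: float
    reason: str
    metrics: Dict[str, MetricResult] = field(default_factory=dict)
    error: Optional[str] = None


METRIC_KEYS = ("extracted_code", "expected_output", "execution_result", "output_match")


def summarize(result) -> List[str]:
    """Build a short report from an EvaluateResult-like object."""
    lines = [f"Score: {result.score:.2f}", f"Reason: {result.reason}"]
    if getattr(result, "error", None):
        lines.append(f"Error: {result.error}")
    for name in METRIC_KEYS:
        metric = result.metrics.get(name)
        if metric is not None:
            # Fall back to the metric itself if it has no `reason` attribute.
            lines.append(f"{name}: {getattr(metric, 'reason', metric)}")
    return lines


example = EvaluateResult(
    score=1.0,
    reason="Output matched",
    metrics={"output_match": MetricResult(score=1.0, reason="Exact match")},
)
for line in summarize(example):
    print(line)
```

Because summarize only relies on attribute access, it should also work with a real result object as long as the attribute names match those described in this section.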
Examples
See the examples/ directory for complete examples:
- e2b_reward_example.py: Basic Python example
- e2b_javascript_example.py: JavaScript example
- e2b_auto_extract_example.py: Automatic output extraction example
- e2b_fallback_example.py: Fallback to local execution example