# Accuracy + Length Reward Examples
This directory contains examples demonstrating the use of combined accuracy and length-based reward functions.

## Overview
These examples show how to use the `cosine_scaled_accuracy_length_reward` function to evaluate model responses based on both:

- Accuracy (correctness of the answer)
- Length efficiency (brevity of the response)

Answer extraction and comparison can be customized through the `extract_fn` and `compare_fn` parameters.
## Examples
### Cosine-Scaled Accuracy + Length Example
The `cosine_scaled_example.py` script demonstrates the reward function's behavior with different types of responses:

- Short correct answers (highest score)
- Long correct answers (moderate score)
- Short incorrect answers (very low score)
- Long incorrect answers (low score, but still penalized for being wrong)
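The scoring shape described above can be sketched as follows. This is a hypothetical illustration only: the real `cosine_scaled_accuracy_length_reward` may use a different signature and different reward bounds, and the function name, parameter names, and constants below are assumptions.

```python
import math

def cosine_length_reward(gen_len, max_len, is_correct,
                         min_correct=0.5, max_correct=1.0,
                         min_wrong=-1.0, max_wrong=-0.5):
    """Hypothetical cosine-scaled accuracy + length reward sketch."""
    progress = min(gen_len / max_len, 1.0)
    cosine = math.cos(progress * math.pi)  # 1.0 when short, -1.0 when long
    if is_correct:
        # Short correct answers score max_correct; long correct, min_correct.
        lo, hi = min_correct, max_correct
    else:
        # Short wrong answers score min_wrong (harshest); long wrong, max_wrong.
        lo, hi = max_wrong, min_wrong
    return lo + 0.5 * (hi - lo) * (1.0 + cosine)
```

Under this sketch, rewards order exactly as in the list above: short correct > long correct > long incorrect > short incorrect.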
## Running the Examples
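Assuming you are in this directory and the script is invoked directly (the exact entry point may differ in your setup):

```shell
# Run the cosine-scaled accuracy + length example
python cosine_scaled_example.py
```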
## Expected Output

The script prints the reward assigned to each response type: short correct answers score highest, long correct answers score moderately, long incorrect answers score low, and short incorrect answers score lowest.
## Custom Configurations
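As a sketch of what customization could look like: `extract_fn` and `compare_fn` are the parameter names mentioned above, but the helper functions here (`extract_boxed`, `compare_numeric`) and their expected signatures are assumptions for illustration.

```python
import math
import re

def extract_boxed(text):
    # Hypothetical extractor: pull an answer from a \boxed{...} span,
    # falling back to the stripped raw text if no span is found.
    match = re.search(r"\\boxed\{([^}]*)\}", text)
    return match.group(1) if match else text.strip()

def compare_numeric(predicted, reference):
    # Hypothetical comparator: compare numerically when both parse as
    # floats, otherwise as case-insensitive normalized strings.
    try:
        return math.isclose(float(predicted), float(reference))
    except ValueError:
        return predicted.strip().lower() == reference.strip().lower()
```

Helpers like these could then be passed to the reward function as `extract_fn=extract_boxed, compare_fn=compare_numeric`, assuming that is how the parameters are consumed.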
You can customize the reward function through parameters such as `extract_fn` and `compare_fn`.

## Use Cases

This reward function is particularly useful for:

- Factual QA tasks where concise, correct answers are preferred
- Text summarization evaluation
- Mathematical problem-solving with step-by-step reasoning
- Any task where both accuracy and brevity are important