Answer Correctness
The answer correctness scorer is a default LLM judge scorer that measures how correct/consistent the LLM system’s actual_output is with respect to the expected_output.
In practice, this scorer helps determine whether your LLM application produces answers that are consistent with golden/ground truth answers.
Required Fields
To run the answer correctness scorer, you must include the following fields in your Example (see the sample implementation below):
- input
- actual_output
- expected_output
Scorer Breakdown
AnswerCorrectness scores are calculated by extracting the statements made in the expected_output and classifying how many of them are consistent/correct with respect to the actual_output.
The score is calculated as:

$$\text{Answer Correctness} = \frac{\text{Number of Consistent Statements}}{\text{Total Number of Statements}}$$
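For example, if four statements are extracted from the expected_output and three of them are judged consistent with the actual_output, the score is 3/4 = 0.75.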
Sample Implementation
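Below is a minimal sketch of running this scorer, assuming the judgeval Python SDK; the client class (JudgmentClient), scorer class (AnswerCorrectnessScorer), run_evaluation signature, and judge model name are assumptions that may differ in your version.

```python
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import AnswerCorrectnessScorer

client = JudgmentClient()

# An Example with the three required fields.
example = Example(
    input="What is the capital of France?",
    actual_output="Paris is the capital of France.",
    expected_output="The capital of France is Paris.",
)

# threshold is the minimum score (0.0-1.0) for the example to pass.
scorer = AnswerCorrectnessScorer(threshold=0.7)

# Run the evaluation with an LLM judge model of your choice.
results = client.run_evaluation(
    examples=[example],
    scorers=[scorer],
    model="gpt-4o",  # assumed judge model; substitute your own
)
print(results)
```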
The AnswerCorrectness scorer uses an LLM judge, so you’ll receive a reason for the score in the reason field of the results. This allows you to double-check the accuracy of the evaluation and understand how the score was calculated.
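As a hedged sketch, reading the score and reason might look like the following; the exact shape of the results object (assumed here to be a list of per-example results, each carrying per-scorer records with score and reason attributes) depends on your SDK version.

```python
# Assumed result shape: adjust attribute names to your SDK version.
for result in results:
    for scorer_data in result.scorers_data:
        print(f"score={scorer_data.score}")
        print(f"reason={scorer_data.reason}")
```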