Contextual Precision
The contextual precision scorer is a default LLM judge scorer that measures whether the contexts in your retrieval_context are properly ranked by relevance to the input.
In practice, this scorer helps determine whether your RAG pipeline’s retriever is effectively ordering the retrieved contexts.
There are many factors to consider when evaluating the quality of your RAG pipeline. judgeval offers a suite of default scorers to construct a comprehensive evaluation of each RAG component. Check out our guide on RAG system evaluation for a deep dive!
Required Fields
To run the contextual precision scorer, you must include the following fields in your Example:
- input
- actual_output
- expected_output
- retrieval_context
Scorer Breakdown
ContextualPrecision scores are calculated by first determining which contexts in retrieval_context are relevant to the input, based on the information in expected_output.
Then, we compute the weighted cumulative precision (WCP) of the retrieved contexts. We use WCP because it:
- Emphasizes top results: WCP places a strong emphasis on the relevance of top-ranked results. This emphasis is important because LLMs tend to give more attention to earlier nodes in the retrieval_context, so improper rankings can induce hallucinations in the actual_output.
- Rewards effective rankings: WCP captures the comparative relevance of different contexts (highly relevant vs. somewhat relevant). This is preferable to approaches such as standard precision, which weights all retrieved contexts as equally relevant.
The score is calculated as:

$$\text{Contextual Precision} = \frac{1}{\text{Number of Relevant Contexts}} \sum_{k=1}^{n} \left( \frac{\text{Number of Relevant Contexts in Top } k}{k} \times r_k \right)$$

where $n$ is the number of retrieved contexts and $r_k = 1$ if the context at rank $k$ is relevant, $0$ otherwise.
Our contextual precision scorer is based on Stanford NLP's ARES paper (Saad-Falcon et al., 2024).
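To make the weighted cumulative precision concrete, here is a minimal sketch of the computation. It assumes a hypothetical `relevance` list of booleans marking, in ranked order, whether each retrieved context was judged relevant against the expected_output; the actual scorer derives these judgments with an LLM judge.

```python
def weighted_cumulative_precision(relevance):
    """Compute WCP over a ranked list of relevance judgments (True/False)."""
    relevant_so_far = 0
    total = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            relevant_so_far += 1
            # Precision@k, accumulated only at relevant positions
            total += relevant_so_far / k
    num_relevant = sum(relevance)
    return total / num_relevant if num_relevant else 0.0

# A perfect ranking places all relevant contexts first:
print(weighted_cumulative_precision([True, True, False]))   # 1.0
# Burying the only relevant context at the bottom lowers the score:
print(weighted_cumulative_precision([False, False, True]))  # 0.333...
```

Note how the same set of contexts scores very differently depending on ordering alone, which is exactly what this scorer is designed to surface.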
Sample Implementation
The ContextualPrecision scorer uses an LLM judge, so you'll receive a reason for the score in the reason field of the results. This allows you to double-check the accuracy of the evaluation and understand how the score was calculated.
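A minimal end-to-end sketch follows. It assumes judgeval's typical client API (`JudgmentClient`, `Example`, `ContextualPrecisionScorer`) and a configured judge model with valid API credentials; the example strings are illustrative, so adjust the names to match your installed version.

```python
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import ContextualPrecisionScorer

# Build an Example with all four required fields for this scorer.
example = Example(
    input="What is the capital of France?",
    actual_output="The capital of France is Paris.",
    expected_output="Paris is the capital of France.",
    retrieval_context=[
        "Paris is the capital and largest city of France.",  # relevant, ranked first
        "France is known for its cuisine and wine.",         # less relevant
    ],
)

client = JudgmentClient()
results = client.run_evaluation(
    examples=[example],
    scorers=[ContextualPrecisionScorer(threshold=0.5)],
)
# Each scorer result carries a score and, because this is an LLM judge
# scorer, a natural-language `reason` explaining the ranking assessment.
```

Inspecting the reason field alongside the score is a quick way to audit whether the judge's relevance calls match your own reading of the retrieved contexts.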