The contextual relevancy scorer is a default LLM judge scorer that measures how relevant the contexts in retrieval_context are to a given input. In practice, this scorer helps you determine whether your RAG pipeline's retriever is actually retrieving contexts relevant to a query.

Required Fields

To run the contextual relevancy scorer, you must include the following fields in your Example:

  • input
  • actual_output
  • retrieval_context

Scorer Breakdown

ContextualRelevancy scores are calculated by first extracting all statements from retrieval_context and then classifying which of them are relevant to the input.

The score is then calculated as:

\text{Contextual Relevancy} = \frac{\text{Number of Relevant Statements}}{\text{Total Number of Statements}}
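For example, if the judge extracts four statements and marks three of them as relevant, the score is 3/4 = 0.75. The sketch below mocks the two LLM steps (statement extraction and relevance classification) with hard-coded outputs to show the arithmetic; the statements and labels are hypothetical, not produced by the judgeval API.

# Hypothetical sketch of the scoring arithmetic -- the real scorer uses an
# LLM judge for both steps; the statements and labels here are illustrative.

# Step 1 (mocked): statements the judge might extract from retrieval_context
statements = [
    "All items can be returned within 30 days.",
    "Returns receive a full refund.",
    "No questions are asked for returns.",
    "The store was founded in 1998.",
]

# Step 2 (mocked): relevance labels the judge might assign for the input
# "What's your return policy for a pair of socks?"
relevant = [True, True, True, False]

# Final score: number of relevant statements / total number of statements
score = sum(relevant) / len(statements)
print(score)  # 0.75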

Our contextual relevancy scorer is based on Stanford NLP’s ARES paper (Saad-Falcon et al., 2024).

Sample Implementation

contextual_relevancy.py
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import ContextualRelevancyScorer

client = JudgmentClient()
example = Example(
    input="What's your return policy for a pair of socks?",
    # Replace this with your LLM system's output
    actual_output="We offer a 30-day return policy for all items, including socks!",
    # Replace this with the contexts retrieved by your RAG retriever
    retrieval_context=["Return policy, all items: 30-day limit for full refund, no questions asked."]
)
# Set your own threshold; an example passes when its score meets or exceeds it
scorer = ContextualRelevancyScorer(threshold=0.8)

results = client.run_evaluation(
    examples=[example],
    scorers=[scorer],
    model="gpt-4o",
)
print(results)

The ContextualRelevancy scorer uses an LLM judge, so you’ll receive a reason for the score in the reason field of the results. This allows you to double-check the accuracy of the evaluation and understand how the score was calculated.
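To inspect the scores and reasons programmatically, you can iterate over the returned results. The attribute names below (scorers_data, score, reason) are assumptions based on judgeval's result objects and may differ across versions:

# Hypothetical sketch -- attribute names are assumptions and may vary by version
for result in results:
    for scorer_data in result.scorers_data:
        print(f"Score: {scorer_data.score}")
        print(f"Reason: {scorer_data.reason}")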