Classifier Scorers
A ClassifierScorer is a powerful tool for evaluating your LLM system using natural language criteria. Classifier scorers are great for prototyping new evaluation criteria on a small set of examples before using them to benchmark your workflows at scale.
Creating a Classifier Scorer
judgeval SDK
You can create a ClassifierScorer by providing a natural language description of your evaluation task/criteria and a set of choices that an LLM judge can choose from when evaluating an example. Specifically, you need to provide a conversation that describes the task/criteria and an options dictionary that maps each choice to a score. You can also use Example fields in your conversation via the mustache {{variable_name}} syntax.
Here’s an example of creating a ClassifierScorer that determines if a response is friendly or not:
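The sketch below illustrates the shape of the conversation and options inputs described above, plus how a mustache {{variable_name}} placeholder gets filled from an Example's fields. It is a conceptual sketch, not judgeval code: the `render` helper and the `example` dict are hypothetical stand-ins, and the exact ClassifierScorer constructor signature may differ.

```python
# Conceptual sketch (not the judgeval API): a friendliness classifier's inputs.
import re

# A conversation describing the task, with a mustache placeholder for the
# Example's actual_output field.
conversation = [
    {
        "role": "system",
        "content": (
            "You are judging whether a response is friendly.\n"
            "Response: {{actual_output}}\n"
            "Answer Y if the response is friendly, N otherwise."
        ),
    }
]

# The options dictionary maps each judge choice to a score:
# "Y" (friendly) scores 1, "N" (not friendly) scores 0.
options = {"Y": 1, "N": 0}

def render(template: str, example: dict) -> str:
    """Hypothetical helper: fill {{variable_name}} placeholders from Example fields."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(example.get(m.group(1), m.group(0))),
        template,
    )

example = {"input": "Where is my order?", "actual_output": "Happy to help! It ships tomorrow."}
prompt = render(conversation[0]["content"], example)
```

At evaluation time, the LLM judge sees the rendered prompt and its chosen option is mapped through `options` to the final score.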
Use variables from Examples in your conversation by using the mustache {{variable_name}} syntax.
Using a Classifier Scorer
Classifier scorers can be used in the same way as any other scorer in judgeval. They can also be run in conjunction with other scorers in a single evaluation run!
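Conceptually, an evaluation run applies every scorer to each example and collects the results together. The sketch below is not the judgeval API; `run_scorers` and the two scorer functions are hypothetical stand-ins showing how a classifier-style scorer can run alongside another scorer in one pass.

```python
# Conceptual sketch (not the judgeval API): running a classifier-style scorer
# together with a second scorer in a single evaluation pass.

def friendliness_scorer(example: dict) -> int:
    # Stand-in for an LLM judge choosing "Y" (score 1) or "N" (score 0).
    return 1 if "happy" in example["actual_output"].lower() else 0

def length_scorer(example: dict) -> int:
    # A second, unrelated scorer included in the same run.
    return 1 if len(example["actual_output"]) <= 80 else 0

def run_scorers(example: dict, scorers: list) -> dict:
    # One evaluation run: every scorer is applied to the same example.
    return {scorer.__name__: scorer(example) for scorer in scorers}

example = {"input": "Where is my order?", "actual_output": "Happy to help! It ships tomorrow."}
results = run_scorers(example, [friendliness_scorer, length_scorer])
```

The same pattern generalizes: each scorer contributes an independent score for the example, and the run reports all of them side by side.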
Saving Classifier Scorers
Whether you create a ClassifierScorer via the judgeval SDK or the Judgment platform, you can save it to the Judgment platform for reuse in future evaluations.
- If you create a ClassifierScorer via the judgeval SDK, you can save it by calling client.push_classifier_scorer().
- Similarly, you can load a ClassifierScorer by calling client.fetch_classifier_scorer().
- Each ClassifierScorer has a unique slug that you can use to identify it.
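The push/fetch round trip keyed by slug can be pictured as below. The real judgeval calls are client.push_classifier_scorer() and client.fetch_classifier_scorer() as listed above; `ScorerRegistry` here is a hypothetical in-memory stand-in, not judgeval code.

```python
# Conceptual sketch of saving and reloading a classifier scorer by its slug.
# ScorerRegistry is a hypothetical stand-in for the Judgment platform.

class ScorerRegistry:
    def __init__(self):
        self._scorers = {}

    def push_classifier_scorer(self, scorer: dict) -> str:
        # Each scorer is stored under its unique slug.
        slug = scorer["slug"]
        self._scorers[slug] = scorer
        return slug

    def fetch_classifier_scorer(self, slug: str) -> dict:
        # Future evaluation runs can reload the saved scorer by slug.
        return self._scorers[slug]

registry = ScorerRegistry()
scorer = {
    "slug": "friendliness-v1",
    "conversation": [{"role": "system", "content": "Is this response friendly? {{actual_output}}"}],
    "options": {"Y": 1, "N": 0},
}
slug = registry.push_classifier_scorer(scorer)
reloaded = registry.fetch_classifier_scorer(slug)
```

Because the slug uniquely identifies the scorer, it is the only piece of information a future evaluation run needs in order to reuse it.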
Real World Examples
You can find some real-world examples of how our community has used ClassifierScorers to evaluate their LLM systems in our cookbook repository!
Here are some of our favorites: