Overview

In most scenarios, you will have multiple Examples that you want to evaluate together.
In judgeval, an evaluation dataset (EvalDataset) is a collection of Examples that you can scale evaluations across.

Creating a Dataset

Creating an EvalDataset is as simple as supplying a list of Examples.

create_dataset.py
from judgeval.data import Example
from judgeval.data.datasets import EvalDataset

examples = [
    Example(input="...", actual_output="..."), 
    Example(input="...", actual_output="..."), 
    ...
]


dataset = EvalDataset(
    examples=examples
)

You can also add Examples to an existing EvalDataset using the add_example method.

add_to_dataset.py
...

dataset.add_example(Example(...))

Saving/Loading Datasets

judgeval supports saving and loading datasets in the following formats:

  • JSON
  • CSV

From Judgment

You easily can save/load an EvalDataset from Judgment’s cloud.

push_dataset.py
# Saving
...
from judgeval import JudgmentClient

client = JudgmentClient()
client.push_dataset(alias="my_dataset", dataset=dataset)
pull_dataset.py
# Loading
from judgeval import JudgmentClient

client = JudgmentClient()
dataset = client.pull_dataset(alias="my_dataset")

From JSON

You can save/load an EvalDataset with a JSON file. Your JSON file should have the following structure:

structure.json
{
    "examples": [
        {
            "input": "...", 
            "actual_output": "..."
        }, 
        ...
    ]
}

Here’s an example of how use judgeval to save/load from JSON.

json_dataset.py
from judgeval.data.datasets import EvalDataset

# saving
dataset = EvalDataset(...)  # filled with examples
dataset.save_as("json", "/path/to/save/dir", "save_name")

# loading
new_dataset = EvalDataset()
new_dataset.add_from_json("/path/to/your/json/file.json")

From CSV

You can save/load an EvalDataset with a .csv file. Your CSV should contain rows that can be mapped to Examples via column names. TODO: this section needs to be updated because the CSV format is not yet finalized.

Here’s an example of how use judgeval to save/load from CSV.

csv_dataset.py
from judgeval.data.datasets import EvalDataset

# saving
dataset = EvalDataset(...)  # filled with examples
dataset.save_as("csv", "/path/to/save/dir", "save_name")

# loading
new_dataset = EvalDataset()
new_dataset.add_from_csv("/path/to/your/csv/file.csv")

From YAML

You can save/load an EvalDataset with a .yaml file. Your YAML should contain rows that can be mapped to Examples via column names.

Here’s an example of how use judgeval to save/load from YAML.

yaml_dataset.py
from judgeval.data.datasets import EvalDataset

# saving
dataset = EvalDataset(...)  # filled with examples
dataset.save_as("yaml", "/path/to/save/dir", "save_name")

# loading
new_dataset = EvalDataset()
new_dataset.add_from_yaml("/path/to/your/yaml/file.yaml")

example.yaml
examples:
  - input: ...
    actual_output: ...
    expected_output: ...

Evaluate On Your Dataset

You can use the JudgmentClient to evaluate the Examples in your dataset using scorers.

evaluate_dataset.py
...

dataset = client.pull_dataset(alias="my_dataset")
res = client.evaluate_dataset(
    dataset=dataset,
    scorers=[FaithfulnessScorer(threshold=0.9)],
    model="gpt-4o",
)

Conclusion

Congratulations! 🎉

You’ve now learned how to create, save, and evaluate an EvalDataset in judgeval.

You can also view and manage your datasets via the Judgment platform.