Experiment Comparisons
Learn how to A/B test changes in your LLM workflows using experiment comparisons.
Introduction
Experiment comparisons allow you to systematically A/B test changes in your LLM workflows. Whether you’re testing different prompts, models, or architectures, Judgment helps you compare results across experiments to make data-driven decisions about your LLM systems.
Creating Your First Comparison
Let’s walk through how to create and run experiment comparisons:
After running the following code, click the View Results link to open your experiment run on the Judgment Platform.
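A minimal sketch of such a run is shown below. It assumes the judgeval Python SDK (JudgmentClient, Example, FaithfulnessScorer); the project name, run names, model names, and example data are illustrative placeholders, so swap in your own workflow. The key pattern for a comparison is two evaluation runs under the same project: a baseline and a variant.

```python
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import FaithfulnessScorer

# Reads JUDGMENT_API_KEY (and organization settings) from your environment.
client = JudgmentClient()

# A hand-written example for illustration; in a real workflow the
# actual_output would come from your LLM pipeline.
example = Example(
    input="What if these shoes don't fit?",
    actual_output="We offer a 30-day full refund at no extra cost.",
    retrieval_context=[
        "All customers are eligible for a 30 day full refund at no extra cost."
    ],
)

scorer = FaithfulnessScorer(threshold=0.5)

# Baseline run: your current model (or prompt/architecture).
baseline_results = client.run_evaluation(
    examples=[example],
    scorers=[scorer],
    model="gpt-4o-mini",
    project_name="comparison-demo",
    eval_run_name="baseline-run",
)

# Variant run: the change you want to A/B test. Keeping the same
# project_name lets you compare both runs on the Judgment Platform.
variant_results = client.run_evaluation(
    examples=[example],
    scorers=[scorer],
    model="gpt-4o",
    project_name="comparison-demo",
    eval_run_name="variant-run",
)
```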
Analyzing Results
Once your experiments are complete, you can compare them on the Judgment Platform:
1. You’ll be automatically directed to your Experiment page. Here you’ll see your latest experiment results and a “Compare” button.
2. Click the “Compare” button to navigate to the Experiments page. Here you can select a previous experiment to compare against your current results.
3. After selecting an experiment, you’ll return to the Experiment page with both experiments’ results displayed side by side.
4. For detailed insights, click on any row in the comparison table to see specific metrics and analysis.
Use these detailed comparisons to make data-driven decisions about which model, prompt, or architecture performs best for your specific use case.
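The platform comparison view is the primary tool here, but you can also do a quick programmatic sanity check. The sketch below continues from the earlier example and assumes each result returned by run_evaluation exposes a boolean success attribute; the exact result shape may differ across judgeval versions, so treat the helper as illustrative.

```python
# Hypothetical local check, continuing from the sketch above. It assumes
# each returned result exposes a boolean `success` attribute; check the
# result objects in your judgeval version for the exact field names.
def pass_rate(results) -> float:
    """Fraction of examples whose scorers all passed."""
    return sum(1 for r in results if r.success) / max(len(results), 1)

print(f"Baseline pass rate: {pass_rate(baseline_results):.0%}")
print(f"Variant pass rate:  {pass_rate(variant_results):.0%}")
```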
Next Steps
- To learn more about creating datasets to run your experiments on, check out our Datasets section.