Synthetic Data Generation

Synthetic data generation creates diverse test cases from a small set of seed examples while preserving the statistical properties and patterns of your original data. It offers several advantages for agent evaluation:

- Data Diversity: Generate a wider range of test cases from limited initial examples
- Controlled Variation: Introduce specific types of variations to test agent robustness
- Cost Efficiency: Reduce the need for manual data collection and annotation
- Privacy: Create data that doesn’t contain sensitive information
- Edge Cases: Generate challenging scenarios to test agent performance

Synthetic data is particularly valuable when you need to test your agents against rare but important scenarios that might be difficult to find in real-world data.

Generate Your First Synthetic Dataset

To generate synthetic data from your existing examples:

- Navigate to your project in the Judgment Platform
- Select the dataset you want to expand
- Click the “Generate Data” button in the top right

In the configuration window, you can specify:
- Number of Examples: How many new examples to generate
- Data Diversity (0-1): Controls how different the generated examples will be from your seed data

Higher diversity scores (closer to 1) will generate more varied examples. For instance, if your original dataset contains customer service questions about shoes, a high diversity score might generate questions about other products, while a low score will stay focused on shoe-related queries.
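To make the two settings concrete, here is a minimal Python sketch that models them as a validated configuration object. `GenerationConfig` and its field names are hypothetical and used only for illustration; they are not part of the Judgment Platform or any Judgment API. The sketch only encodes the documented constraint that diversity lies between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationConfig:
    """Hypothetical container for the two settings in the configuration window."""

    num_examples: int  # how many new examples to generate
    diversity: float   # 0 stays close to the seed data, 1 varies the most

    def __post_init__(self) -> None:
        # Reject values that the configuration window would not accept.
        if self.num_examples < 1:
            raise ValueError("num_examples must be at least 1")
        if not 0.0 <= self.diversity <= 1.0:
            raise ValueError("diversity must be between 0 and 1")


# A low score keeps questions close to the seed topic (shoes);
# a high score allows drift toward other products.
focused = GenerationConfig(num_examples=20, diversity=0.2)
broad = GenerationConfig(num_examples=20, diversity=0.9)
print(focused, broad)
```

Writing the settings down this way can help when you want to record which diversity score produced which batch of examples.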

Click “Generate” and our system will automatically create and add the new examples to your dataset.

Generated examples maintain the same structure as your original data. For example, if you’re generating Q&A pairs for RAG evaluation, the synthetic data will follow the same Q&A format.
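If you export examples and want to confirm that the structure was preserved, a check along the following lines may help. This is a sketch under assumptions: it treats each example as a plain Python dictionary, and the `question` and `answer` field names are illustrative rather than a format the platform prescribes.

```python
from typing import Any


def same_structure(seed: dict[str, Any], generated: dict[str, Any]) -> bool:
    """Return True when both examples have the same field names and value types."""
    if seed.keys() != generated.keys():
        return False
    return all(type(seed[key]) is type(generated[key]) for key in seed)


# Illustrative Q&A pairs; the field names are assumptions.
seed_example = {
    "question": "Do these shoes run small?",
    "answer": "They fit true to size.",
}
generated_ok = {
    "question": "Are the boots waterproof?",
    "answer": "Yes, up to the ankle.",
}
generated_bad = {"question": "Are the boots waterproof?"}  # missing "answer"

print(same_structure(seed_example, generated_ok))
print(same_structure(seed_example, generated_bad))
```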

Best Practices

When generating synthetic data for evaluation:

- Start Small: Begin with a small number of examples to verify the generation quality
- Validate Examples: Review generated examples to ensure they match your use case
- Iterate on Diversity: Adjust the diversity score based on how varied you need your examples to be
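The review step can be partly scripted once examples are exported. The sketch below draws a small reproducible sample for manual review and flags generated questions that merely repeat a seed question. Both helper names are hypothetical, and the exact-match comparison (after trimming and lowercasing) is a deliberately simple stand-in for whatever similarity check suits your data.

```python
import random


def review_sample(examples: list, k: int = 5, seed: int = 0) -> list:
    """Pick up to k examples for manual review, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(examples, min(k, len(examples)))


def repeated_questions(seed_questions: list[str], generated: list[str]) -> list[str]:
    """Return generated questions that match a seed question, ignoring case and outer spaces."""
    known = {q.strip().lower() for q in seed_questions}
    return [q for q in generated if q.strip().lower() in known]


seed_questions = ["Do these shoes run small?"]
generated = [
    "do these shoes run small? ",
    "Can I return sandals after 30 days?",
]

print(review_sample(generated, k=1))
print(repeated_questions(seed_questions, generated))
```

Many repeats may suggest that the diversity score is too low for your purposes; reviewers finding off-topic examples may suggest the opposite.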

You can use synthetic data generation in combination with unit testing to create comprehensive test suites for your agents.
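As a rough illustration of that combination, the sketch below runs a stand-in agent over a few synthetic examples and collects the ones that fail a basic check. `toy_agent` and `run_checks` are hypothetical placeholders, not Judgment APIs; in practice you would call your own agent and apply the assertions that matter for your use case.

```python
from typing import Callable


def toy_agent(question: str) -> str:
    """Stand-in for a real agent; it returns a canned acknowledgement."""
    return f"Thanks for asking about: {question}"


def run_checks(agent: Callable[[str], str], examples: list[dict]) -> list[str]:
    """Return the questions for which the agent produced an empty reply."""
    failures = []
    for example in examples:
        reply = agent(example["question"])
        if not reply.strip():
            failures.append(example["question"])
    return failures


# Illustrative synthetic examples; the field name is an assumption.
synthetic_examples = [
    {"question": "Are the boots waterproof?"},
    {"question": "Can I return sandals after 30 days?"},
]

print(run_checks(toy_agent, synthetic_examples))
```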

Next Steps

Now that you’ve learned about synthetic data generation, explore how to:

- Create experiments with your expanded datasets
- Set up unit testing with synthetic examples
