Set up Evaluations

This guide walks you through the process of setting up an evaluation in AI Gateway. These steps are done in the Cloudflare dashboard.

1. Select or create a dataset

Datasets are collections of logs stored for analysis that can be used in an evaluation. You can create datasets by applying filters in the Logs tab. Datasets update automatically as new logs match the set filters.

Set up a dataset from the Logs tab

  1. Apply filters to narrow down your logs. Filter options include provider, number of tokens, request status, and more.
  2. Select Create Dataset to store the filtered logs for future analysis.

You can manage datasets by selecting Manage datasets from the Logs tab.
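Conceptually, a dataset is simply the subset of logs that match your filters, re-applied as new logs arrive. The TypeScript sketch below illustrates that idea only; the `GatewayLog` and `DatasetFilter` shapes and their field names are assumptions for illustration, not the actual AI Gateway log schema or API.

```ts
// Hypothetical log shape -- field names are assumptions for illustration,
// not the actual AI Gateway log schema.
interface GatewayLog {
  provider: string;            // for example "openai" or "workers-ai"
  totalTokens: number;         // prompt + completion tokens
  status: "success" | "error"; // request status
  cost?: number;               // cost, if the provider reported one
  durationMs?: number;         // request duration in milliseconds
  feedback?: 1 | -1;           // thumbs up / thumbs down annotation
}

// A dataset filter narrows logs the same way the Logs tab filters do:
// by provider, number of tokens, request status, and so on.
interface DatasetFilter {
  provider?: string;
  minTokens?: number;
  status?: "success" | "error";
}

// Return the logs that currently match the filter. Because the filter is
// re-applied to incoming logs, the "dataset" stays up to date automatically.
function applyFilter(logs: GatewayLog[], filter: DatasetFilter): GatewayLog[] {
  return logs.filter((log) =>
    (filter.provider === undefined || log.provider === filter.provider) &&
    (filter.minTokens === undefined || log.totalTokens >= filter.minTokens) &&
    (filter.status === undefined || log.status === filter.status)
  );
}
```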

2. Select evaluators

After creating a dataset, choose the evaluation parameters (a sketch of how these metrics could be computed follows the list):

  • Cost: Calculates the average cost of inference requests within the dataset (only for requests that include cost data).
  • Speed: Calculates the average duration of inference requests within the dataset.
  • Performance:
    • Human feedback: Measures performance based on human feedback, calculated as the percentage of thumbs-up annotations on the logs. Logs are annotated from the Logs tab.

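The sketch below, continuing the hypothetical `GatewayLog` shape from the earlier example, shows roughly how each metric described above could be derived: an average over logs that have cost data, an average request duration, and the percentage of thumbs-up annotations. Each function also returns a sample size, mirroring the per-metric log counts shown in the results. This is an illustration of the calculations as described, not AI Gateway's actual implementation.

```ts
// Assumes the hypothetical GatewayLog shape from the previous sketch.
// Each metric only counts logs that carry the relevant field, matching
// the dashboard behavior of excluding logs with missing data.

// Cost: average cost across logs that have cost data.
function averageCost(logs: GatewayLog[]): { value: number; sampleSize: number } {
  const withCost = logs.filter((log) => log.cost !== undefined);
  const total = withCost.reduce((sum, log) => sum + (log.cost ?? 0), 0);
  return { value: withCost.length ? total / withCost.length : 0, sampleSize: withCost.length };
}

// Speed: average request duration across logs that recorded a duration.
function averageDurationMs(logs: GatewayLog[]): { value: number; sampleSize: number } {
  const withDuration = logs.filter((log) => log.durationMs !== undefined);
  const total = withDuration.reduce((sum, log) => sum + (log.durationMs ?? 0), 0);
  return { value: withDuration.length ? total / withDuration.length : 0, sampleSize: withDuration.length };
}

// Human feedback: percentage of annotated logs with a thumbs up.
function thumbsUpPercentage(logs: GatewayLog[]): { value: number; sampleSize: number } {
  const annotated = logs.filter((log) => log.feedback !== undefined);
  const thumbsUp = annotated.filter((log) => log.feedback === 1).length;
  return { value: annotated.length ? (thumbsUp / annotated.length) * 100 : 0, sampleSize: annotated.length };
}
```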
3. Name, review, and run the evaluation

  1. Give your evaluation a unique name so you can reference it in the dashboard.
  2. Review the selected dataset and evaluators.
  3. Select Run to start the process.

4. Review and analyze results

Evaluation results will appear in the Evaluations tab. The results show the status of the evaluation (for example, in progress, completed, or error). Metrics for the selected evaluators are displayed; logs missing the relevant fields are excluded from each calculation. You will also see the number of logs used to calculate each metric.

While datasets automatically update based on filters, evaluations do not. You will have to create a new evaluation if you want to evaluate new logs.

Use these insights to optimize based on your application’s priorities. Based on the results, you may choose to: