Set up Evaluations
This guide walks you through the process of setting up an evaluation in AI Gateway. These steps are done in the Cloudflare dashboard ↗.
Datasets are collections of logs stored for analysis that can be used in an evaluation. You can create datasets by applying filters in the Logs tab. Datasets update automatically as new logs match the filters you set.
- Apply filters to narrow down your logs. Filter options include provider, number of tokens, request status, and more.
- Select Create Dataset to store the filtered logs for future analysis.
You can manage datasets by selecting Manage datasets from the Logs tab. The table below lists the filters available when building a dataset.
| Filter category | Filter options | Description |
| --- | --- | --- |
| Status | error, status | Filter by error type or status. |
| Cache | cached, not cached | Filter by whether the request was cached. |
| Provider | specific providers | Filter by the selected AI provider. |
| AI Models | specific models | Filter by the selected AI model. |
| Cost | less than, greater than | Filter by cost, specifying a threshold. |
| Request type | Universal, Workers AI Binding, WebSockets | Filter by the type of request. |
| Tokens | Total tokens, Tokens In, Tokens Out | Filter by token count (less than or greater than). |
| Duration | less than, greater than | Filter by request duration. |
| Feedback | equals, does not equal (thumbs up, thumbs down, no feedback) | Filter by feedback type. |
| Metadata Key | equals, does not equal | Filter by specific metadata keys. |
| Metadata Value | equals, does not equal | Filter by specific metadata values. |
| Log ID | equals, does not equal | Filter by a specific Log ID. |
| Event ID | equals, does not equal | Filter by a specific Event ID. |
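If you plan to build datasets with the Metadata Key and Metadata Value filters, you can attach custom metadata to requests as they pass through the gateway using the `cf-aig-metadata` header. The sketch below shows one way to do that, assuming an OpenAI provider behind the gateway; the account ID, gateway ID, model, and metadata keys are placeholders for illustration.

```ts
// Minimal sketch: tag a gateway request with metadata so its log can later be
// filtered into a dataset. ACCOUNT_ID, GATEWAY_ID, the model, and the API key
// are placeholders you must supply.
const ACCOUNT_ID = "your_account_id";
const GATEWAY_ID = "your_gateway_id";
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;

const response = await fetch(
  `https://gateway.ai.cloudflare.com/v1/${ACCOUNT_ID}/${GATEWAY_ID}/openai/chat/completions`,
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${OPENAI_API_KEY}`,
      // Keys and values in this JSON object can be matched with the
      // Metadata Key / Metadata Value filters in the Logs tab.
      "cf-aig-metadata": JSON.stringify({ team: "search", experiment: "prompt-v2" }),
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: "Hello!" }],
    }),
  },
);
console.log(response.status);
```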
After creating a dataset, choose the evaluation parameters:
- Cost: Calculates the average cost of inference requests within the dataset (only for requests with cost data).
- Speed: Calculates the average duration of inference requests within the dataset.
- Performance:
  - Human feedback: Measures performance based on human feedback, calculated as the percentage of thumbs-up ratings on the dataset's logs. Feedback is annotated from the Logs tab.
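For intuition only, here is a rough sketch of what these evaluators measure over a dataset of logs. AI Gateway computes these metrics for you in the dashboard; the log shape, field names, and the assumption that the feedback percentage is taken over logs that received feedback are illustrative, not a description of the actual implementation.

```ts
// Illustration only: AI Gateway computes these metrics in the dashboard.
// The LogEntry shape and field names are assumptions made for this sketch.
interface LogEntry {
  cost?: number; // inference cost; undefined when no cost data is available
  durationMs: number; // request duration in milliseconds
  feedback?: "thumbs_up" | "thumbs_down"; // undefined = no feedback annotated
}

function evaluateDataset(logs: LogEntry[]) {
  // Cost: average cost, considering only logs that have cost data.
  const withCost = logs.filter((log) => log.cost !== undefined);
  const avgCost =
    withCost.length > 0
      ? withCost.reduce((sum, log) => sum + (log.cost ?? 0), 0) / withCost.length
      : undefined;

  // Speed: average request duration across all logs in the dataset.
  const avgDurationMs =
    logs.length > 0
      ? logs.reduce((sum, log) => sum + log.durationMs, 0) / logs.length
      : undefined;

  // Performance (human feedback): percentage of thumbs-up annotations,
  // assuming the percentage is taken over logs that received feedback.
  const rated = logs.filter((log) => log.feedback !== undefined);
  const thumbsUpPercent =
    rated.length > 0
      ? (100 * rated.filter((log) => log.feedback === "thumbs_up").length) / rated.length
      : undefined;

  return { avgCost, avgDurationMs, thumbsUpPercent };
}
```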
- Create a unique name for your evaluation to reference it in the dashboard.
- Review the selected dataset and evaluators.
- Select Run to start the process.
Evaluation results will appear in the Evaluations tab. The results show the status of the evaluation (for example, in progress, completed, or error). Metrics for the selected evaluators are displayed, and any logs missing the relevant fields are excluded from the calculation. You will also see the number of logs used to calculate each metric.
While datasets automatically update based on filters, evaluations do not. You will have to create a new evaluation if you want to evaluate new logs.
Use these insights to optimize for your application's priorities. Depending on the results, you may choose to:
- Change the model or provider
- Adjust your prompts
- Explore further optimizations, such as setting up Retrieval Augmented Generation (RAG)