Skip to content

Ingesting BigQuery Data into Workers AI

Last reviewed: about 2 months ago

Introduction

You can connect a Cloudflare Worker to get data from Google BigQuery and pass it to Workers AI, to run AI Models, powered by serverless GPUs. This will allow you to enhance data with AI-generated responses, such as detecting the sentiment score of some text or generating tags for an article. This document describes a simple way to get started if you are looking to give Workers AI a try and see how the new and different AI models would perform with your data hosted in BigQuery.

User-based approach

This version of the integration is aimed at workflows that require interaction with users to fetch data or generate ad-hoc reports.

Figure 1: Ingesting Google BigQuery Data into Workers AI (user-based)
Figure 1: Ingesting Google BigQuery Data into Workers AI (user-based)
  1. A user makes a request to a Worker endpoint. (Which can optionally incorporate Access in front of it to authenticate users).
  2. Worker fetches securely stored Google Cloud Platform service account information such as service key and generates a JSON Web Token to issue an authenticated API request to BigQuery.
  3. Worker receives the data from BigQuery and transforms it into a format that will make it easier to iterate when interacting with Workers AI.
  4. Using its native integration with Workers AI, the Worker forwards the data from BigQuery which is then run against one of Cloudflare's hosted AI models.
  5. The original data retrieved from BigQuery alongside the AI-generated information is returned to the user as a response to the request initiated in step 1.

Cron-triggered approach

For periodic or longer workflows, you may opt for a batch approach. This diagram also explores more products where you can use the data ingested from BigQuery. It relies on Cron Triggers, which are built into the Developer Platform and available for free when using Workers to schedule initialization of workloads.

Figure 2: Ingesting Google BigQuery Data into Workers AI (cron-triggered)
Figure 2: Ingesting Google BigQuery Data into Workers AI (cron-triggered)
  1. A Cron Trigger invokes the Worker without any user interaction.
  2. Worker fetches securely stored Google Cloud Platform service account information such as service key and generates a JSON Web Token to issue an authenticated API request to BigQuery.
  3. Worker receives the data from BigQuery and transforms it into a format that will make it easier to iterate when interacting with Workers AI.
  4. Using its native integration with Workers AI, the Worker forwards the data from BigQuery to generate some content related to it.
  5. Optionally, you can store the BigQuery data and the AI-generated data in a variety of different Cloudflare services.
    • Into D1, a SQL database.
    • If in step four you used Workers AI to generate embeddings, you can store them in Vectorize. To learn more about this type of solution, please consider reviewing the reference architecture diagram on Retrieval Augmented Generation.
    • To Workers KV if the output of your data will be stored and consumed in a key/value fashion.
    • If you prefer to save the data fetched from BigQuery and Workers AI into objects (such as images, files, JSONs), you can use R2, our egress-free object storage to do so.
  6. You can set up an integration so a system or a user gets notified whenever a new result is available or if an error occurs. It's also worth mentioning that Workers by themselves can already provide additional observability.
    • Sending an email with all the data retrieved and generated in the previous step is possible using Email Routing.
    • Since Workers allows you to issue HTTP requests, you can notify a webhook or API endpoint once the process finishes or if there's an error.