Enable Google BigQuery
Cloudflare Logpush supports pushing logs directly to Google BigQuery (using the legacy streaming API) via the Cloudflare API.
Cloudflare uses the Google Application Credentials provided in the Logpush job's destination_conf to gain write access to your table. The service account you provide needs write permission for the table.
To enable Logpush to BigQuery:
- Go to Google Cloud Console for your account.
- Go to IAM & Admin > Service Accounts, and create a new service account.
- Add the BigQuery Data Editor role under Permissions. At minimum, the service account requires the bigquery.tables.updateData permission.
- Add a key under Keys.
  - Click Add key.
  - Click Create new key.
  - Select Key type JSON.
  - Click Create.
- Save the Application Credentials JSON file. You will need to use this when setting up a new Logpush job.
- In BigQuery, create a dataset and table. Refer to the instructions from BigQuery. For example, using schema.json and the bq command:

gcloud auth activate-service-account --key-file=${KEY_FILE}

PROJECT_ID=<PROJECT_ID>
DATASET_ID=<DATASET_ID>
TABLE_ID=<TABLE_ID>

bq mk --table "${PROJECT_ID}:${DATASET_ID}.${TABLE_ID}" schema.json

To set up a BigQuery Logpush job:
- Create a job with the appropriate endpoint URL and authentication parameters.
- Enable the job to begin pushing logs.
Ensure Log Share permissions are enabled before attempting to read or configure a Logpush job. For more information, refer to the Roles section.
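Returning to the table-creation step, the text does not show what schema.json contains. The following is a minimal sketch of one possible schema for the fields used in the example request on this page. The column types are assumptions (timestamps are typed as TIMESTAMP on the assumption that the job uses rfc3339 timestamps), and every column is NULLABLE so that an empty test row can be inserted.

```shell
# Write a hypothetical schema.json for use with `bq mk --table`.
# Field names match the example Logpush request; the types are assumed.
cat > schema.json <<'EOF'
[
  {"name": "ClientIP",            "type": "STRING",    "mode": "NULLABLE"},
  {"name": "ClientRequestHost",   "type": "STRING",    "mode": "NULLABLE"},
  {"name": "ClientRequestMethod", "type": "STRING",    "mode": "NULLABLE"},
  {"name": "ClientRequestURI",    "type": "STRING",    "mode": "NULLABLE"},
  {"name": "EdgeEndTimestamp",    "type": "TIMESTAMP", "mode": "NULLABLE"},
  {"name": "EdgeResponseBytes",   "type": "INTEGER",   "mode": "NULLABLE"},
  {"name": "EdgeResponseStatus",  "type": "INTEGER",   "mode": "NULLABLE"},
  {"name": "EdgeStartTimestamp",  "type": "TIMESTAMP", "mode": "NULLABLE"},
  {"name": "RayID",               "type": "STRING",    "mode": "NULLABLE"}
]
EOF
```

If you request different fields in output_options, the schema needs a matching column for each of them.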
To create a job, make a POST request to the Logpush jobs endpoint with the following fields:
- name (optional) - Use your domain name as the job name.
- destination_conf - A log destination consisting of a reference to the BigQuery table and credentials, in the string format below.
  - <PROJECT_ID>, <DATASET_ID>, <TABLE_ID>: Project ID, dataset ID, and table ID of the designated BigQuery table.
  - <ENCODED_VALUE>: The Application Credentials JSON, encoded as the credentials value, either base64-encoded with the base64: prefix or URL-encoded with the url: prefix.
  "bq://projects/<PROJECT_ID>/datasets/<DATASET_ID>/tables/<TABLE_ID>?credentials=<ENCODED_VALUE>"
- dataset - The category of logs you want to receive. Refer to Datasets for the full list of supported datasets.
- output_options (optional) - To configure fields, sample rate, and timestamp format, refer to Log Output Options. For timestamp, Cloudflare recommends using timestamps=rfc3339.
  - When including custom formatting options, such as output_type or any prefix, suffix, delimiter, or template options, also set stringify_object to true. Otherwise, fields of type object may not be serialized in a format compatible with the BigQuery legacy streaming API.
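The destination_conf string described above can be assembled in the shell. This is a minimal sketch that uses the base64: variant with standard base64 encoding. The function name make_destination_conf and its argument order are hypothetical, chosen for illustration only.

```shell
# Build a destination_conf string from a project, dataset, table,
# and the path to the Application Credentials JSON key file.
# make_destination_conf is a hypothetical helper, not part of any CLI.
make_destination_conf() {
  project_id="$1"
  dataset_id="$2"
  table_id="$3"
  key_file="$4"

  # Encode the key file on a single line; `tr` strips the line wraps
  # that some base64 implementations insert.
  encoded="base64:$(base64 < "$key_file" | tr -d '\n')"

  printf 'bq://projects/%s/datasets/%s/tables/%s?credentials=%s\n' \
    "$project_id" "$dataset_id" "$table_id" "$encoded"
}

# Example usage (assumes the key file saved earlier is at ./credentials.json):
#   DESTINATION_CONF="$(make_destination_conf my-project my_dataset my_table credentials.json)"
```

The resulting string embeds the full service account key, so treat it as a secret and avoid writing it to shell history or logs.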
Example request using cURL:
Required API token permissions: at least one of the following token permissions is required:
- Logs Write
curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/logpush/jobs" \
  --request POST \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --json '{
    "name": "<DOMAIN_NAME>",
    "destination_conf": "bq://projects/<PROJECT_ID>/datasets/<DATASET_ID>/tables/<TABLE_ID>?credentials=<ENCODED_VALUE>",
    "output_options": {
      "field_names": [
        "ClientIP",
        "ClientRequestHost",
        "ClientRequestMethod",
        "ClientRequestURI",
        "EdgeEndTimestamp",
        "EdgeResponseBytes",
        "EdgeResponseStatus",
        "EdgeStartTimestamp",
        "RayID"
      ],
      "timestamp_format": "rfc3339"
    },
    "max_upload_bytes": 5000000,
    "max_upload_records": 50000,
    "dataset": "http_requests",
    "enabled": true
  }'

Response:

{
  "errors": [],
  "messages": [],
  "result": {
    "id": <JOB_ID>,
    "dataset": "http_requests",
    "kind": "",
    "max_upload_bytes": 5000000,
    "max_upload_records": 50000,
    "enabled": true,
    "name": "<DOMAIN_NAME>",
    "output_options": {
      "field_names": [
        "ClientIP",
        "ClientRequestHost",
        "ClientRequestMethod",
        "ClientRequestURI",
        "EdgeEndTimestamp",
        "EdgeResponseBytes",
        "EdgeResponseStatus",
        "EdgeStartTimestamp",
        "RayID"
      ],
      "timestamp_format": "rfc3339"
    },
    "destination_conf": "bq://projects/<PROJECT_ID>/datasets/<DATASET_ID>/tables/<TABLE_ID>?credentials=<ENCODED_VALUE>",
    "last_complete": null,
    "last_error": null,
    "error_message": null
  },
  "success": true
}

Creating the job triggers a test upload with empty content to verify that Logpush can write to the table, so you may see a row with empty data.
Refer to Manage Logpush with cURL to update a job (including enabling and disabling).
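As a rough illustration of enabling or disabling a job, the helpers below build the URL and request body for an update. This sketch assumes that a job is updated with a PUT request to the job's own URL under the same jobs endpoint used above. The helper names job_url and enabled_body are hypothetical, and the linked guide remains the authoritative reference.

```shell
# Hypothetical helpers that only build strings; they send nothing.

# URL of a single Logpush job (assumed to be <jobs endpoint>/<JOB_ID>).
job_url() {
  zone_id="$1"
  job_id="$2"
  printf 'https://api.cloudflare.com/client/v4/zones/%s/logpush/jobs/%s\n' \
    "$zone_id" "$job_id"
}

# JSON body that sets the job's "enabled" flag to true or false.
enabled_body() {
  printf '{"enabled": %s}\n' "$1"
}

# Example usage (this would send a real request that disables the job):
#   curl "$(job_url "$ZONE_ID" "$JOB_ID")" \
#     --request PUT \
#     --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
#     --json "$(enabled_body false)"
```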
Note the following default quotas and limits, as described in the BigQuery documentation.
The following limits apply to BigQuery streaming inserts:
- Maximum HTTP request size (uncompressed, may include headers): 10 MB
- Maximum row size: 10 MB
- Maximum rows per request: 50,000
These are the default quotas and limits. Adjust your Logpush jobs (for example, max_upload_bytes and max_upload_records) to stay within them, or ask Google to increase them when needed.
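A quick way to sanity-check a job's batch settings against these limits is sketched below. It assumes that 10 MB means 10,000,000 bytes and that max_upload_bytes is a reasonable proxy for the HTTP request size, neither of which the text confirms. The function name check_logpush_limits is hypothetical.

```shell
# Default BigQuery streaming limits from the list above.
# Assumption: "10 MB" is interpreted as 10,000,000 bytes.
BQ_MAX_REQUEST_BYTES=10000000
BQ_MAX_ROWS_PER_REQUEST=50000

# Print "ok" and return 0 when both settings fit within the limits;
# otherwise print the offending setting and return 1.
check_logpush_limits() {
  max_upload_bytes="$1"
  max_upload_records="$2"

  if [ "$max_upload_bytes" -gt "$BQ_MAX_REQUEST_BYTES" ]; then
    echo "max_upload_bytes exceeds ${BQ_MAX_REQUEST_BYTES} bytes"
    return 1
  fi
  if [ "$max_upload_records" -gt "$BQ_MAX_ROWS_PER_REQUEST" ]; then
    echo "max_upload_records exceeds ${BQ_MAX_ROWS_PER_REQUEST} rows"
    return 1
  fi
  echo "ok"
}

# Example usage with the values from the example request:
#   check_logpush_limits 5000000 50000
```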
Cloudflare Logpush also supports pushing logs to Google Cloud Storage, from which the files can be loaded into BigQuery.
BigQuery supports up to 1,500 load jobs per table per day (including failures), with up to 10 million files in each load. Because a day has 1,440 minutes, you can load into BigQuery once per minute and stay within this quota, including up to 10 million files in each load. For more information, refer to BigQuery's quotas for load jobs.
Logpush delivers batches of logs as soon as possible, which means you could receive more than one batch of files per minute. Ensure your BigQuery job is configured to ingest files on a given time interval, like every minute, as opposed to when files are received. Ingesting files into BigQuery as each Logpush file is received could exhaust your BigQuery quota quickly.
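The quota arithmetic above can be checked for any load interval. This sketch assumes one load job per interval and uses the default quota of 1,500 load jobs per table per day. The function names loads_per_day and within_load_quota are hypothetical.

```shell
# Default BigQuery quota: load jobs per table per day (including failures).
BQ_LOAD_JOBS_PER_TABLE_PER_DAY=1500

# Number of load jobs per day when loading once every N seconds.
loads_per_day() {
  interval_seconds="$1"
  echo $(( 86400 / interval_seconds ))
}

# Succeeds (exit status 0) when the interval stays within the daily quota.
within_load_quota() {
  [ "$(loads_per_day "$1")" -le "$BQ_LOAD_JOBS_PER_TABLE_PER_DAY" ]
}

# Example usage:
#   loads_per_day 60        # one load per minute: 1440 jobs per day
#   within_load_quota 30    # one load every 30 seconds: over the quota
```

Loading once per minute leaves only 60 jobs per day of headroom for retries and failures, so a slightly longer interval may be safer.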
For a community-supported example of how to set up a scheduled load job with BigQuery, refer to the Cloudflare + Google Cloud | Integrations repository. Note that this repository is provided on a best-effort basis and is not maintained routinely.