Enable Google BigQuery
Cloudflare Logpush supports pushing logs directly to Google BigQuery (using the Legacy Streaming API) via the Cloudflare dashboard or via API.
Cloudflare uses the Google Application Credentials provided in the Logpush job's destination_conf to gain write access to your table. The provided service account needs write permission for the table.
To enable Logpush to BigQuery:
- Go to Google Cloud Console for your account.
- Go to IAM & Admin > Service Accounts, and create a new service account.
- Add the BigQuery Data Editor role under Permissions. At minimum, the service account requires the bigquery.tables.updateData permission.
- Add a key under Keys.
- Click Add key.
- Click Create new key.
- Select Key type JSON.
- Click Create.
- Save the Application Credentials JSON file. You will need to use this when setting up a new Logpush job.
- In BigQuery, create a dataset and table. Refer to instructions from BigQuery ↗. For example, using schema.json and the bq command:
gcloud auth activate-service-account --key-file=${KEY_FILE}
PROJECT_ID=<PROJECT_ID>
DATASET_ID=<DATASET_ID>
TABLE_ID=<TABLE_ID>
bq mk --table "${PROJECT_ID}:${DATASET_ID}.${TABLE_ID}" schema.json
- In the Cloudflare dashboard, go to the Logpush page at the account or domain (also known as zone) level.
For account: Go to Logpush
For domain (also known as zone): Go to Logpush
- Depending on your choice, you have access to account-scoped datasets and zone-scoped datasets, respectively.
- Select Create a Logpush job.
- In Select a destination, choose Google BigQuery.
- Enter the following destination details:
- Project ID - your Google Cloud project ID
- Dataset ID - the BigQuery dataset containing your table
- Table ID - the BigQuery table to push logs to
- Service Account Credentials - paste your Google service account key JSON. This credential is stored encrypted and will not be displayed again.
When you are done entering the destination details, select Continue.
- Select the dataset to push to the storage service.
- In the next step, you need to configure your Logpush job:
- Enter the Job name.
- Under If logs match, you can select the events to include and/or remove from your logs. Refer to Filters for more information. Not all datasets have this option available.
- In Send the following fields, you can choose to either push all fields to your storage destination or select which fields you want to push.
- In Advanced Options, you can:
- Choose the format of timestamp fields in your logs (RFC3339 (default), Unix, or UnixNano).
- Select a sampling rate for your logs or push a randomly-sampled percentage of logs.
- Enable redaction for CVE-2021-44228. This option will replace every occurrence of ${ with x{.
- Select Submit once you are done configuring your Logpush job.
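The Advanced Options above can be illustrated with a short Python sketch. It renders one instant in the three timestamp styles and applies the `${` to `x{` replacement described for the CVE-2021-44228 redaction. The function names are hypothetical, and the exact strings Logpush emits (for example, fractional seconds in RFC3339 output) may differ from this illustration.

```python
import calendar
from datetime import datetime, timezone


def format_timestamp(moment: datetime, fmt: str) -> str:
    """Render a timezone-aware datetime as rfc3339, unix, or unixnano."""
    utc = moment.astimezone(timezone.utc)
    if fmt == "rfc3339":
        # Whole-second precision only; an assumption made for this sketch.
        return utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    # Integer seconds since the epoch, computed without float rounding.
    seconds = calendar.timegm(utc.timetuple())
    if fmt == "unix":
        return str(seconds)
    if fmt == "unixnano":
        return str(seconds * 1_000_000_000 + utc.microsecond * 1_000)
    raise ValueError(f"unknown timestamp format: {fmt}")


def redact_cve_2021_44228(value: str) -> str:
    """Replace every occurrence of '${' with 'x{', as the option describes."""
    return value.replace("${", "x{")


moment = datetime(2023, 11, 14, 22, 13, 20, tzinfo=timezone.utc)
print(format_timestamp(moment, "rfc3339"))   # 2023-11-14T22:13:20Z
print(format_timestamp(moment, "unix"))      # 1700000000
print(format_timestamp(moment, "unixnano"))  # 1700000000000000000
print(redact_cve_2021_44228("${jndi:ldap://example.invalid/a}"))
```

Whichever format you choose, the corresponding BigQuery column type needs to accept it; that mapping depends on your own schema.json.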
To set up a BigQuery Logpush job:
- Create a job with the appropriate endpoint URL and authentication parameters.
- Enable the job to begin pushing logs.
Ensure Log Share permissions are enabled before attempting to read or configure a Logpush job. For more information, refer to the Roles section.
To create a job, make a POST request to the Logpush jobs endpoint with the following fields:
- name (optional) - Use your domain name as the job name.
- destination_conf - A log destination consisting of a reference to a BigQuery table and credentials, in the string format below.
- <PROJECT_ID>, <DATASET_ID>, <TABLE_ID>: Project ID, dataset ID, and table ID of the designated BigQuery table.
- <ENCODED_VALUE>: The encoded value of the Application Credentials JSON, passed as credentials, either base64-encoded with a base64: prefix or URL-encoded with a url: prefix.
"bq://projects/<PROJECT_ID>/datasets/<DATASET_ID>/tables/<TABLE_ID>?credentials=<ENCODED_VALUE>"
- dataset - The category of logs you want to receive. Refer to Datasets for the full list of supported datasets.
- output_options (optional) - To configure fields, sample rate, and timestamp format, refer to Log Output Options. For timestamp, Cloudflare recommends using timestamps=rfc3339.
- When including custom formatting options, such as output_type, or any prefix / suffix / delimiter / template options, make sure to also set stringify_object to true. Otherwise, fields with the object type may not be serialized in a format compatible with the BigQuery Legacy Streaming API.
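As a minimal sketch of assembling the destination_conf string described above, the following Python helper encodes the contents of the Application Credentials JSON file with either prefix. The function name and its parameters are hypothetical, and the use of standard (not URL-safe) base64 is an assumption, since the text only says "base64-encoded".

```python
import base64
from urllib.parse import quote


def build_destination_conf(project_id: str, dataset_id: str, table_id: str,
                           credentials_json: str, encoding: str = "base64") -> str:
    """Build a bq:// destination_conf from the raw credentials JSON text."""
    if encoding == "base64":
        # Assumption: standard base64 alphabet, no line wrapping.
        payload = base64.b64encode(credentials_json.encode("utf-8")).decode("ascii")
        encoded = "base64:" + payload
    elif encoding == "url":
        # Percent-encode every reserved character, including '/'.
        encoded = "url:" + quote(credentials_json, safe="")
    else:
        raise ValueError(f"unsupported encoding: {encoding}")
    return (f"bq://projects/{project_id}/datasets/{dataset_id}"
            f"/tables/{table_id}?credentials={encoded}")


# Example with a placeholder document instead of a real key file.
print(build_destination_conf("my-project", "logs", "http_requests", '{"a":1}', "url"))
# bq://projects/my-project/datasets/logs/tables/http_requests?credentials=url:%7B%22a%22%3A1%7D
```

In practice you would read credentials_json from the key file saved earlier and avoid printing the result, since it contains the service account key.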
Required API token permissions
At least one of the following token permissions is required:
- Logs Write
Example request using cURL:
curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/logpush/jobs" \
  --request POST \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --json '{
    "name": "<DOMAIN_NAME>",
    "destination_conf": "bq://projects/<PROJECT_ID>/datasets/<DATASET_ID>/tables/<TABLE_ID>?credentials=<ENCODED_VALUE>",
    "output_options": {
      "field_names": [
        "ClientIP",
        "ClientRequestHost",
        "ClientRequestMethod",
        "ClientRequestURI",
        "EdgeEndTimestamp",
        "EdgeResponseBytes",
        "EdgeResponseStatus",
        "EdgeStartTimestamp",
        "RayID"
      ],
      "timestamp_format": "rfc3339"
    },
    "max_upload_bytes": 5000000,
    "max_upload_records": 50000,
    "dataset": "http_requests",
    "enabled": true
  }'
Response:
{
  "errors": [],
  "messages": [],
  "result": {
    "id": <JOB_ID>,
    "dataset": "http_requests",
    "kind": "",
    "max_upload_bytes": 5000000,
    "max_upload_records": 50000,
    "enabled": true,
    "name": "<DOMAIN_NAME>",
    "output_options": {
      "field_names": ["ClientIP", "ClientRequestHost", "ClientRequestMethod", "ClientRequestURI", "EdgeEndTimestamp", "EdgeResponseBytes", "EdgeResponseStatus", "EdgeStartTimestamp", "RayID"],
      "timestamp_format": "rfc3339"
    },
    "destination_conf": "bq://projects/<PROJECT_ID>/datasets/<DATASET_ID>/tables/<TABLE_ID>?credentials=<ENCODED_VALUE>",
    "last_complete": null,
    "last_error": null,
    "error_message": null
  },
  "success": true
}
Creating the job makes a test upload with empty content to verify that Logpush can write to the table, so you may see a row with empty data.
Refer to Manage Logpush with cURL to update a job (including enabling and disabling).
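For readers scripting job creation rather than using cURL, the request above can be sketched with Python's standard library. The helper below only builds the request object and mirrors the endpoint, headers, and body of the cURL example. The function name and its parameters are hypothetical, and sending the request is left as a commented-out step.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"

FIELD_NAMES = [
    "ClientIP", "ClientRequestHost", "ClientRequestMethod", "ClientRequestURI",
    "EdgeEndTimestamp", "EdgeResponseBytes", "EdgeResponseStatus",
    "EdgeStartTimestamp", "RayID",
]


def build_create_job_request(zone_id: str, api_token: str,
                             destination_conf: str, job_name: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a Logpush job."""
    payload = {
        "name": job_name,
        "destination_conf": destination_conf,
        "output_options": {
            "field_names": FIELD_NAMES,
            "timestamp_format": "rfc3339",
        },
        "max_upload_bytes": 5000000,
        "max_upload_records": 50000,
        "dataset": "http_requests",
        "enabled": True,
    }
    return urllib.request.Request(
        f"{API_BASE}/zones/{zone_id}/logpush/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To send it (requires a real zone ID, token, and destination_conf):
# with urllib.request.urlopen(build_create_job_request(...)) as response:
#     result = json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the body before anything reaches the API.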
Note the following default quota and limits, as described in the BigQuery documentation ↗.
The following limits apply to BigQuery streaming inserts:
- Maximum HTTP request size (uncompressed, may include headers): 10 MB
- Maximum row size: 10 MB
- Maximum rows per request: 50,000 rows
These are the default quotas and limits. Adjust your Logpush jobs to stay within them, and/or ask Google to increase them when needed.
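One way to apply these limits is a small pre-flight check on a job's max_upload_bytes and max_upload_records before creating it. The sketch below is illustrative only: the function name is hypothetical, and treating 10 MB as 10,000,000 bytes is a conservative assumption, since the text does not say whether the limit is decimal or binary.

```python
# Default streaming-insert limits listed above.
BQ_MAX_REQUEST_BYTES = 10 * 1000 * 1000  # assumption: 10 MB read as decimal megabytes
BQ_MAX_ROWS_PER_REQUEST = 50_000


def check_batch_settings(max_upload_bytes: int, max_upload_records: int) -> list:
    """Return a list of human-readable problems; empty means within limits."""
    problems = []
    if max_upload_bytes > BQ_MAX_REQUEST_BYTES:
        problems.append(
            f"max_upload_bytes {max_upload_bytes} exceeds {BQ_MAX_REQUEST_BYTES}")
    if max_upload_records > BQ_MAX_ROWS_PER_REQUEST:
        problems.append(
            f"max_upload_records {max_upload_records} exceeds {BQ_MAX_ROWS_PER_REQUEST}")
    return problems


# The values used in the example request fit within the defaults.
print(check_batch_settings(5000000, 50000))  # []
```

If Google raises a quota for your project, update the two constants to match.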
Alternatively, Cloudflare Logpush supports pushing logs to Google Cloud Storage, from which files can be loaded into BigQuery.
BigQuery supports loading up to 1,500 jobs per table per day (including failures) with up to 10 million files in each load. That means you can load into BigQuery once per minute and include up to 10 million files in a load. For more information, refer to BigQuery's quotas for load jobs.
Logpush delivers batches of logs as soon as possible, which means you could receive more than one batch of files per minute. Ensure your BigQuery job is configured to ingest files at a fixed time interval, such as every minute, rather than each time a file arrives. Ingesting files into BigQuery as each Logpush file is received could exhaust your BigQuery quota quickly.
For a community-supported example of how to set up a scheduled load job with BigQuery, refer to Cloudflare + Google Cloud | Integrations repository ↗. Note that this repository is provided on a best-effort basis and is not maintained routinely.
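The interval-based ingestion described above can be sketched as a small batcher that collects file names as they arrive and releases them at most once per interval. This is an illustrative sketch, not the approach used in the community repository; the class name, method names, and example URIs are hypothetical. The helper function also shows the arithmetic behind "once per minute": 1,500 load jobs per day leaves 57.6 seconds between jobs.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def min_seconds_between_loads(jobs_per_day: int = 1500) -> float:
    """Shortest average spacing that stays within a daily load-job quota."""
    return SECONDS_PER_DAY / jobs_per_day


class IntervalBatcher:
    """Collect incoming files and hand them out at most once per interval."""

    def __init__(self, interval_seconds: float = 60.0):
        self.interval_seconds = interval_seconds
        self._pending = []
        self._last_flush = None  # time of the previous non-empty flush

    def add(self, file_uri: str) -> None:
        self._pending.append(file_uri)

    def flush_if_due(self, now: float) -> list:
        """Return the pending files if the interval has elapsed, else []."""
        if not self._pending:
            return []
        if self._last_flush is not None and now - self._last_flush < self.interval_seconds:
            return []
        batch, self._pending = self._pending, []
        self._last_flush = now
        return batch


batcher = IntervalBatcher(interval_seconds=60.0)
batcher.add("gs://example-bucket/logs/file-1.log.gz")  # hypothetical URI
print(batcher.flush_if_due(now=0.0))  # one load job containing file-1
```

Each non-empty result of flush_if_due would become a single load job, so several Logpush files delivered within the same minute count once against the quota.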