Skip to content
Analytics
Visit Analytics on GitHub
Set theme to dark (⇧+D)

Export GraphQL data to CSV

This tutorial shows how to create a Python script that queries the GraphQL API for Network Analytics data and then converts that data to comma-separated values (CSV) so that tools like Splunk can easily ingest and visualize it.

This example queries the ipFlows1mAttacksGroups data set, which aggregates Network Analytics attack activity into 1-minute intervals.

Set up script and authentication

The first step is to set up the script and define the variables for authenticating to the Analytics GraphQL API using a Cloudflare API token. The script also provides variables to set the range of data to export. This example queries for a seven-day period that ended yesterday.

Set up script and authenticationimport pandas as pdfrom datetime import datetime, timedeltaimport requests url = 'https://api.cloudflare.com/client/v4/graphql/'# Customize these variables.file_dir = ''  # Must include trailing slash. If left blank, # csv will be created in the current directory.api_email = '[your email here]'api_token = '[your API token here]'api_account = '[your account ID here]'# Set most recent day as yesterday by default.offset_days = 1# How many days worth of data do we want? By default, 7.historical_days = 7

Calculate the date n days ago

The get_date() function takes a number of days (num_days), subtracts that value from today's date, and returns the date num_days ago.

Calculate the date num_days agodef get_date(num_days):    today = datetime.utcnow().date()    return today - timedelta(days=num_days)

The script uses get_date() with the offset_days and historical_days variables to calculate the appropriate date range (min_date and max_date) when it queries the Analytics GraphQL API.

Query the Analytics GraphQL API

The get_cf_graphql() function assembles a request to the Analytics GraphQL API. The headers include the data for authentication.

The payload contains the GraphQL query. For help getting started with GraphQL queries, see Querying basics.

Note that the braces used in the GraphQL query are doubled to escape them in Python.

Query the Network Analytics GraphQL APIdef get_cf_graphql():    headers = {'content-type': 'application/json', 'X-Auth-Email': api_email, 'Authorization: Bearer': api_token}    # This variable replacement requires Python3.6 or higher    payload = f'''{{"query":        "query ipFlowEventLog(          $accountTag: string,           $filter: AccountIpFlows1mAttacksGroupsFilter_InputObject) {{          viewer {{            accounts(              filter: {{ accountTag: $accountTag }}            ) {{              ipFlows1mAttacksGroups(                filter: $filter,                 limit: 10000,                 orderBy: [min_datetimeMinute_ASC]              ) {{                dimensions {{                  attackId,                   attackDestinationIP,                   attackDestinationPort,                   attackMitigationType,                   attackSourcePort,                   attackType                }},                 avg {{                  bitsPerSecond,                   packetsPerSecond                }},                 min {{                  datetimeMinute,                   bitsPerSecond,                   packetsPerSecond                }},                 max {{                  datetimeMinute,                   bitsPerSecond,                   packetsPerSecond                }},                 sum {{                  bits,                   packets}}                }}              }}            }}          }}",          "variables": {{            "accountTag":"{api_account}",            "filter": {{              "AND":[{{"date_geq":"{min_date}"}},              {{"date_leq": "{max_date}"}}]            }}          }}        }}'''
    r = requests.post(url, data=payload, headers=headers)    return r.text

Convert the data to CSV

Use a tool such as the open-source pandas library (pd) to convert the GraphQL data to CSV. In this example, the convert_to_csv() function does a bit of JSON processing before conversion—normalizing the data, selecting only the desired data, and renaming the columns so that they are user friendly.

The result is output to file in the directory specified by file_dir.

Convert the data to CSVdef convert_to_csv():    # Parse JSON response in Pandas    network_analytics = pd.read_json(raw_data)['data']['viewer']['accounts']    # Flatten nested JSON data first    network_analytics_normalized = pd.json_normalize(network_analytics, 'ipFlows1mAttacksGroups')    # Only select the columns we're interested in    network_analytics_abridged = network_analytics_normalized[['dimensions.attackId','min.datetimeMinute','max.datetimeMinute','dimensions.attackMitigationType', 'dimensions.attackType','dimensions.attackDestinationIP','max.packetsPerSecond']] # Selecting only the data we want    # Rename the columns to visually friendly names    network_analytics_abridged.columns = ['Attack ID','Start date/time', 'End date/time', 'Action taken', 'Attack type', 'Destination IP', 'Max packets/second'] #Renaming columns    network_analytics_abridged.to_csv("{}network-analytics-{}.csv".format(file_dir,min_date)) max_date = get_date(offset_days)min_date = get_date(historical_days) raw_data = get_cf_graphql()convert_to_csv()

Complete script

Cloudflare Analytics GraphQL to CSVimport pandas as pdfrom datetime import datetime, timedeltaimport requests url = 'https://api.cloudflare.com/client/v4/graphql/'# Customize these variables.file_dir = ''  # Must include trailing slash. If left blank, # csv will be created in the current directory.api_email = '[your email here]'api_token = '[your API token here]'api_account = '[your account ID here]'# Set most recent day as yesterday by default.offset_days = 1# How many days worth of data do we want? By default, 7.historical_days = 7 def get_date(num_days):    today = datetime.utcnow().date()    return today - timedelta(days=num_days) def get_cf_graphql():    headers = {'content-type': 'application/json', 'X-Auth-Email': api_email, 'Authorization: Bearer': api_token}    # This variable replacement requires Python3.6 or higher    payload = f'''{{"query":        "query ipFlowEventLog(          $accountTag: string,           $filter: AccountIpFlows1mAttacksGroupsFilter_InputObject) {{          viewer {{            accounts(              filter: {{ accountTag: $accountTag }}            ) {{              ipFlows1mAttacksGroups(                filter: $filter,                 limit: 10000,                 orderBy: [min_datetimeMinute_ASC]              ) {{                dimensions {{                  attackId,                   attackDestinationIP,                   attackDestinationPort,                   attackMitigationType,                   attackSourcePort,                   attackType                }},                 avg {{                  bitsPerSecond,                   packetsPerSecond                }},                 min {{                  datetimeMinute,                   bitsPerSecond,                   packetsPerSecond                }},                 max {{                  datetimeMinute,                   bitsPerSecond,                   packetsPerSecond                }},                 sum {{                  bits,                   packets}}                }}              }}            }}          }}",          "variables": {{            "accountTag":"{api_account}",            "filter": {{              "AND":[{{"date_geq":"{min_date}"}},              {{"date_leq": "{max_date}"}}]            }}          }}        }}'''
    r = requests.post(url, data=payload, headers=headers)    return r.text def convert_to_csv():    # Parse JSON response in Pandas    network_analytics = pd.read_json(raw_data)['data']['viewer']['accounts']    # Flatten nested JSON data first    network_analytics_normalized = pd.json_normalize(network_analytics, 'ipFlows1mAttacksGroups')    # Only select the columns we're interested in    network_analytics_abridged = network_analytics_normalized[['dimensions.attackId','min.datetimeMinute','max.datetimeMinute','dimensions.attackMitigationType', 'dimensions.attackType','dimensions.attackDestinationIP','max.packetsPerSecond']] # Selecting only the data we want    # Rename the columns to visually friendly names    network_analytics_abridged.columns = ['Attack ID','Start date/time', 'End date/time', 'Action taken', 'Attack type', 'Destination IP', 'Max packets/second'] #Renaming columns    network_analytics_abridged.to_csv("{}network-analytics-{}.csv".format(file_dir,min_date)) max_date = get_date(offset_days)min_date = get_date(historical_days) raw_data = get_cf_graphql()convert_to_csv()