# Crawl

## Crawl websites.

`client.browserRendering.crawl.create(params: CrawlCreateParams, options?: RequestOptions): CrawlCreateResponse`

**post** `/accounts/{account_id}/browser-rendering/crawl`

Starts a crawl job for the provided URL and its children. Check available options like `gotoOptions` and `waitFor*` to control page load behaviour.

### Parameters

- `CrawlCreateParams = Variant0 | Variant1`
  - `CrawlCreateParamsBase`
  - `Variant0 extends CrawlCreateParamsBase`
  - `Variant1 extends CrawlCreateParamsBase`

### Returns

- `CrawlCreateResponse = string` Crawl job ID.

### Example

```node
import Cloudflare from 'cloudflare';

const client = new Cloudflare({
  apiToken: process.env['CLOUDFLARE_API_TOKEN'], // This is the default and can be omitted
});

const crawl = await client.browserRendering.crawl.create({
  account_id: 'account_id',
  url: 'https://example.com',
});

console.log(crawl);
```

#### Response

```json
{
  "result": "result",
  "success": true,
  "errors": [
    {
      "code": 0,
      "message": "message"
    }
  ]
}
```

## Get crawl result.

`client.browserRendering.crawl.get(jobId: string, params: CrawlGetParams, options?: RequestOptions): CrawlGetResponse`

**get** `/accounts/{account_id}/browser-rendering/crawl/{job_id}`

Returns the result of a crawl job.

### Parameters

- `jobId: string` Crawl job ID.
- `params: CrawlGetParams`
  - `account_id: string` Path param: Account ID.
  - `cacheTTL?: number` Query param: Cache TTL; the default is 5 seconds. Set to 0 to disable caching.
  - `cursor?: number` Query param: Cursor for pagination.
  - `limit?: number` Query param: Limit for pagination.
  - `status?: "queued" | "errored" | "completed" | 3 more` Query param: Filter by URL status.
    - `"queued"`
    - `"errored"`
    - `"completed"`
    - `"disallowed"`
    - `"skipped"`
    - `"cancelled"`

### Returns

- `CrawlGetResponse`
  - `id: string` Crawl job ID.
  - `browserSecondsUsed: number` Total seconds spent in the browser so far.
  - `finished: number` Total number of URLs that have been crawled so far.
  - `records: Array` List of crawl job records.
    - `metadata: Metadata`
      - `status: number` HTTP status code of the crawled page.
      - `url: string` Final URL of the crawled page.
      - `title?: string` Title of the crawled page.
    - `status: "queued" | "errored" | "completed" | 3 more` Current status of the crawled URL.
      - `"queued"`
      - `"errored"`
      - `"completed"`
      - `"disallowed"`
      - `"skipped"`
      - `"cancelled"`
    - `url: string` Crawled URL.
    - `html?: string` HTML content of the crawled URL.
    - `json?: Record` JSON representation of the content of the crawled URL.
    - `markdown?: string` Markdown representation of the content of the crawled URL.
  - `skipped: number` Total number of URLs that were skipped due to include/exclude/subdomain filters. Skipped URLs are included in records but are not counted toward total/finished.
  - `status: string` Current crawl job status.
  - `total: number` Total current number of URLs in the crawl job.
  - `cursor?: string` Cursor for pagination.

### Example

```node
import Cloudflare from 'cloudflare';

const client = new Cloudflare({
  apiToken: process.env['CLOUDFLARE_API_TOKEN'], // This is the default and can be omitted
});

const crawl = await client.browserRendering.crawl.get('x', { account_id: 'account_id' });

console.log(crawl.id);
```

#### Response

```json
{
  "result": {
    "id": "id",
    "browserSecondsUsed": 0,
    "finished": 0,
    "records": [
      {
        "metadata": {
          "status": 0,
          "url": "url",
          "title": "title"
        },
        "status": "queued",
        "url": "url",
        "html": "html",
        "json": {
          "foo": {}
        },
        "markdown": "markdown"
      }
    ],
    "skipped": 0,
    "status": "status",
    "total": 0,
    "cursor": "cursor"
  },
  "success": true,
  "errors": [
    {
      "code": 0,
      "message": "message"
    }
  ]
}
```

## Cancel a crawl job.

`client.browserRendering.crawl.delete(jobId: string, params: CrawlDeleteParams, options?: RequestOptions): CrawlDeleteResponse`

**delete** `/accounts/{account_id}/browser-rendering/crawl/{job_id}`

Cancels an ongoing crawl job by setting its status to cancelled and stopping all queued URLs.

### Parameters

- `jobId: string` The ID of the crawl job to cancel.
- `params: CrawlDeleteParams`
  - `account_id: string` Account ID.

### Returns

- `CrawlDeleteResponse`
  - `job_id: string` The ID of the cancelled job.
  - `message: string` Cancellation confirmation message.

### Example

```node
import Cloudflare from 'cloudflare';

const client = new Cloudflare({
  apiToken: process.env['CLOUDFLARE_API_TOKEN'], // This is the default and can be omitted
});

const crawl = await client.browserRendering.crawl.delete('job_id', { account_id: 'account_id' });

console.log(crawl.job_id);
```

#### Response

```json
{
  "result": {
    "job_id": "job_id",
    "message": "message"
  },
  "success": true,
  "errors": [
    {
      "code": 0,
      "message": "message"
    }
  ]
}
```

## Domain Types

### Crawl Create Response

- `CrawlCreateResponse = string` Crawl job ID.

### Crawl Get Response

- `CrawlGetResponse`
  - `id: string` Crawl job ID.
  - `browserSecondsUsed: number` Total seconds spent in the browser so far.
  - `finished: number` Total number of URLs that have been crawled so far.
  - `records: Array` List of crawl job records.
    - `metadata: Metadata`
      - `status: number` HTTP status code of the crawled page.
      - `url: string` Final URL of the crawled page.
      - `title?: string` Title of the crawled page.
    - `status: "queued" | "errored" | "completed" | 3 more` Current status of the crawled URL.
      - `"queued"`
      - `"errored"`
      - `"completed"`
      - `"disallowed"`
      - `"skipped"`
      - `"cancelled"`
    - `url: string` Crawled URL.
    - `html?: string` HTML content of the crawled URL.
    - `json?: Record` JSON representation of the content of the crawled URL.
    - `markdown?: string` Markdown representation of the content of the crawled URL.
  - `skipped: number` Total number of URLs that were skipped due to include/exclude/subdomain filters. Skipped URLs are included in records but are not counted toward total/finished.
  - `status: string` Current crawl job status.
  - `total: number` Total current number of URLs in the crawl job.
  - `cursor?: string` Cursor for pagination.
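The `cursor` field on `CrawlGetResponse` implies that results can span multiple pages. A minimal pagination sketch, written generically over a page-fetching function so it makes no assumptions about SDK internals; the `CrawlPage` shape is a hypothetical subset of the fields documented above, and the sketch uses a `string` cursor throughout even though the docs type the query param as `number`:

```typescript
// Hypothetical shape of one page of crawl results, mirroring a subset of
// the CrawlGetResponse fields documented above.
interface CrawlPage {
  records: Array<{ url: string; status: string }>;
  cursor?: string;
}

// Collect all records by following `cursor` until the API stops returning
// one. `fetchPage` stands in for a call such as
// client.browserRendering.crawl.get(jobId, { account_id, cursor }).
async function collectRecords(
  fetchPage: (cursor?: string) => Promise<CrawlPage>,
): Promise<Array<{ url: string; status: string }>> {
  const all: Array<{ url: string; status: string }> = [];
  let cursor: string | undefined;
  do {
    const page = await fetchPage(cursor);
    all.push(...page.records);
    cursor = page.cursor;
  } while (cursor !== undefined);
  return all;
}
```

With the real SDK, `fetchPage` would wrap `crawl.get`, passing the previous page's cursor back as the `cursor` query param.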
### Crawl Delete Response - `CrawlDeleteResponse` - `job_id: string` The ID of the cancelled job. - `message: string` Cancellation confirmation message.
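Since `crawl.create` returns only a job ID, a caller typically polls `crawl.get` until the job settles, and may `crawl.delete` it if it runs too long. A sketch of that loop, generic over the two calls; note the set of terminal job statuses is an assumption (the docs above enumerate only the per-URL status values), not something the API reference confirms:

```typescript
// Assumed terminal job statuses, borrowed from the per-URL status values
// documented above; verify against the actual API before relying on this.
const TERMINAL = new Set(['completed', 'errored', 'cancelled']);

// Poll until the job reaches a terminal status, cancelling it if the
// deadline passes. `getStatus` and `cancel` stand in for
// client.browserRendering.crawl.get(...) and .delete(...).
async function waitForCrawl(
  getStatus: () => Promise<string>,
  cancel: () => Promise<void>,
  opts: { intervalMs: number; timeoutMs: number; sleep?: (ms: number) => Promise<void> },
): Promise<string> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  const deadline = Date.now() + opts.timeoutMs;
  for (;;) {
    const status = await getStatus();
    if (TERMINAL.has(status)) return status;
    if (Date.now() >= deadline) {
      await cancel();
      return 'cancelled';
    }
    await sleep(opts.intervalMs);
  }
}
```

The injectable `sleep` keeps the helper testable without real delays; in production you would omit it and pick an interval that respects your rate limits.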