# Crawl

## Crawl websites.

`client.browserRendering.crawl.create(params: CrawlCreateParams, options?: RequestOptions): CrawlCreateResponse`

**post** `/accounts/{account_id}/browser-rendering/crawl`

Starts a crawl job for the provided URL and its children. Check available options like `gotoOptions` and `waitFor*` to control page load behaviour.

### Parameters

- `CrawlCreateParams = Variant0 | Variant1`
  - `CrawlCreateParamsBase`
  - `Variant0 extends CrawlCreateParamsBase`
  - `Variant1 extends CrawlCreateParamsBase`

### Returns

- `CrawlCreateResponse = string` Crawl job ID.

### Example

```node
import Cloudflare from 'cloudflare';

const client = new Cloudflare({
  apiToken: process.env['CLOUDFLARE_API_TOKEN'], // This is the default and can be omitted
});

const crawl = await client.browserRendering.crawl.create({
  account_id: 'account_id',
  url: 'https://example.com',
});

console.log(crawl);
```

#### Response

```json
{
  "result": "result",
  "success": true,
  "errors": [
    {
      "code": 0,
      "message": "message"
    }
  ]
}
```

## Get crawl result.

`client.browserRendering.crawl.get(jobId: string, params: CrawlGetParams, options?: RequestOptions): CrawlGetResponse`

**get** `/accounts/{account_id}/browser-rendering/crawl/{job_id}`

Returns the result of a crawl job.

### Parameters

- `jobId: string` Crawl job ID.
- `params: CrawlGetParams`
  - `account_id: string` Path param: Account ID.
  - `cacheTTL?: number` Query param: Cache TTL; the default is 5 seconds. Set to 0 to disable caching.
  - `cursor?: number` Query param: Cursor for pagination.
  - `limit?: number` Query param: Limit for pagination.
  - `status?: "queued" | "errored" | "completed" | 3 more` Query param: Filter by URL status.
    - `"queued"`
    - `"errored"`
    - `"completed"`
    - `"disallowed"`
    - `"skipped"`
    - `"cancelled"`

### Returns

- `CrawlGetResponse`
  - `id: string` Crawl job ID.
  - `browserSecondsUsed: number` Total seconds spent in the browser so far.
  - `finished: number` Total number of URLs that have been crawled so far.
  - `records: Array` List of crawl job records.
    - `metadata: Metadata`
      - `status: number` HTTP status code of the crawled page.
      - `url: string` Final URL of the crawled page.
      - `title?: string` Title of the crawled page.
    - `status: "queued" | "errored" | "completed" | 3 more` Current status of the crawled URL.
      - `"queued"`
      - `"errored"`
      - `"completed"`
      - `"disallowed"`
      - `"skipped"`
      - `"cancelled"`
    - `url: string` Crawled URL.
    - `html?: string` HTML content of the crawled URL.
    - `json?: Record` JSON representation of the content of the crawled URL.
    - `markdown?: string` Markdown representation of the content of the crawled URL.
  - `skipped: number` Total number of URLs that were skipped due to include/exclude/subdomain filters. Skipped URLs are included in records but are not counted toward total/finished.
  - `status: string` Current crawl job status.
  - `total: number` Total current number of URLs in the crawl job.
  - `cursor?: string` Cursor for pagination.

### Example

```node
import Cloudflare from 'cloudflare';

const client = new Cloudflare({
  apiToken: process.env['CLOUDFLARE_API_TOKEN'], // This is the default and can be omitted
});

const crawl = await client.browserRendering.crawl.get('x', { account_id: 'account_id' });

console.log(crawl.id);
```

#### Response

```json
{
  "result": {
    "id": "id",
    "browserSecondsUsed": 0,
    "finished": 0,
    "records": [
      {
        "metadata": {
          "status": 0,
          "url": "url",
          "title": "title"
        },
        "status": "queued",
        "url": "url",
        "html": "html",
        "json": {
          "foo": {}
        },
        "markdown": "markdown"
      }
    ],
    "skipped": 0,
    "status": "status",
    "total": 0,
    "cursor": "cursor"
  },
  "success": true,
  "errors": [
    {
      "code": 0,
      "message": "message"
    }
  ]
}
```

## Cancel a crawl job.

`client.browserRendering.crawl.delete(jobId: string, params: CrawlDeleteParams, options?: RequestOptions): CrawlDeleteResponse`

**delete** `/accounts/{account_id}/browser-rendering/crawl/{job_id}`

Cancels an ongoing crawl job by setting its status to cancelled and stopping all queued URLs.

### Parameters

- `jobId: string` The ID of the crawl job to cancel.
- `params: CrawlDeleteParams`
  - `account_id: string` Account ID.

### Returns

- `CrawlDeleteResponse`
  - `job_id: string` The ID of the cancelled job.
  - `message: string` Cancellation confirmation message.

### Example

```node
import Cloudflare from 'cloudflare';

const client = new Cloudflare({
  apiToken: process.env['CLOUDFLARE_API_TOKEN'], // This is the default and can be omitted
});

const crawl = await client.browserRendering.crawl.delete('job_id', { account_id: 'account_id' });

console.log(crawl.job_id);
```

#### Response

```json
{
  "result": {
    "job_id": "job_id",
    "message": "message"
  },
  "success": true,
  "errors": [
    {
      "code": 0,
      "message": "message"
    }
  ]
}
```

## Domain Types

### Crawl Create Response

- `CrawlCreateResponse = string` Crawl job ID.

### Crawl Get Response

- `CrawlGetResponse`
  - `id: string` Crawl job ID.
  - `browserSecondsUsed: number` Total seconds spent in the browser so far.
  - `finished: number` Total number of URLs that have been crawled so far.
  - `records: Array` List of crawl job records.
    - `metadata: Metadata`
      - `status: number` HTTP status code of the crawled page.
      - `url: string` Final URL of the crawled page.
      - `title?: string` Title of the crawled page.
    - `status: "queued" | "errored" | "completed" | 3 more` Current status of the crawled URL.
      - `"queued"`
      - `"errored"`
      - `"completed"`
      - `"disallowed"`
      - `"skipped"`
      - `"cancelled"`
    - `url: string` Crawled URL.
    - `html?: string` HTML content of the crawled URL.
    - `json?: Record` JSON representation of the content of the crawled URL.
    - `markdown?: string` Markdown representation of the content of the crawled URL.
  - `skipped: number` Total number of URLs that were skipped due to include/exclude/subdomain filters. Skipped URLs are included in records but are not counted toward total/finished.
  - `status: string` Current crawl job status.
  - `total: number` Total current number of URLs in the crawl job.
  - `cursor?: string` Cursor for pagination.
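The `cursor` field on `CrawlGetResponse` implies that results can span multiple pages. A minimal pagination sketch, written generically over a page-fetching function so it makes no assumptions about SDK internals; the `CrawlPage` shape is a hypothetical subset of the fields documented above, and the sketch uses a `string` cursor throughout even though the docs type the query param as `number`:

```typescript
// Hypothetical shape of one page of crawl results, mirroring a subset of
// the CrawlGetResponse fields documented above.
interface CrawlPage {
  records: Array<{ url: string; status: string }>;
  cursor?: string;
}

// Collect all records by following `cursor` until the API stops returning
// one. `fetchPage` stands in for a call such as
// client.browserRendering.crawl.get(jobId, { account_id, cursor }).
async function collectRecords(
  fetchPage: (cursor?: string) => Promise<CrawlPage>,
): Promise<Array<{ url: string; status: string }>> {
  const all: Array<{ url: string; status: string }> = [];
  let cursor: string | undefined;
  do {
    const page = await fetchPage(cursor);
    all.push(...page.records);
    cursor = page.cursor;
  } while (cursor !== undefined);
  return all;
}
```

With the real SDK, `fetchPage` would wrap `crawl.get`, passing the previous page's cursor back as the `cursor` query param.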
### Crawl Delete Response - `CrawlDeleteResponse` - `job_id: string` The ID of the cancelled job. - `message: string` Cancellation confirmation message.
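Since `crawl.create` returns only a job ID, a caller typically polls `crawl.get` until the job settles, and may `crawl.delete` it if it runs too long. A sketch of that loop, generic over the two calls; note the set of terminal job statuses is an assumption (the docs above enumerate only the per-URL status values), not something the API reference confirms:

```typescript
// Assumed terminal job statuses, borrowed from the per-URL status values
// documented above; verify against the actual API before relying on this.
const TERMINAL = new Set(['completed', 'errored', 'cancelled']);

// Poll until the job reaches a terminal status, cancelling it if the
// deadline passes. `getStatus` and `cancel` stand in for
// client.browserRendering.crawl.get(...) and .delete(...).
async function waitForCrawl(
  getStatus: () => Promise<string>,
  cancel: () => Promise<void>,
  opts: { intervalMs: number; timeoutMs: number; sleep?: (ms: number) => Promise<void> },
): Promise<string> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  const deadline = Date.now() + opts.timeoutMs;
  for (;;) {
    const status = await getStatus();
    if (TERMINAL.has(status)) return status;
    if (Date.now() >= deadline) {
      await cancel();
      return 'cancelled';
    }
    await sleep(opts.intervalMs);
  }
}
```

The injectable `sleep` keeps the helper testable without real delays; in production you would omit it and pick an interval that respects your rate limits.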