The ability to browse websites can be crucial when building workflows with AI. This example uses Browser Rendering to visit
https://news.ycombinator.com/ and then uses a machine learning model available in Workers AI to extract the first post as JSON that matches a specified schema.
Prerequisites
Use the create-cloudflare CLI to generate a new Hello World Cloudflare Worker script:
Install @cloudflare/puppeteer, which allows you to control the Browser Rendering instance:
Install zod so we can define our output format and zod-to-json-schema so we can convert it into a JSON schema format:
Enable the nodejs_compat compatibility flag and add your Browser Rendering binding (named MY_BROWSER in this example) to your new wrangler.toml configuration:
The .dev.vars file is used here because this setup is only for local development. For a deployed Worker, use Secrets instead.
Load the page using Browser Rendering
In the code below, we launch a browser using await puppeteer.launch(env.MY_BROWSER), extract the rendered text and close the browser.
Then, with the user prompt, the desired output schema and the rendered text, prepare a prompt to send to the LLM.
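The two steps described above can be sketched independently of the Worker entry point. This is a minimal sketch under stated assumptions, not the script from this guide: getRenderedText and buildPrompt are hypothetical helper names, the PageLike and BrowserLike interfaces are assumptions that model only the puppeteer methods used here (newPage, goto, evaluate, close), and the prompt wording is purely illustrative.

```typescript
// Assumed minimal shapes, modelled on the puppeteer Page and Browser
// methods this sketch calls. They are not the library's real types.
interface PageLike {
  goto(url: string): Promise<unknown>;
  evaluate(expression: string): Promise<unknown>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

// Hypothetical helper: launch a browser, load the page, read its
// rendered text, and always close the browser afterwards.
async function getRenderedText(
  launch: () => Promise<BrowserLike>,
  url: string,
): Promise<string> {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // The string expression is evaluated inside the page.
    const text = await page.evaluate("document.body.innerText");
    return typeof text === "string" ? text : "";
  } finally {
    // Runs even if navigation or evaluation throws.
    await browser.close();
  }
}

// Hypothetical helper: combine the user's goal, the output schema and
// the page text into a single prompt string for the LLM.
function buildPrompt(
  userPrompt: string,
  jsonSchema: object,
  renderedText: string,
): string {
  return [
    `Task: ${userPrompt}`,
    "Respond only with JSON that conforms to this JSON schema:",
    JSON.stringify(jsonSchema, null, 2),
    "Web page text:",
    renderedText,
  ].join("\n\n");
}
```

Inside a Worker, the launch argument would be () => puppeteer.launch(env.MY_BROWSER) and the URL would be https://news.ycombinator.com/. Closing the browser in a finally block releases the browser session even when the page fails to load.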
Replace the contents of src/index.ts with the following skeleton script:
Call an LLM
With the webpage text, the user's goal, and the output schema in hand, we can now use an LLM to transform the text into JSON that matches the user's request.
The example below uses @hf/thebloke/deepseek-coder-6.7b-instruct-awq but other models, or services like OpenAI, could be used with minimal changes:
If you want to use Browser Rendering with OpenAI instead, you only need to change the aiUrl endpoint and requestBody (or check out the llm-scraper-worker package).
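To illustrate how small the difference between providers can be, the sketch below builds the aiUrl and requestBody for both cases. It is a sketch under assumptions rather than the code from this guide: the function names and the LlmRequest type are hypothetical, the account ID, tokens and OpenAI model name are placeholders supplied by the caller, and a single user message is assumed to carry the whole prompt.

```typescript
// Hypothetical container for everything needed to POST to an LLM endpoint.
interface LlmRequest {
  aiUrl: string;
  headers: Record<string, string>;
  requestBody: Record<string, unknown>;
}

// Workers AI over its REST API: the model name is part of the URL.
function workersAiRequest(
  accountId: string,
  apiToken: string,
  model: string,
  prompt: string,
): LlmRequest {
  return {
    aiUrl: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`,
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    requestBody: { messages: [{ role: "user", content: prompt }] },
  };
}

// OpenAI chat completions: fixed URL, model name moves into the body.
function openAiRequest(
  apiKey: string,
  model: string,
  prompt: string,
): LlmRequest {
  return {
    aiUrl: "https://api.openai.com/v1/chat/completions",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    requestBody: { model, messages: [{ role: "user", content: prompt }] },
  };
}
```

Either result can then be sent with fetch, using method POST, the returned headers, and JSON.stringify(requestBody) as the body. For the model used in this example, the model argument would be @hf/thebloke/deepseek-coder-6.7b-instruct-awq.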
Conclusion
The full Worker script now looks as follows:
You can run this script to test it using Wrangler's --remote flag:
With your script now running, you can go to http://localhost:8787/ and should see something like the following:
For more complex websites or prompts, you might need a better model. Check out the latest models in Workers AI.
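Whichever model is used, its reply may wrap the JSON in extra text or return something that does not parse. A defensive parse step can help here. The following is a minimal sketch, assuming the reply is available as a plain string; extractJsonObject is a hypothetical helper, not part of Workers AI or of this guide's script.

```typescript
// Hypothetical helper: pull the outermost {...} span out of a model
// reply and parse it. Returns null when no JSON object can be recovered.
function extractJsonObject(raw: string): Record<string, unknown> | null {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    return null; // no candidate object in the reply
  }
  try {
    const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
    // Accept only plain objects, not arrays or primitives.
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return parsed as Record<string, unknown>;
    }
    return null;
  } catch {
    return null; // the span was not valid JSON
  }
}
```

A null result is a signal to retry, adjust the prompt, or switch to a more capable model. The parsed object should still be validated against the output schema before it is returned to the caller.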