
Content-based asset creation

Introduction

Combining text-generation models with text-to-image models can lead to powerful AI systems capable of generating visual content based on input prompts. This integration can be achieved through a collaborative framework where a text-generation model generates prompts for the text-to-image model based on input text. Here’s how the process can work:

  • Input Text Processing: The input text is provided to the system, which can be anything from a simple sentence to multiple paragraphs. This text serves as the basis for generating visual content.

  • Prompt Generation: The text-generation model generates prompts based on the input text. These prompts are specifically crafted to guide the text-to-image model in generating images that are contextually relevant to the input text. The prompts can include descriptions, keywords, or other cues to guide the image generation process.

  • Content Moderation: Text-classification models can be employed to ensure that the generated assets comply with content policies.

  • Text-to-Image Model: A text-to-image model takes the prompts generated by the text-generation model as input and produces corresponding images. The text-to-image model learns to translate textual descriptions into visual representations, aiming to capture the essence and context conveyed by the input text.
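
As a minimal sketch of the prompt-generation step described above, the helper below builds a chat-style request that asks a text-generation model to turn input text into an image prompt. The `ChatMessage` shape, the `buildPromptRequest` name, the instruction wording, and the `MAX_INPUT_CHARS` limit are all assumptions made for illustration; they are not taken from the architecture itself.

```typescript
// Hypothetical message shape, modeled on common chat-style text-generation inputs.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Assumed limit that keeps long inputs (multiple paragraphs) within a model's context window.
const MAX_INPUT_CHARS = 4000;

// Build the messages that ask a text-generation model for a single image prompt.
function buildPromptRequest(inputText: string): [ChatMessage, ChatMessage] {
  const trimmed = inputText.trim().slice(0, MAX_INPUT_CHARS);
  return [
    {
      role: "system",
      content:
        "You write prompts for a text-to-image model. " +
        "Summarize the user's text as one vivid, concrete image description. " +
        "Reply with the prompt only.",
    },
    { role: "user", content: trimmed },
  ];
}

const messages = buildPromptRequest("  Quarterly report: cloud revenue grew 20%.  ");
console.log(messages[1].content); // prints "Quarterly report: cloud revenue grew 20%."
```

Keeping the instruction in a system message and the raw input in a user message is one possible design; it keeps the input text separate from the instructions that shape the generated prompt.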

Such compositions of AI models can be used to generate visual assets for marketing, publishing, presentations, and more.

Asset generation

Figure 1: Content-based asset generation
  1. Client upload: Send POST request with content to API endpoint.
  2. Prompt generation: Generate prompt for later-stage text-to-image model by calling Workers AI text generation models with content as input.
  3. Safety check: Check for compliance with safety guidelines by calling Workers AI text generation models with the previously generated prompt as input.
  4. Image generation: Generate image by calling Workers AI text-to-image models with the previously generated prompt as input.
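
The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not a reference implementation: `AiBinding` is a minimal stand-in for the Workers AI binding's `run(model, inputs)` call, the entries in `MODELS` are placeholders rather than real model identifiers, and the code assumes that text-generation models return an object with a `response` string and that the safety model's reply starts with "safe" or "unsafe". The `generateAsset` and `AssetResult` names are hypothetical.

```typescript
// Minimal stand-in for the Workers AI binding: only the call shape used below.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

// Placeholder identifiers (assumptions): substitute models from the Workers AI catalog.
const MODELS = {
  promptGeneration: "<text-generation-model>",
  safetyCheck: "<safety-model>",
  imageGeneration: "<text-to-image-model>",
};

type AssetResult =
  | { ok: true; prompt: string; image: unknown }
  | { ok: false; reason: string };

async function generateAsset(ai: AiBinding, content: string): Promise<AssetResult> {
  // Step 2: ask a text-generation model to turn the uploaded content into an image prompt.
  const generated = (await ai.run(MODELS.promptGeneration, {
    messages: [
      { role: "system", content: "Write one concise image prompt for the user's text." },
      { role: "user", content },
    ],
  })) as { response?: string };
  const prompt = (generated.response ?? "").trim();
  if (!prompt) return { ok: false, reason: "empty prompt" };

  // Step 3: safety check on the generated prompt.
  // Assumption: the safety model answers with text beginning "safe" or "unsafe".
  const verdict = (await ai.run(MODELS.safetyCheck, {
    messages: [{ role: "user", content: prompt }],
  })) as { response?: string };
  if (!(verdict.response ?? "").trim().toLowerCase().startsWith("safe")) {
    return { ok: false, reason: "prompt failed safety check" };
  }

  // Step 4: generate the image from the approved prompt.
  const image = await ai.run(MODELS.imageGeneration, { prompt });
  return { ok: true, prompt, image };
}
```

In a Worker, the `fetch` handler would read the POST body with `await request.text()` (step 1) and pass it to `generateAsset` together with the configured AI binding. Running the safety check before image generation means a rejected prompt never reaches the text-to-image model.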