Unsafe and custom topic detection
AI Security for Apps (formerly Firewall for AI) can detect when an LLM prompt touches on unsafe or unwanted subjects. There are two layers of topic detection:
- Default unsafe topics — A built-in set of safety categories that detect harmful content such as violent crimes, hate speech, and sexual content.
- Custom topics — Topics you define to match your organization's specific policies, such as "competitors" or "financial advice".
When AI Security for Apps is enabled, it automatically evaluates prompts against a set of default unsafe topic categories and populates two fields:
- LLM Unsafe topic detected (`cf.llm.prompt.unsafe_topic_detected`) — `true` if any unsafe topic was found.
- LLM Unsafe topic categories (`cf.llm.prompt.unsafe_topic_categories`) — An array of the specific categories detected.
Default unsafe topic categories
| Category | Description |
|---|---|
| S1 | Violent crimes |
| S2 | Non-violent crimes |
| S3 | Sex-related crimes |
| S4 | Child sexual exploitation |
| S5 | Defamation |
| S6 | Specialized advice |
| S7 | Privacy |
| S8 | Intellectual property |
| S9 | Indiscriminate weapons |
| S10 | Hate |
| S11 | Suicide and self-harm |
| S12 | Sexual content |
| S13 | Elections |
| S14 | Code interpreter abuse |
- When incoming requests match:

  | Field | Operator | Value |
  | --- | --- | --- |
  | LLM Unsafe topic detected | equals | True |

  Expression when using the editor:

  ```txt
  (cf.llm.prompt.unsafe_topic_detected)
  ```

- Action: Block
- When incoming requests match:

  | Field | Operator | Value |
  | --- | --- | --- |
  | LLM Unsafe topic categories | is in | S1: Violent Crimes, S10: Hate |

  Expression when using the editor:

  ```txt
  (any(cf.llm.prompt.unsafe_topic_categories[*] in {"S1" "S10"}))
  ```

- Action: Block
Custom topic detection lets you define your own topics and AI Security for Apps will score each prompt against them. You can then use these scores in custom rules or rate limiting rules to block, challenge, or log matching requests.
This capability uses a zero-shot classification model that evaluates prompts at runtime. No model training is required.
- You define a list of up to 20 custom topics via the dashboard or API. Each topic consists of:
- A label — Used in rule expressions and analytics
- A topic string — The descriptive text the model uses to classify prompts
- When a request arrives at a `cf-llm` labeled endpoint, the model evaluates the prompt against all defined topic strings and returns a relevance score for each.
- Scores are written to the `cf.llm.prompt.custom_topic_categories` map field, keyed by label. You use labels — not topic strings — in rule expressions and analytics.
Scores follow the same convention as other AI Security for Apps scores, where lower values indicate higher relevance (1 = highly relevant, 99 = not relevant).
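To make the score convention concrete, here is a toy Python sketch. It is not Cloudflare's actual model — a simple word-overlap heuristic stands in for the zero-shot classifier — but it shows how per-label scores on the 1 (highly relevant) to 99 (not relevant) scale could be produced and compared:

```python
# Toy illustration of the scoring convention (NOT Cloudflare's real classifier).
# A word-overlap heuristic stands in for semantic zero-shot classification,
# only to demonstrate that lower scores mean higher relevance.

def topic_score(prompt: str, topic: str) -> int:
    prompt_words = set(prompt.lower().split())
    topic_words = set(topic.lower().split())
    overlap = len(prompt_words & topic_words) / len(topic_words)
    # Map relevance [0, 1] onto the 1 (highly relevant) .. 99 (not relevant) scale.
    return max(1, round(99 - overlap * 98))

topics = {
    "competitors": "acme corp products and pricing",
    "finance": "financial advice and investment recommendations",
}

prompt = "what financial advice do you have about investment options"
scores = {label: topic_score(prompt, t) for label, t in topics.items()}
# The finance label scores lower (more relevant) than competitors for this prompt.
```

A rule such as `cf.llm.prompt.custom_topic_categories["finance"] lt 30` would then match only when the real model's score for that label falls below the threshold.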
- In the Cloudflare dashboard, go to the Security Settings page.
- Under AI Security for Apps, find the Custom Topics section and select Manage topics.
- Add a topic by providing:
  - Label: A short identifier used in rule expressions (for example, `competitors`).
  - Topic: A descriptive English text string the model uses for classification (for example, `asking about Acme Corp products and pricing`).
- Select Save.
Update your custom topics list using a PUT request:

```sh
curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/firewall-for-ai/custom_topics" \
  --request PUT \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --json '{
    "topics": [
      { "label": "competitors", "topic": "competitor products and services" },
      { "label": "finance", "topic": "financial advice and investment recommendations" },
      { "label": "hr-internal", "topic": "internal HR policies and employee matters" }
    ]
  }'
```

To retrieve your current topics, use a GET request:

```sh
curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/firewall-for-ai/custom_topics" \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN"
```

| Parameter | Limit |
|---|---|
| Maximum number of topics | 20 |
| Topic string length | 2–50 printable ASCII characters |
| Label length | 2–20 characters |
| Label format | Lowercase letters, numbers, and hyphens (-) only |
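If you generate topics programmatically, a client-side pre-check can catch violations of these limits before the PUT request. The helper below is hypothetical (the API performs the authoritative validation) and simply mirrors the documented constraints:

```python
import re

# Hypothetical pre-flight checks mirroring the documented limits.
# The Cloudflare API remains the authoritative validator on PUT.
LABEL_RE = re.compile(r"^[a-z0-9-]{2,20}$")  # lowercase letters, digits, hyphens; 2-20 chars

def valid_label(label: str) -> bool:
    return LABEL_RE.fullmatch(label) is not None

def valid_topic(topic: str) -> bool:
    # 2-50 printable ASCII characters
    return 2 <= len(topic) <= 50 and all(32 <= ord(c) <= 126 for c in topic)

def valid_topics_list(topics: list[dict]) -> bool:
    # At most 20 topics, each with a valid label and topic string.
    return (len(topics) <= 20
            and all(valid_label(t["label"]) and valid_topic(t["topic"]) for t in topics))
```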
- When incoming requests match:

  Enter the following expression in the editor:

  ```txt
  (cf.llm.prompt.custom_topic_categories["competitors"] lt 30)
  ```

- Action: Block
- When incoming requests match:

  Enter the following expression in the editor:

  ```txt
  (cf.llm.prompt.custom_topic_categories["finance"] lt 40)
  ```

- Action: Log
Example expression:

```txt
(cf.llm.prompt.custom_topic_categories["competitors"] lt 30 or cf.llm.prompt.pii_detected)
```
The quality of custom topic detection depends on how you write your topic strings. The underlying model is a zero-shot classifier — it compares the semantic meaning of the prompt against your topic string.
Overly broad topics match too many prompts (high false positives). Overly narrow topics miss relevant prompts (high false negatives).
| Quality | Topic string | Why |
|---|---|---|
| Good | Acme Corp products and pricing | Names a specific competitor — catches prompts discussing that company's offerings. |
| Good | securities trading and investment recommendations | Targets a well-defined intersection of two concepts. |
| Too narrow | Acme Corp pricing page URL | So specific that only near-exact mentions will score highly. |
| Too broad | technology | Will match almost any technical prompt. |
| Too broad | bad things | Semantically vague — the model cannot determine what you consider bad. |
A topic string like `finance` is less effective than `securities trading and investment recommendations`. More descriptive phrases give the model better signal and help prevent false positives.
If you define topics that mean nearly the same thing — for example, financial advice and investment guidance — both will score similarly on the same prompts, consuming two of your 20-topic budget without adding detection value. Consolidate overlapping concepts into a single topic.
The model performs semantic classification, not keyword matching. A topic string of Acme Corp products and pricing will detect requests that discuss that competitor's offerings even if they do not mention the company by name — for example, a prompt like "How does your pricing compare to the leading alternative?" can still score highly.
This also means you should phrase topics as action-oriented verb phrases that capture what the user is doing, not just the subject they mention. Descriptions that capture intent are significantly more discriminating — especially on borderline or ambiguous text.
For example, compare these two topic strings against two very different prompts:
| Topic string | "I read an article about tax deductions" | "What stocks should I buy to retire in 10 years?" |
|---|---|---|
| financial advice | Medium relevance (false positive) | High relevance |
| asking for financial advice | No relevance (correct) | High relevance |
The noun-phrase version (financial advice) returns a false positive on the passive text because the prompt merely mentions the subject. The verb-phrase version (asking for financial advice) correctly ignores passive mentions and only matches when the user is actively seeking advice.
Recommended phrasing styles:
| Style | Example |
|---|---|
| Noun phrase | investment advice |
| Verb phrase (recommended) | asking for investment advice |
| Sentence-like | a user seeking financial guidance |
For most use cases, a 3–6 word verb phrase is the best trade-off between precision and coverage.
After defining your topics, send test prompts and review the scores in Security Analytics. There are two ways to tune detection behavior:
- Adjust the topic string. If a topic is matching too broadly, make the topic string more specific. If it is not matching requests you expect it to catch, broaden or rephrase the topic string.
- Adjust the score threshold in your rule. A lower threshold (for example, `lt 20`) is stricter and only matches highly relevant requests. A higher threshold (for example, `lt 50`) is more permissive and catches a wider range of related requests. Start with a moderate threshold and refine based on what you observe in logs.
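As a quick illustration of the threshold trade-off, the sketch below uses made-up scores standing in for values you would observe in Security Analytics, and counts how many requests each candidate threshold would match:

```python
# Made-up per-request scores for one topic label, as might be observed in logs.
# Lower score = more relevant (1 = highly relevant, 99 = not relevant).
observed_scores = [5, 12, 18, 25, 33, 47, 62, 88, 95]

def matches(scores, threshold):
    # Mirrors a rule expression like:
    #   cf.llm.prompt.custom_topic_categories["finance"] lt <threshold>
    return [s for s in scores if s < threshold]

strict = matches(observed_scores, 20)      # lt 20: only highly relevant prompts
permissive = matches(observed_scores, 50)  # lt 50: a wider net
```

Reviewing which borderline scores fall between the two thresholds tells you whether to tighten the threshold or rephrase the topic string.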
| Label | Topic string | Use case |
|---|---|---|
| competitors | asking about Acme Corp products and pricing | Prevent your chatbot from discussing a specific rival's offerings |
| legal-advice | asking for legal counsel or regulatory compliance guidance | Block prompts that solicit legal advice from your AI |
| student-data | requesting student personal information or academic records | EdTech — prevent discussion of individual student data |
| exec-internal | discussing internal executive decisions or leadership changes | Prevent discussion of sensitive internal matters |
| crypto-advice | asking for cryptocurrency trading or investment recommendations | FinTech — block prompts seeking crypto investment tips |