Date: 2026-03-11
Unsafe and custom topic detection
AI Security for Apps (formerly Firewall for AI) can detect when an LLM prompt touches on unsafe or unwanted subjects. There are two layers of topic detection:
- Default unsafe topics — A built-in set of safety categories that detect harmful content such as violent crimes, hate speech, and sexual content.
- Custom topics — Topics you define to match your organization's specific policies, such as "competitors" or "financial advice".
When AI Security for Apps is enabled, it automatically evaluates prompts against a set of default unsafe topic categories and populates two fields:
- LLM Unsafe topic detected (cf.llm.prompt.unsafe_topic_detected): true if any unsafe topic was found.
- LLM Unsafe topic categories (cf.llm.prompt.unsafe_topic_categories): An array of the specific categories detected.
Default unsafe topic categories
| Category | Description |
| --- | --- |
| S1 | Violent crimes |
| S2 | Non-violent crimes |
| S3 | Sex-related crimes |
| S4 | Child sexual exploitation |
| S5 | Defamation |
| S6 | Specialized advice |
| S7 | Privacy |
| S8 | Intellectual property |
| S9 | Indiscriminate weapons |
| S10 | Hate |
| S11 | Suicide and self-harm |
| S12 | Sexual content |
| S13 | Elections |
| S14 | Code interpreter abuse |
- When incoming requests match:

  | Field | Operator | Value |
  | --- | --- | --- |
  | LLM Unsafe topic detected | equals | True |

  Expression when using the editor:
  (cf.llm.prompt.unsafe_topic_detected)
- Action: Block

- When incoming requests match:

  | Field | Operator | Value |
  | --- | --- | --- |
  | LLM Unsafe topic categories | is in | S1: Violent Crimes, S10: Hate |

  Expression when using the editor:
  (any(cf.llm.prompt.unsafe_topic_categories[*] in {"S1" "S10"}))
- Action: Block
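To reason about which requests the two expressions above would match, it can help to mirror them locally. The following is a minimal Python sketch, not part of the product: the function names are hypothetical, and the field values are supplied by hand rather than populated by AI Security for Apps. It assumes the categories array holds codes such as "S1", as the expression above suggests.

```python
# Local mirror of the two unsafe-topic rule expressions, for reasoning
# about matches only. Function names are hypothetical; in production the
# field values are populated by AI Security for Apps, not by your code.

def matches_any_unsafe_topic(unsafe_topic_detected: bool) -> bool:
    # Mirrors: (cf.llm.prompt.unsafe_topic_detected)
    return unsafe_topic_detected


def matches_selected_categories(detected, blocked=frozenset({"S1", "S10"})):
    # Mirrors: (any(cf.llm.prompt.unsafe_topic_categories[*] in {"S1" "S10"}))
    # True when at least one detected category code is in the blocked set.
    return any(category in blocked for category in detected)


if __name__ == "__main__":
    print(matches_selected_categories(["S6", "S10"]))  # S10 is in the blocked set
    print(matches_selected_categories(["S6"]))         # no blocked category present
```

The second helper takes the blocked set as a parameter so the same check can be reused for a different category selection.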
Custom topic detection lets you define your own topics; AI Security for Apps then scores each prompt against them. You can use these scores in custom rules or rate limiting rules to block, challenge, or log matching requests.
This capability uses a zero-shot classification model that evaluates prompts at runtime. No model training is required.
- You define a list of up to 20 custom topics via the dashboard or API. Each topic consists of:
  - A label: Used in rule expressions and analytics.
  - A topic string: The descriptive text the model uses to classify prompts.
- When a request arrives at a cf-llm labeled endpoint, the model evaluates the prompt against all defined topic strings and returns a relevance score for each.
- Scores are written to the cf.llm.prompt.custom_topic_categories map field, keyed by label. You use labels, not topic strings, in rule expressions and analytics.
Scores follow the same convention as other AI Security for Apps scores, where lower values indicate higher relevance (1 = highly relevant, 99 = not relevant).
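Because lower scores mean higher relevance, threshold comparisons read the opposite way from most scoring systems. The following Python sketch illustrates the convention with a hand-written score map. The helper names and the sample scores are hypothetical, and the handling of a missing label is an assumption, not documented behavior.

```python
# Illustrates the "lower score = more relevant" convention using a plain
# dict in place of the cf.llm.prompt.custom_topic_categories map field.
# Helper names and sample values are hypothetical.

def is_relevant(scores, label, threshold):
    """Mirror `cf.llm.prompt.custom_topic_categories[label] lt threshold`."""
    score = scores.get(label)
    # Assumption: a label with no score never matches.
    return score is not None and score < threshold


def most_relevant(scores):
    """Return the label with the lowest (most relevant) score, or None."""
    return min(scores, key=scores.get) if scores else None


# Hypothetical scores for one prompt, keyed by label.
sample_scores = {"competitors": 12, "finance": 87}

if __name__ == "__main__":
    print(is_relevant(sample_scores, "competitors", 30))  # 12 is below 30
    print(is_relevant(sample_scores, "finance", 40))      # 87 is not below 40
    print(most_relevant(sample_scores))
```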
- In the Cloudflare dashboard, go to the Security Settings page.
- Under AI Security for Apps, find the Custom Topics section and select Manage topics.
- Add a topic by providing:
  - Label: A short identifier used in rule expressions (for example, competitors).
  - Topic: A descriptive English text string the model uses for classification (for example, asking about Acme Corp products and pricing).
- Select Save.
Update your custom topics list using a PUT request:
To retrieve your current topics use a GET request:
| Parameter | Limit |
| --- | --- |
| Maximum number of topics | 20 |
| Topic string length | 2–50 printable ASCII characters |
| Label length | 2–20 characters |
| Label format | Lowercase letters, numbers, and hyphens (-) only |
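The limits above can be checked locally before a topic list is submitted. The following Python sketch is one way to do that. The function name and the tuple representation of a topic are hypothetical, and two points are assumptions rather than documented behavior: that "printable ASCII" means the range from space (0x20) to tilde (0x7E), and that labels must be unique because scores are keyed by label.

```python
import re

MAX_TOPICS = 20
# Label: 2-20 characters, lowercase letters, numbers, and hyphens only.
LABEL_RE = re.compile(r"[a-z0-9-]{2,20}")
# Topic string: 2-50 printable ASCII characters.
# Assumption: "printable" is taken to mean 0x20 (space) through 0x7E (~).
TOPIC_RE = re.compile(r"[\x20-\x7e]{2,50}")


def validate_topics(topics):
    """Check (label, topic_string) pairs against the documented limits.

    Returns a list of human-readable problems; an empty list means the
    topic list passes every check in this sketch.
    """
    problems = []
    if len(topics) > MAX_TOPICS:
        problems.append(f"too many topics: {len(topics)} > {MAX_TOPICS}")
    seen = set()
    for label, topic in topics:
        if not LABEL_RE.fullmatch(label):
            problems.append(f"invalid label: {label!r}")
        if not TOPIC_RE.fullmatch(topic):
            problems.append(f"invalid topic string for label {label!r}")
        # Assumption: labels must be unique, since scores are keyed by label.
        if label in seen:
            problems.append(f"duplicate label: {label!r}")
        seen.add(label)
    return problems


if __name__ == "__main__":
    topics = [("competitors", "asking about Acme Corp products and pricing")]
    print(validate_topics(topics))
```

Returning a list of problems rather than raising on the first one makes it easier to fix a whole topic list in a single pass.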
- When incoming requests match:
  Enter the following expression in the editor:
  (cf.llm.prompt.custom_topic_categories["competitors"] lt 30)
- Action: Block

- When incoming requests match:
  Enter the following expression in the editor:
  (cf.llm.prompt.custom_topic_categories["finance"] lt 40)
- Action: Log
Example expression:
(cf.llm.prompt.custom_topic_categories["competitors"] lt 30 or cf.llm.prompt.pii_detected)
The quality of custom topic detection depends on how you write your topic strings. The underlying model is a zero-shot classifier — it compares the semantic meaning of the prompt against your topic string.
Overly broad topics match too many prompts (high false positives). Overly narrow topics miss relevant prompts (high false negatives).
| Quality | Topic string | Why |
| --- | --- | --- |
| Good | Acme Corp products and pricing | Names a specific competitor and catches prompts discussing that company's offerings. |
| Good | securities trading and investment recommendations | Targets a well-defined intersection of two concepts. |
| Too narrow | Acme Corp pricing page URL | So specific that only near-exact mentions will score as highly relevant. |
| Too broad | technology | Will match almost any technical prompt. |
| Too broad | bad things | Semantically vague: the model cannot determine what you consider bad. |
A topic string like finance is less effective than securities trading and investment recommendations. More descriptive phrases give the model better signal and help prevent false positives.
If you define topics that mean nearly the same thing (for example, financial advice and investment guidance), both will score similarly on the same prompts, consuming two of your 20-topic budget without adding detection value. Consolidate overlapping concepts into a single topic.
The model performs semantic classification, not keyword matching. A topic string of Acme Corp products and pricing will detect requests that discuss that competitor's offerings even if they do not mention the company by name. For example, a prompt like "How does your pricing compare to the leading alternative?" can still score as highly relevant.
This also means you should phrase topics as action-oriented verb phrases that capture what the user is doing, not just the subject they mention. Descriptions that capture intent are significantly more discriminating — especially on borderline or ambiguous text.
For example, compare these two topic strings against two very different prompts:
| Topic string | "I read an article about tax deductions" | "What stocks should I buy to retire in 10 years?" |
| --- | --- | --- |
| financial advice | Medium relevance (false positive) | High relevance |
| asking for financial advice | No relevance (correct) | High relevance |

The noun-phrase version (financial advice) returns a false positive on the passive text because the prompt merely mentions the subject. The verb-phrase version (asking for financial advice) correctly ignores passive mentions and only matches when the user is actively seeking advice.
Recommended phrasing styles:
| Style | Example |
| --- | --- |
| Noun phrase | investment advice |
| Verb phrase (recommended) | asking for investment advice |
| Sentence-like | a user seeking financial guidance |
For most use cases, a 3–6 word verb phrase is the best trade-off between precision and coverage.
After defining your topics, send test prompts and review the scores in Security Analytics. There are two ways to tune detection behavior:
- Adjust the topic string. If a topic is matching too broadly, make the topic string more specific. If it is not matching requests you expect it to catch, broaden or rephrase the topic string.
- Adjust the score threshold in your rule. A lower threshold (for example, lt 20) is stricter and only matches highly relevant requests. A higher threshold (for example, lt 50) is more permissive and catches a wider range of related requests. Start with a moderate threshold and refine based on what you observe in logs.
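One way to choose a threshold is to collect scores for a set of test prompts, mark by hand which ones should match, and count the errors at each candidate threshold. The following Python sketch shows the idea. The function name and all observation values are hypothetical illustrations, not measured results; in practice the scores would come from your own test prompts as shown in Security Analytics.

```python
# Threshold sweep over hand-labeled test prompts. A rule `score lt T`
# matches when the score is strictly below T (lower = more relevant).

def sweep_thresholds(observations, thresholds):
    """Count errors per threshold.

    observations: list of (score, should_match) pairs.
    Returns a list of (threshold, false_positives, false_negatives).
    """
    rows = []
    for threshold in thresholds:
        # Matched by the rule, but you did not want it matched.
        false_pos = sum(1 for score, should in observations
                        if score < threshold and not should)
        # Not matched by the rule, but you wanted it matched.
        false_neg = sum(1 for score, should in observations
                        if score >= threshold and should)
        rows.append((threshold, false_pos, false_neg))
    return rows


# Hypothetical observations: (score, should this prompt match?)
observations = [(5, True), (18, True), (35, True),
                (28, False), (45, False), (90, False)]

if __name__ == "__main__":
    for threshold, fp, fn in sweep_thresholds(observations, [20, 30, 50]):
        print(f"lt {threshold}: {fp} false positives, {fn} false negatives")
```

With these sample values, the strict threshold misses one relevant prompt and the permissive one matches two unrelated prompts, which is the trade-off described above.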
| Label | Topic string | Use case |
| --- | --- | --- |
| competitors | asking about Acme Corp products and pricing | Prevent your chatbot from discussing a specific rival's offerings |
| legal-advice | asking for legal counsel or regulatory compliance guidance | Block prompts that solicit legal advice from your AI |
| student-data | requesting student personal information or academic records | EdTech: prevent discussion of individual student data |
| exec-internal | discussing internal executive decisions or leadership changes | Prevent discussion of sensitive internal matters |
| crypto-advice | asking for cryptocurrency trading or investment recommendations | FinTech: block prompts seeking crypto investment tips |