Unsafe and custom topic detection
AI Security for Apps (formerly Firewall for AI) can detect when an LLM prompt touches on unsafe or unwanted subjects. There are two layers of topic detection:
- Default unsafe topics — A built-in set of safety categories that detect harmful content such as violent crimes, hate speech, and sexual content.
- Custom topics — Topics you define to match your organization's specific policies, such as "competitors" or "financial advice".
When AI Security for Apps is enabled, it automatically evaluates prompts against a set of default unsafe topic categories and populates two fields:
- LLM Unsafe topic detected (`cf.llm.prompt.unsafe_topic_detected`) — `true` if any unsafe topic was found.
- LLM Unsafe topic categories (`cf.llm.prompt.unsafe_topic_categories`) — An array of the specific categories detected.
Default unsafe topic categories
| Category | Description |
|---|---|
| S1 | Violent crimes |
| S2 | Non-violent crimes |
| S3 | Sex-related crimes |
| S4 | Child sexual exploitation |
| S5 | Defamation |
| S6 | Specialized advice |
| S7 | Privacy |
| S8 | Intellectual property |
| S9 | Indiscriminate weapons |
| S10 | Hate |
| S11 | Suicide and self-harm |
| S12 | Sexual content |
| S13 | Elections |
| S14 | Code interpreter abuse |
- When incoming requests match:

  | Field | Operator | Value |
  | --- | --- | --- |
  | LLM Unsafe topic detected | equals | True |

  Expression when using the editor:

  ```txt
  (cf.llm.prompt.unsafe_topic_detected)
  ```

- Action: Block
- When incoming requests match:

  | Field | Operator | Value |
  | --- | --- | --- |
  | LLM Unsafe topic categories | is in | S1: Violent Crimes, S10: Hate |

  Expression when using the editor:

  ```txt
  (any(cf.llm.prompt.unsafe_topic_categories[*] in {"S1" "S10"}))
  ```

- Action: Block
Custom topic detection lets you define your own topics and AI Security for Apps will score each prompt against them. You can then use these scores in custom rules or rate limiting rules to block, challenge, or log matching requests.
This capability uses a zero-shot classification model that evaluates prompts at runtime. No model training is required.
- You define a list of up to 20 custom topics via the dashboard or API. Each topic consists of:
- A label — Used in rule expressions and analytics
- A topic string — The descriptive text the model uses to classify prompts
- When a request arrives at a `cf-llm` labeled endpoint, the model evaluates the prompt against all defined topic strings and returns a relevance score for each.
- Scores are written to the `cf.llm.prompt.custom_topic_categories` map field, keyed by label. You use labels — not topic strings — in rule expressions and analytics.
Scores follow the same convention as other AI Security for Apps scores, where lower values indicate higher relevance (1 = highly relevant, 99 = not relevant).
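To make the score convention concrete, here is a toy Python sketch. It is not Cloudflare's actual model — a simple word-overlap heuristic stands in for the zero-shot classifier — but it shows how per-label scores on the 1 (highly relevant) to 99 (not relevant) scale could be produced and compared:

```python
# Toy illustration of the scoring convention (NOT Cloudflare's real classifier).
# A word-overlap heuristic stands in for semantic zero-shot classification,
# only to demonstrate that lower scores mean higher relevance.

def topic_score(prompt: str, topic: str) -> int:
    prompt_words = set(prompt.lower().split())
    topic_words = set(topic.lower().split())
    overlap = len(prompt_words & topic_words) / len(topic_words)
    # Map relevance [0, 1] onto the 1 (highly relevant) .. 99 (not relevant) scale.
    return max(1, round(99 - overlap * 98))

topics = {
    "competitors": "acme corp products and pricing",
    "finance": "financial advice and investment recommendations",
}

prompt = "what financial advice do you have about investment options"
scores = {label: topic_score(prompt, t) for label, t in topics.items()}
# The finance label scores lower (more relevant) than competitors for this prompt.
```

A rule such as `cf.llm.prompt.custom_topic_categories["finance"] lt 30` would then match only when the real model's score for that label falls below the threshold.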
- In the Cloudflare dashboard, go to the Security Settings page.
- Under AI Security for Apps, find the Custom Topics section and select Manage topics.
- Add a topic by providing:
  - Label: A short identifier used in rule expressions (for example, `competitors`).
  - Topic: A descriptive English text string the model uses for classification (for example, `asking about Acme Corp products and pricing`).
- Select Save.
Update your custom topics list using a PUT request:

```sh
curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/firewall-for-ai/custom_topics" \
  --request PUT \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --json '{
    "topics": [
      { "label": "competitors", "topic": "competitor products and services" },
      { "label": "finance", "topic": "financial advice and investment recommendations" },
      { "label": "hr-internal", "topic": "internal HR policies and employee matters" }
    ]
  }'
```

To retrieve your current topics, use a GET request:

```sh
curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/firewall-for-ai/custom_topics" \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN"
```

| Parameter | Limit |
|---|---|
| Maximum number of topics | 20 |
| Topic string length | 2–50 printable ASCII characters |
| Label length | 2–20 characters |
| Label format | Lowercase letters, numbers, and hyphens (-) only |
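If you generate topics programmatically, a client-side pre-check can catch violations of these limits before the PUT request. The helper below is hypothetical (the API performs the authoritative validation) and simply mirrors the documented constraints:

```python
import re

# Hypothetical pre-flight checks mirroring the documented limits.
# The Cloudflare API remains the authoritative validator on PUT.
LABEL_RE = re.compile(r"^[a-z0-9-]{2,20}$")  # lowercase letters, digits, hyphens; 2-20 chars

def valid_label(label: str) -> bool:
    return LABEL_RE.fullmatch(label) is not None

def valid_topic(topic: str) -> bool:
    # 2-50 printable ASCII characters
    return 2 <= len(topic) <= 50 and all(32 <= ord(c) <= 126 for c in topic)

def valid_topics_list(topics: list[dict]) -> bool:
    # At most 20 topics, each with a valid label and topic string.
    return (len(topics) <= 20
            and all(valid_label(t["label"]) and valid_topic(t["topic"]) for t in topics))
```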
- When incoming requests match:

  Enter the following expression in the editor:

  ```txt
  (cf.llm.prompt.custom_topic_categories["competitors"] lt 30)
  ```

- Action: Block
- When incoming requests match:

  Enter the following expression in the editor:

  ```txt
  (cf.llm.prompt.custom_topic_categories["finance"] lt 40)
  ```

- Action: Log
Example expression:

```txt
(cf.llm.prompt.custom_topic_categories["competitors"] lt 30 or cf.llm.prompt.pii_detected)
```
The quality of custom topic detection depends on how you write your topic strings. The underlying model is a zero-shot classifier — it compares the semantic meaning of the prompt against your topic string.
Overly broad topics match too many prompts (high false positives). Overly narrow topics miss relevant prompts (high false negatives).
| Quality | Topic string | Why |
|---|---|---|
| Good | Acme Corp products and pricing | Names a specific competitor — catches prompts discussing that company's offerings. |
| Good | securities trading and investment recommendations | Targets a well-defined intersection of two concepts. |
| Too narrow | Acme Corp pricing page URL | So specific that only near-exact mentions will score highly. |
| Too broad | technology | Will match almost any technical prompt. |
| Too broad | bad things | Semantically vague — the model cannot determine what you consider bad. |
A topic string like `finance` is less effective than `securities trading and investment recommendations`. More descriptive phrases give the model better signal and help prevent false positives.
If you define topics that mean nearly the same thing — for example, financial advice and investment guidance — both will score similarly on the same prompts, consuming two of your 20-topic budget without adding detection value. Consolidate overlapping concepts into a single topic.
The model performs semantic classification, not keyword matching. A topic string of Acme Corp products and pricing will detect requests that discuss that competitor's offerings even if they do not mention the company by name — for example, a prompt like "How does your pricing compare to the leading alternative?" can still score highly.
This also means you should phrase topics as action-oriented verb phrases that capture what the user is doing, not just the subject they mention. Descriptions that capture intent are significantly more discriminating — especially on borderline or ambiguous text.
For example, compare these two topic strings against two very different prompts:
| Topic string | "I read an article about tax deductions" | "What stocks should I buy to retire in 10 years?" |
|---|---|---|
| financial advice | Medium relevance (false positive) | High relevance |
| asking for financial advice | No relevance (correct) | High relevance |
The noun-phrase version (financial advice) returns a false positive on the passive text because the prompt merely mentions the subject. The verb-phrase version (asking for financial advice) correctly ignores passive mentions and only matches when the user is actively seeking advice.
Recommended phrasing styles:
| Style | Example |
|---|---|
| Noun phrase | investment advice |
| Verb phrase (recommended) | asking for investment advice |
| Sentence-like | a user seeking financial guidance |
For most use cases, a 3–6 word verb phrase is the best trade-off between precision and coverage.
After defining your topics, send test prompts and review the scores in Security Analytics. There are two ways to tune detection behavior:
- Adjust the topic string. If a topic is matching too broadly, make the topic string more specific. If it is not matching requests you expect it to catch, broaden or rephrase the topic string.
- Adjust the score threshold in your rule. A lower threshold (for example, `lt 20`) is stricter and only matches highly relevant requests. A higher threshold (for example, `lt 50`) is more permissive and catches a wider range of related requests. Start with a moderate threshold and refine based on what you observe in logs.
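As a quick illustration of the threshold trade-off, the sketch below uses made-up scores standing in for values you would observe in Security Analytics, and counts how many requests each candidate threshold would match:

```python
# Made-up per-request scores for one topic label, as might be observed in logs.
# Lower score = more relevant (1 = highly relevant, 99 = not relevant).
observed_scores = [5, 12, 18, 25, 33, 47, 62, 88, 95]

def matches(scores, threshold):
    # Mirrors a rule expression like:
    #   cf.llm.prompt.custom_topic_categories["finance"] lt <threshold>
    return [s for s in scores if s < threshold]

strict = matches(observed_scores, 20)      # lt 20: only highly relevant prompts
permissive = matches(observed_scores, 50)  # lt 50: a wider net
```

Reviewing which borderline scores fall between the two thresholds tells you whether to tighten the threshold or rephrase the topic string.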
| Label | Topic string | Use case |
|---|---|---|
| competitors | asking about Acme Corp products and pricing | Prevent your chatbot from discussing a specific rival's offerings |
| legal-advice | asking for legal counsel or regulatory compliance guidance | Block prompts that solicit legal advice from your AI |
| student-data | requesting student personal information or academic records | EdTech — prevent discussion of individual student data |
| exec-internal | discussing internal executive decisions or leadership changes | Prevent discussion of sensitive internal matters |
| crypto-advice | asking for cryptocurrency trading or investment recommendations | FinTech — block prompts seeking crypto investment tips |