Use cases
Protect brand reputation by blocking harmful content
Policy example to block harmful content using classifier rule
To implement a classifier for topics and names, use the classifier rule. This rule helps prevent the AI from using or responding with specific words or phrases that are deemed inappropriate or sensitive. For more details, see our Rules Catalog.
Rule structure:
- type: classifier
- value: list of topics to classify, e.g., "hate speech, harassment, sexual content, self-harm"
- expected: fail (to flag when a blocked topic is detected)
- threshold: confidence level for detection (e.g., 0.9 for 90% confidence)
Create the policy
Here’s an example of a policy to implement a classifier: