To implement a classifier for topics and names, you can use the classifier rule. This rule helps prevent the AI from engaging with topics, names, or phrases that are deemed inappropriate or sensitive. For more details, see our Rules Catalog.

Rule structure:

  • type: classifier
  • value: Comma-separated list of topics to classify, e.g., "hate speech, harassment, sexual content, self-harm"
  • expected: fail (to flag the request when a blocked topic is detected)
  • threshold: Minimum confidence level for detection, e.g., 0.9 for 90% confidence (see the evaluation sketch after this list)
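
To make these fields concrete, here is a minimal sketch of how a classifier rule might combine them at evaluation time. The classifier interface is an assumption for illustration only: we assume it returns a confidence score per topic, and the function and variable names are hypothetical, not the product's actual API.

# Hypothetical evaluation sketch; not the product's actual implementation.
def evaluate_classifier_rule(rule: dict, topic_scores: dict[str, float]) -> bool:
    """Return True if the rule flags the request."""
    topics = [t.strip() for t in rule["value"].split(",")]
    # A topic counts as detected when its confidence meets the threshold.
    detected = any(topic_scores.get(t, 0.0) >= rule["threshold"] for t in topics)
    # expected == "fail" means a detection should flag the request.
    return detected if rule["expected"] == "fail" else not detected

# Hypothetical per-topic scores for one input:
scores = {"hate speech": 0.02, "harassment": 0.95,
          "sexual content": 0.01, "self-harm": 0.00}
rule = {
    "type": "classifier",
    "value": "hate speech, harassment, sexual content, self-harm",
    "expected": "fail",
    "threshold": 0.9,
}
print(evaluate_classifier_rule(rule, scores))  # True: harassment scores 0.95 >= 0.9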

Create the policy

Here’s an example of a policy that implements a classifier rule:

{
  "id": "unique policy id",
  "definition": "short description",
  "rules": [
    {
      "type": "classifier",
      "value": "hate speech, harassment, sexual content, self-harm",
      "expected": "fail",
      "threshold": 0.9
    }
  ],
  "target": "input"
}
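
Before submitting a policy, a quick sanity check of the rule fields can catch typos. The sketch below is illustrative only: the constraints it enforces (for example, that threshold must lie between 0 and 1) are assumptions inferred from the example above, not a documented schema.

import json

def validate_classifier_policy(policy: dict) -> list[str]:
    """Return a list of problems; an empty list means the policy looks well-formed."""
    problems = []
    for field in ("id", "definition", "rules", "target"):
        if field not in policy:
            problems.append(f"missing top-level field: {field}")
    for i, rule in enumerate(policy.get("rules", [])):
        if rule.get("type") != "classifier":
            problems.append(f"rule {i}: type should be 'classifier'")
        if not str(rule.get("value", "")).strip():
            problems.append(f"rule {i}: value must list at least one topic")
        t = rule.get("threshold")
        # Assumed constraint: threshold is a number in [0, 1].
        if not isinstance(t, (int, float)) or not 0.0 <= t <= 1.0:
            problems.append(f"rule {i}: threshold must be a number between 0 and 1")
    return problems

# The example policy from above, inlined so the sketch is self-contained.
policy = json.loads("""
{
  "id": "unique policy id",
  "definition": "short description",
  "rules": [
    {
      "type": "classifier",
      "value": "hate speech, harassment, sexual content, self-harm",
      "expected": "fail",
      "threshold": 0.9
    }
  ],
  "target": "input"
}
""")
print(validate_classifier_policy(policy) or "policy looks well-formed")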