To implement a blocklist for topics, names, and keywords, you can combine the classifier, similarity, and factuality rules. This combination helps prevent the AI from discussing or responding with specific topics, names, or keywords that are deemed inappropriate or sensitive. For more details, see our Rules Catalog.

Rule structure:

Blocklist by topic classification

  • type: classifier
  • value: List of topics or categories to block
  • expected: fail (to flag when a blocked topic is detected)
  • threshold: Confidence level for detection (e.g., 0.9 for 90% confidence)
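For example, a classifier rule that blocks a set of topics might look like the fragment below (the topic names are placeholders to replace with your own):

```json
{
  "type": "classifier",
  "value": "topic A, topic B, topic C",
  "expected": "fail",
  "threshold": 0.9
}
```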

Blocklist by similarity

  • type: similarity
  • value: List of words or phrases to block
  • expected: fail (to flag when a blocked word or phrase is detected)
  • threshold: Confidence level for detection (e.g., 0.9 for 90% confidence)
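A similarity rule that blocks specific words or phrases might look like this (the values are placeholders):

```json
{
  "type": "similarity",
  "value": "blocked words, blocked phrases",
  "expected": "fail",
  "threshold": 0.9
}
```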

Blocklist by factual consistency

  • type: factuality
  • value: List of sources or documents to check against
  • expected: fail (to flag when the output is inconsistent with the listed sources)
  • threshold: Confidence level for detection (e.g., 0.9 for 90% confidence)
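A factuality rule that checks output against reference sources might look like this (the source names are placeholders):

```json
{
  "type": "factuality",
  "value": "news article, wikipedia",
  "expected": "fail",
  "threshold": 0.9
}
```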

Create the policy

Here’s an example of a policy to implement a blocklist:

{
  "id": "blocklist-policy",
  "definition": "short description",
  "rules": [
    {
      "type": "classifier",
      "value": "topic A, topic B, topic C",
      "expected": "fail",
      "threshold": 0.9
    },
    {
      "type": "similarity",
      "value": "blocked words, blocked phrases, etc.",
      "expected": "fail",
      "threshold": 0.9
    },
    {
      "type": "similarity",
      "value": "competitor name A, rival name B, opposition name C",
      "expected": "fail",
      "threshold": 0.9
    },
    {
      "type": "factuality",
      "value": "news article, wikipedia, etc.",
      "expected": "fail",
      "threshold": 0.9
    }
  ],
  "target": "both"
}
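To build intuition for how a rule with "expected": "fail" behaves, here is a minimal, illustrative sketch in Python. It approximates only the similarity rule, and it uses naive substring matching in place of the real similarity scoring; the function name and logic are assumptions for illustration, not the service's actual implementation.

```python
def evaluate_similarity_rule(text, blocked_values, threshold=0.9):
    """Return "fail" when any blocked word or phrase appears in the text.

    Hypothetical stand-in: a real similarity rule scores semantic
    similarity against the threshold rather than matching substrings.
    The threshold parameter is kept only to mirror the rule's shape.
    """
    lowered = text.lower()
    for value in (v.strip() for v in blocked_values.split(",")):
        if value and value.lower() in lowered:
            return "fail"
    return "pass"


# A rule shaped like the similarity entry in the policy above.
rule = {
    "type": "similarity",
    "value": "blocked words, blocked phrases",
    "expected": "fail",
    "threshold": 0.9,
}

print(evaluate_similarity_rule("This mentions blocked words.", rule["value"]))  # prints "fail"
print(evaluate_similarity_rule("A perfectly clean sentence.", rule["value"]))   # prints "pass"
```

Because "expected" is "fail", a "fail" result is the signal that the blocklist matched and the response should be flagged.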