Policies are a crucial component in managing and securing your AI application. They allow you to define rules and guardrails that govern the behavior of your AI model.

Understanding Policies

A policy is a set of rules that are applied to a target and evaluated (returning fail or pass) while your AI application runs.

Each policy consists of one or more rules, which are evaluated to ensure compliance with your defined standards, protect against security vulnerabilities, etc.

The target refers to the message roles you already have from any LLM provider, e.g., {'role': 'user', 'content': 'here is a user input'}. The user and assistant roles are the same roles as in the OpenAI standard, and context is any additional context you pass to the LLM, e.g., retrieved information or user/session-specific context.

Policy Structure

A policy is defined using the following structure:

  • id: Unique ID for the policy.
  • definition: A short description.
  • rules: A list of rules applied by the policy; see the rules catalog.
    • type: Supported rule types are classifier, rubric, jailbreak, pii, contains, regex, similarity, and factuality. For model-based rules, you can specify the revision, e.g., jailbreak@47ffb2e; see more in model-based rules.
    • expected: Whether a match should fail or pass the evaluation.
    • value: A string or array of strings, depending on the rule type, e.g., for classifier it’s the class name, for rubric it’s the criteria, for pii it’s the PII types (phone, email, etc.).
    • threshold: A float value whose meaning depends on the rule type, e.g., for jailbreak, classifier, and pii it’s the minimum model output score; for similarity it’s the cosine similarity score.
  • target: Where the policy rules will be applied: user, assistant, context, or system. It can be a string or an array of strings for more than one target, e.g., ['user', 'context']. output is an alias for assistant, and input is an alias for ['user', 'context'].

Example Policy


# standard messages where policies are applied depending on the target defined
messages = [
    {"role": "system", "content": "system prompt"},
    {"role": "context", "content": "additional context"},
    {"role": "user", "content": "user input"},
    {"role": "assistant", "content": "LLM output"},
]

policy1 = {
    "id": "my-moderation-policy",
    "definition": "...",
    "rules": [
        {
            "type": "classifier",
            "value": ["topic A"],
            "expeted": "fail",
            "threshold": 0.9
        }
    ],
    "target": "assistant" # Applies to content from assistant / output
}

policy2 = {
    "id": "my-prompt-injection-policy",
    "definition": "...",
    "rules": [
        {
            "type": "jailbreak",
            "expeted": "fail",
            "threshold": 0.9
        }
    ],
    "target": ["user", "context"] # Applies to content from user and context messages
}

Create a policy before evaluating


http.post('/v1/applications/my-app/policies', json=policy)
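
The call above assumes an HTTP client already configured with your API base URL and credentials. A minimal sketch of such a client using httpx is shown below; the host and auth header are placeholders, not the real endpoint:

import httpx

# Hypothetical client setup: substitute your actual API host and credentials.
http = httpx.Client(
    base_url="https://api.example.com",
    headers={"Authorization": "Bearer <your-api-key>"},
)

# e.g., create the moderation policy defined above and fail loudly on errors
response = http.post('/v1/applications/my-app/policies', json=policy1)
response.raise_for_status()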

List your policies

http.get('/v1/applications/my-app/policies')
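
A short sketch of handling the list response with the same assumed client; the exact response schema depends on the API, so the raw JSON is printed as-is:

response = http.get('/v1/applications/my-app/policies')
response.raise_for_status()
print(response.json())  # your policy definitions; exact schema depends on the API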

Update your policy

http.put('/v1/applications/my-app/policies/my-policy-id', json=policy)
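
For example, to tighten the classifier threshold on the moderation policy defined earlier (a sketch reusing policy1 and the assumed client from above):

# Raise the minimum classifier score and push the updated definition.
policy1["rules"][0]["threshold"] = 0.95
http.put(f"/v1/applications/my-app/policies/{policy1['id']}", json=policy1)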

Delete your policy

http.delete('/v1/applications/my-app/policies/my-policy-id')