Policies and guardrails
Learn how to set up and manage policies and guardrails for your AI application
Policies are a crucial component in managing and securing your AI application. They allow you to define rules and guardrails that govern the behavior of your AI model.
Understanding Policies
A policy is a set of rules that are applied to a target and evaluated (returning fail
or pass
) while your AI application runs.
Each policy consists of one or more rules, which are evaluated to ensure compliance with your defined standards, protect against security vulnerabilities, etc.
The target are the message roles that you already have from any LLM provider. e.g., {'role': 'user', 'content': 'here is an user input'}
, being user and assistant the same roles from OpenAI standard, and context being any additional context you pass to the LLM, e.g., information retrieved context or user/session specific contexts.
Policy Structure
A policy is defined using the following structure:
- id: Unique ID for the policy.
- definition: A short description.
- rules: A
list
of rules that are applied to the policy see rules catalog.- type: Supported rule types are
classifier
,rubric
,jailbreak
,pii
,contains
,regex
,similarity
andfactuality
. For model based rules, you can specify the revision, e.g.,jailbreak@47ffb2e
, see more in model based rules - expected: If match, it should
fail
orpass
the evaluation. - value: a
string
orarray of strings
value, depending on the rule type, e.g., for classifier it’s the class name, for rubric it’s the criteria, for pii it’s the pii types (phone, email, etc) - threshold: a
float
value, depending on the rule type. e.g., for jailbreak, classifier and pii it’s the model min output score, similarity is the cosine score.
- type: Supported rule types are
- target: Where the policy rules will be applied, which can be either
user
,assistant
,context
orsystem
. It can be a string or an array of strings for more than one target, e.g.,['user', 'context']
. Aliasoutput
forassistant
andinput
for['user', 'context']
.