Setting up policies
Create custom policies that best suit your application.
Policies are sets of rules that can be used as guardrails.
| Type | Description |
|---|---|
| jailbreak | Detects jailbreak and prompt-injection attempts |
| pii | Detects personally identifiable information (PII), such as credit card numbers |
| classifier | Evaluates content with a custom classification rule |
| rubric | Evaluates content against a custom rubric |
| similar | Checks similarity between the content and reference examples |
| factual | Checks factual consistency to catch hallucinations |
| contains | Checks whether the content contains specific text |
| regex | Matches the content against a regular expression |
Define your application policies
Policies define the rules and guidelines that your LLM must follow. Here you’ll learn how to create custom policies that best suit your application.
The first step is to determine the type of content that needs to be evaluated. Common policies include identifying personally identifiable information (PII), filtering inappropriate content, avoiding specific kinds of hallucination, enforcing a specific behavior, and ensuring compliance with business rules.
How to structure a policy
To create your custom policy, define the following structure:
- Policy ID: Assign a unique `policy_id` for each policy. This ID is used when calling the Guard Evaluate API.
- Rule: The policy's main rule. It is used to monitor messages and fail when content goes against the rule.
- Examples: Examples of outputs that pass or fail the policy, provided in the `examples` parameter. Each example should have a `type` of either FAIL or PASS.
{
  "id": "custom_policy_1",
  "rule": "Should not share sensitive payment information - PII like credit card numbers.",
  "examples": [
    {
      "type": "FAIL",
      "input_example": "",
      "output_example": "Here is a credit card number 1234-1234-1234-1234 and CVV is 123"
    }
  ]
}
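Policies can also include passing examples. A minimal sketch of the same policy as a Python dictionary, with an added PASS example whose content is purely illustrative:

```python
custom_policy = {
    "id": "custom_policy_1",
    "rule": "Should not share sensitive payment information - PII like credit card numbers.",
    "examples": [
        {
            "type": "FAIL",
            "input_example": "",
            "output_example": "Here is a credit card number 1234-1234-1234-1234 and CVV is 123",
        },
        {
            # Illustrative passing example: the assistant refuses to share payment data.
            "type": "PASS",
            "input_example": "",
            "output_example": "I can't share payment details, but I can help you update your billing address.",
        },
    ],
}
```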
Override policy
To send policies directly in the evaluation request, instead of referencing stored policies by ID, use the `override_policy` parameter.
Policy prioritization
Policies are prioritized based on the order in which they are provided. This means that the Guard Evaluate API will evaluate the content against the first policy in the list before moving on to the next policies. If a policy is violated and `correction_enabled` is set to `true`, the API will stop the evaluation and provide a correction for the first violated policy.
How to prioritize policies:
- Order list: The order in which you list your `policy_ids` or `override_policy` in the request determines their priority. The first policy listed is checked first, followed by the second, and so on.
- Impact: When multiple policies could be relevant, prioritization ensures that the most critical checks are performed first, potentially affecting the final output or the corrections provided.
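As an illustration of ordering, here is a minimal request sketch in Python. Only `policy_ids`, `override_policy`, and `correction_enabled` come from this page; the endpoint URL, authentication header, `messages` field, and the inline policy are assumptions for illustration, and the two policy parameters are shown together purely to show how list order maps to priority:

```python
import requests

# Hypothetical endpoint and auth header; check the API reference for the real values.
GUARD_EVALUATE_URL = "https://api.example.com/guard/evaluate"
API_KEY = "<API_KEY>"

payload = {
    # Evaluated in order: the most critical policy goes first.
    "policy_ids": ["custom_policy_1", "custom_policy_2"],
    # Ad-hoc policies sent directly in the request (same structure as the policy above).
    "override_policy": [
        {
            "id": "no_discount_promises",
            "rule": "Should not promise discounts or refunds.",
            "examples": [],
        }
    ],
    # Stop at the first violated policy and return a correction for it.
    "correction_enabled": True,
    # Assumed field name for the conversation to evaluate.
    "messages": [
        {"role": "user", "content": "Can I get my money back?"},
        {"role": "assistant", "content": "Sure, I guarantee a full refund and a 50% discount."},
    ],
}

response = requests.post(
    GUARD_EVALUATE_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(response.json())
```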
Evaluation threshold
The `threshold` parameter determines the sensitivity of policy enforcement. It is a numeric value between 0 and 1 that sets the minimum score required for content to pass the evaluation.
How to define a threshold:
- Lower threshold: A lower threshold (closer to `0`) means stricter enforcement. Content must meet a higher standard to pass, allowing less deviation from the policy.
- Higher threshold: A higher threshold (closer to `1`) is more lenient, allowing more content to pass even if it partially violates the policies.
Setting `threshold` to `0.0` enables an experimental auto-threshold that chooses the best threshold for that specific evaluation. A common starting point is to set `threshold` to `0.5`, and then test values between `0.3` and `0.6`.
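To see how the threshold direction plays out, here is a small sketch that applies the `is_policy_violated = risk_score > threshold` rule from the evaluation steps below, assuming a higher risk score means riskier content:

```python
# Assumed per-policy score from the evaluation, with higher values meaning riskier content.
risk_score = 0.45

for threshold in (0.3, 0.5, 0.6):
    is_policy_violated = risk_score > threshold
    print(f"threshold={threshold}: violated={is_policy_violated}")

# threshold=0.3: violated=True   <- stricter: more content is flagged
# threshold=0.5: violated=False
# threshold=0.6: violated=False  <- more lenient: more content passes
```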
Evaluation steps
Structure the policies
Use the parameters `override_policy` and `policy_ids` to send the policies to evaluate. Note that the list is prioritized by order: the first policy is the first to be evaluated.
Evaluate each policy against the input and output
The policies are evaluated sequentially.
risk_score = evaluate(LLM_output, policy)
The evaluation stops at the first violated policy.
is_policy_violated = risk_score > threshold
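A minimal sketch of this loop in Python, with `evaluate` as a stand-in for the actual policy evaluation and the `id` field taken from the policy structure above; the function names and return shape are illustrative:

```python
def evaluate(llm_output: str, policy: dict) -> float:
    """Stand-in for the policy evaluation; returns a risk score between 0 and 1."""
    raise NotImplementedError

def run_guard(llm_output: str, policies: list[dict], threshold: float) -> dict | None:
    """Evaluate policies in order and stop at the first violation."""
    for policy in policies:  # the first policy in the list is evaluated first
        risk_score = evaluate(llm_output, policy)
        is_policy_violated = risk_score > threshold
        if is_policy_violated:
            # Evaluation stops at the first violated policy.
            return {"policy_id": policy["id"], "risk_score": risk_score}
    return None  # no policy was violated
```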
Auto correction
If any policy is violated, a correction is made to the LLM output according to the violated policy.
[
  {
    "role": "guard",
    "content": "edited LLM output"
  }
]
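On the client side, one way to pick up the correction is to read the content of the message with the guard role. A minimal sketch, assuming the correction is returned as the list shown above:

```python
# `result` is the corrected-output list shown above.
result = [
    {"role": "guard", "content": "edited LLM output"},
]

# Take the content of the first message with the "guard" role, if any.
corrected = next(
    (message["content"] for message in result if message["role"] == "guard"),
    None,
)
if corrected is not None:
    print(corrected)  # use the corrected output in place of the original LLM output
```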
Get started
There is a level of flexibility to adjust the API to your use case. Please refer to the endpoint below for more details.