Policies are sets of rules that can be used as guardrails.

Policy types:

  • jailbreak: Detects attempts to jailbreak the model or bypass its guardrails.
  • pii: Detects personally identifiable information (PII).
  • classifier: Evaluates content with a custom classifier.
  • rubric: Evaluates content against a custom rubric.
  • similar: Checks how similar the content is to a reference.
  • factual: Checks the factual accuracy of the content.
  • contains: Checks whether the content contains specified terms.
  • regex: Matches the content against a regular expression.

Define your application policies

Policies define the rules and guidelines that your LLM must follow. Here you’ll learn how to create custom policies that best suit your application.

The first step is to determine the type of content that needs to be evaluated. Common policies include identifying personally identifiable information (PII), filtering inappropriate content, avoiding certain kinds of hallucination, enforcing a specific behavior, and ensuring compliance with business rules.

How to structure a policy

To create your custom policy, you should define the following structure:

  • Policy ID: Assign a unique policy_id for each policy. This ID will be used when calling the Guard Evaluate API.
  • Rule: The policy’s main rule. It is used to monitor messages and fail them when they go against the rule.
  • Examples: Examples of outputs that pass or fail the policy, provided in the examples parameter. Each example should have a type of either FAIL or PASS.
Note that this works like a regular prompt. For better results, describe the rule in as much detail as possible and combine it with examples.
Custom policy example
{
  "id": "custom_policy_1",
  "rule": "Should not share sensitive payment information - PII like credit card numbers.",
  "examples": [
    {
      "type": "FAIL",
      "input_example": "",
      "output_example": "Here is a credit card number 1234-1234-1234-1234 and CVV is 123",
      
    }
  ]
}

Override policy

For now, custom policies are only accepted as overrides, so you should specify them in the override_policy parameter.
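As a rough sketch, assuming the Guard Evaluate API is called over plain HTTP with a JSON body: the endpoint URL, authentication header, and the input/output field names below are placeholders, and only override_policy and the policy structure come from this guide.

import requests  # assuming a plain HTTP integration

# Placeholder endpoint and key; use the real values from the API reference.
GUARD_EVALUATE_URL = "https://api.example.com/guard/evaluate"
API_KEY = "YOUR_API_KEY"

payload = {
    # Custom policies are passed inline through override_policy.
    "override_policy": [
        {
            "id": "custom_policy_1",
            "rule": "Should not share sensitive payment information - PII like credit card numbers.",
            "examples": [
                {
                    "type": "FAIL",
                    "input_example": "",
                    "output_example": "Here is a credit card number 1234-1234-1234-1234 and CVV is 123"
                }
            ]
        }
    ],
    # The content to evaluate; these field names are assumptions for illustration.
    "input": "What card did I use last time?",
    "output": "Here is a credit card number 1234-1234-1234-1234 and CVV is 123"
}

response = requests.post(
    GUARD_EVALUATE_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(response.json())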

Policy prioritization

Policies are prioritized based on the order in which they are provided. This means that the Guard Evaluate API will evaluate the content against the first policy in the list before moving on to the next ones. If a policy is violated and correction_enabled is set to true, the API will stop the evaluation and provide a correction for the first violated policy (see the sketch after the list below).

How to prioritize policies:

  • List order: The order in which you list your policy_ids or override_policy in the request determines their priority. The first policy listed will be checked first, followed by the second, and so on.
  • Impact: In cases where multiple policies could be relevant, the prioritization ensures that the most critical checks are performed first, potentially affecting the final output or the corrections provided.
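As an illustration of ordering, assuming stored policies are referenced by their policy_id (the IDs below are placeholders):

# The first entry is evaluated first; with correction_enabled set to true,
# the first violated policy is the one that gets corrected.
payload = {
    "policy_ids": [
        "pii_policy",        # checked first (highest priority) - placeholder ID
        "jailbreak_policy",  # checked second - placeholder ID
        "tone_policy"        # checked last - placeholder ID
    ],
    "correction_enabled": True
}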

Evaluation threshold

The threshold parameter determines the sensitivity of policy enforcement. It is a numeric value between 0 and 1 and sets the maximum risk score allowed for content to pass the evaluation.

How to define a threshold:

  • Lower threshold: A lower threshold (closer to 0) means stricter enforcement. Content must meet a higher standard to pass, allowing less deviation from the policy.
  • Higher threshold: A higher threshold (closer to 1) is more lenient, allowing more content to pass even if it partially violates the policies.
Setting the threshold to 0.0 enables an experimental auto-threshold that chooses the best threshold for that specific evaluation.
We recommend starting with a threshold of 0.5 and then testing values between 0.3 and 0.6. The sketch below shows how the comparison behaves at different thresholds.
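A minimal sketch of that comparison, using the risk_score formula from the evaluation steps below with made-up scores:

def is_policy_violated(risk_score: float, threshold: float) -> bool:
    # Content fails when its risk score exceeds the threshold,
    # so a lower threshold tolerates less risk (stricter enforcement).
    return risk_score > threshold

risk_score = 0.4  # made-up score for illustration

print(is_policy_violated(risk_score, threshold=0.3))  # True  -> violated under a strict threshold
print(is_policy_violated(risk_score, threshold=0.6))  # False -> passes under a lenient threshold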

Evaluation steps

1. Structure the policies

Use the override_policy and policy_ids parameters to send the policies to evaluate. Note that the list is prioritized by order: the first policy in the list is the first to be evaluated.

2. Evaluate each policy against the input and output

The policies are evaluated sequentially.

risk_score = evaluate(LLM_output, policy)

The evaluation stops at the first violated policy.

is_policy_violated = risk_score > threshold
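
A minimal sketch of this sequential flow, assuming a hypothetical evaluate() scorer and in-memory policy dicts; the real scoring happens on the API side:

def evaluate(llm_output: str, policy: dict) -> float:
    # Hypothetical stand-in for the server-side scorer; returns a risk score between 0 and 1.
    raise NotImplementedError

def run_guard(llm_output: str, policies: list, threshold: float):
    # Policies are checked in list order; evaluation stops at the first violation.
    for policy in policies:
        risk_score = evaluate(llm_output, policy)
        is_policy_violated = risk_score > threshold
        if is_policy_violated:
            return {"violated_policy": policy["id"], "risk_score": risk_score}
    return None  # no policy was violated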

3. Auto correction

If any policy is violated, a correction will be made to the LLM output according to the policy.

[
  {
    "role": "guard",
    "content": "edited LLM output"
  }
]
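
As a sketch, a client could apply that correction by preferring the guard message’s content over the original output (the helper below is hypothetical):

def apply_correction(llm_output: str, guard_messages: list) -> str:
    # If the guard returned an edited output, use it in place of the original LLM output.
    for message in guard_messages:
        if message.get("role") == "guard":
            return message["content"]
    return llm_output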

Get started

There is some flexibility to adjust the API for your use case. Please refer to the endpoint below for more details.