Evaluate

The Evaluate API safeguards generative AI applications by evaluating the input and output of the LLM against a set of policies and guardrails.

Parameters

Application: The application to evaluate. This is the same application ID used in application setup.
Messages: The messages to evaluate. These can be user messages, assistant messages, or both. A system message can optionally be included.
Policy IDs: A list of the policy IDs to evaluate, in the same order as the policies in the application. These policies must either be set up in your application (see policies setup) or be passed in the policies parameter.
Policies: A list of policies to evaluate. See policies setup for more details.
Correction: If a policy is violated and correction_enabled is set to true, the LLM response is corrected, either by an automatic correction or by a manual override response defined in the policy.
Fail Fast: If fail_fast is set to true, evaluation stops as soon as any policy is violated.
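
As a rough illustration of how these parameters fit together, the Python sketch below assembles a request body. The helper name build_evaluate_payload, the role check, and the default values for correction_enabled and fail_fast are assumptions made for this example; they are not part of the documented API.

```python
from __future__ import annotations

import json
from typing import Any, Optional


def build_evaluate_payload(
    application: str,
    messages: list[dict[str, str]],
    policy_ids: Optional[list[str]] = None,
    policies: Optional[list[dict[str, Any]]] = None,
    correction_enabled: bool = False,  # assumed default, not documented
    fail_fast: bool = False,  # assumed default, not documented
) -> dict[str, Any]:
    """Assemble a request body from the parameters described above."""
    allowed_roles = {"system", "user", "assistant"}
    for message in messages:
        if message.get("role") not in allowed_roles:
            raise ValueError(f"unsupported role: {message.get('role')!r}")

    payload: dict[str, Any] = {
        "application": application,
        "messages": messages,
        "correction_enabled": correction_enabled,
        "fail_fast": fail_fast,
    }
    # Include the optional lists only when the caller supplies them.
    if policy_ids is not None:
        payload["policy_ids"] = policy_ids
    if policies is not None:
        payload["policies"] = policies
    return payload


payload = build_evaluate_payload(
    application="my-app",
    messages=[
        {"role": "user", "content": "user message"},
        {"role": "assistant", "content": "assistant response"},
    ],
    policy_ids=["my-policy", "another-policy"],
    fail_fast=True,
)
print(json.dumps(payload, indent=2))
```

Leaving policies out of the payload here relies on the policies already configured in the application.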

Evaluation Flow

Once a policy set is enabled in the application or passed via policy_ids in the API request, it is checked on every evaluation request against the provided messages (user, assistant, or both). Policies have a priority order: the first in the list has the highest priority. If fail_fast is set to true, evaluation stops as soon as any policy is violated. Here is the policy flow:
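
The priority order and fail_fast behavior described above can be sketched locally in Python. This is only an illustration of the ordering logic, not the service's implementation: the evaluate_in_order function and the two toy checks are hypothetical stand-ins for real policies.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]
# Hypothetical stand-in for a policy: returns True when the messages violate it.
PolicyCheck = Callable[[List[Message]], bool]


def evaluate_in_order(
    messages: List[Message],
    ordered_policies: List[Tuple[str, PolicyCheck]],
    fail_fast: bool,
) -> List[str]:
    """Return the IDs of violated policies, checking highest priority first."""
    violated: List[str] = []
    for policy_id, is_violated in ordered_policies:
        if is_violated(messages):
            violated.append(policy_id)
            if fail_fast:
                # Stop at the first violation; lower-priority policies are skipped.
                break
    return violated


messages = [{"role": "user", "content": "this is bad input"}]
ordered_policies = [
    ("my-policy", lambda msgs: any("bad" in m["content"] for m in msgs)),
    ("another-policy", lambda msgs: any(len(m["content"]) > 10 for m in msgs)),
]

print(evaluate_in_order(messages, ordered_policies, fail_fast=True))
# ['my-policy']
print(evaluate_in_order(messages, ordered_policies, fail_fast=False))
# ['my-policy', 'another-policy']
```

With fail_fast enabled, only the highest-priority violation is reported, which is why the position of a policy in the list matters.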

Input request

To create your policies, visit the Policies Setup Guide.

{
  "application": "my-app",
  "messages": [
    {
      "role": "user",
      "content": "user message"
    },
    {
      "role": "assistant",
      "content": "assistant response"
    }
  ],
  "policy_ids": ["my-policy", "another-policy"],
  "policies": [],
  "correction_enabled": false,
  "fail_fast": true
}

The policies field is optional. If it is not passed, the policies in the application are used.

Evaluation response

{
  "object": "eval",
  "time": 2.86278510093689,
  "created": 1725981521,
  "status": "fail",
  "policy_violations": {
    "my-policy": [
      {
        "rule_type": "rubric",
        "expected": "fail",
        "value": "rubric rule criteria",
        "score": 0.2,
        "explanation": "The request does not meet the conditions in the rubric."
      },
      {
        "rule_type": "classifier",
        "expected": "fail",
        "value": "negative",
        "score": 0.9891692399978638,
        "explanation": null
      },
      {
        "rule_type": "contains",
        "expected": "fail",
        "value": "bad",
        "score": 1.0,
        "explanation": null
      }
    ]
  },
  "correction": null
}
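
One possible way to consume a response of this shape is sketched below in Python. The summarize_violations helper is hypothetical, and the sample is an abbreviated copy of the response above; only fields shown in that response are read.

```python
import json
from typing import Any, Dict, List

# Abbreviated copy of the sample response shown above.
response_text = """
{
  "object": "eval",
  "status": "fail",
  "policy_violations": {
    "my-policy": [
      {"rule_type": "classifier", "expected": "fail", "value": "negative",
       "score": 0.9891692399978638, "explanation": null},
      {"rule_type": "contains", "expected": "fail", "value": "bad",
       "score": 1.0, "explanation": null}
    ]
  },
  "correction": null
}
"""


def summarize_violations(response: Dict[str, Any]) -> List[str]:
    """Flatten policy_violations into one readable line per violated rule."""
    lines: List[str] = []
    for policy_id, rules in (response.get("policy_violations") or {}).items():
        for rule in rules:
            lines.append(
                f"{policy_id}: {rule['rule_type']} rule "
                f"({rule['value']!r}) score={rule['score']:.2f}"
            )
    return lines


response = json.loads(response_text)
if response["status"] == "fail":
    for line in summarize_violations(response):
        print(line)
# my-policy: classifier rule ('negative') score=0.99
# my-policy: contains rule ('bad') score=1.00
```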
