To detect hallucinations in AI-generated content, you can use the metric.uncertainty rule, a semantic-entropy-based metric that generates several responses for the same input and computes the entropy over those responses: the higher the entropy, the more the model changes its answer and the more uncertain it is.
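
For intuition, here is a minimal Python sketch of the idea, assuming you have already sampled several responses for the same input from your model; the exact-match grouping below stands in for the semantic clustering a real implementation would use.

import math
from collections import Counter

def semantic_entropy(responses: list[str]) -> float:
    # Group equivalent answers (naive exact-match clustering here) and compute
    # the entropy of the cluster distribution: higher means the model keeps
    # changing its answer, i.e. it is more uncertain.
    clusters = Counter(r.strip().lower() for r in responses)
    total = sum(clusters.values())
    return -sum((n / total) * math.log(n / total) for n in clusters.values())

# Example: two out of three samples agree, one disagrees -> non-zero entropy.
samples = [
    "D is at the same level as B.",
    "D is two levels below B.",
    "D is at the same level as B.",
]
print(f"semantic entropy: {semantic_entropy(samples):.3f}")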

Rule structure:

  • type: metric.uncertainty
  • expected: fail (to flag the response when uncertainty is high)
  • threshold: Confidence level for uncertainty detection (e.g., 0.8 for 80% confidence); see the sketch after this list
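
As a hedged illustration of how these two fields might interact, the sketch below assumes the uncertainty score is normalized to [0, 1] and that expected: fail means the rule flags the response once the score reaches the threshold.

def rule_flags_response(uncertainty: float, threshold: float = 0.8) -> bool:
    # Assumed semantics: with expected set to "fail", a response is flagged as
    # a likely hallucination once its uncertainty score reaches the configured
    # confidence threshold.
    return uncertainty >= threshold

print(rule_flags_response(0.91, threshold=0.8))  # True -> flagged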

Required input

messages:
- role: system
  content: "You are a helpful assistant."
- role: user
  content: "How is relation of B and D?"
- role: assistant
  content: "D is the same level as B."

We noticed this works better when the system prompt is provided and the model and other relevant model parameters are configured at the application level.
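
A minimal sketch of that setup, assuming the resampling happens in your application layer: the system and user turns stay fixed, the assistant turn is regenerated several times with your own model call (call_model below is a hypothetical placeholder), and the answers feed the entropy computation shown earlier.

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the relation between B and D?"},
]

def resample_answers(call_model, n: int = 5, temperature: float = 1.0) -> list[str]:
    # Regenerate the assistant turn n times with the same system + user context
    # and the model parameters configured at the application level.
    return [call_model(messages, temperature=temperature) for _ in range(n)]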

Create the policy

Here’s an example of a policy to detect hallucinations:

{
  "id": "unique policy id",
  "definition": "short description",
  "rules": [
    {
      "type": "metric.uncertainty",
      "expected": "fail",
      "threshold": 0.9
    }
  ],
  "target": "output"
}
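
Once the policy is defined, it is submitted alongside the conversation you want to check. The sketch below is hypothetical: the endpoint URL, request shape, and field names are illustrative assumptions, not a documented API.

import json
import urllib.request

policy = {
    "id": "hallucination-check",
    "definition": "Flag answers whose semantic-entropy uncertainty is high",
    "rules": [{"type": "metric.uncertainty", "expected": "fail", "threshold": 0.9}],
    "target": "output",
}
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the relation between B and D?"},
    {"role": "assistant", "content": "D is at the same level as B."},
]

request = urllib.request.Request(
    "https://example.invalid/evaluate",  # placeholder URL for your evaluation service
    data=json.dumps({"policy": policy, "messages": messages}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(request)  # uncomment once pointed at a real service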