To detect prompt injections, use the jailbreak rule. For more details, see our Rules Catalog.

Rule structure:

  • type: jailbreak
  • expected: fail
  • threshold: 0.8

Create the policy

Here is an example of a policy that detects injection:


policy = {
    "id": "detect-injection",
    "definition": "...",
    "rules": [
        {
            "type": "jailbreak",
            "expected": "fail",
            "threshold": 0.8
        }
    ],
    "target": "input"
}
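The policy above can be submitted programmatically. The sketch below is a minimal example, assuming a requests-style HTTP client and a hypothetical base URL and /policies path; adjust both to match your deployment's application endpoint:

```python
# Hypothetical base URL -- replace with your deployment's value.
BASE_URL = "https://api.example.com"


def build_policy() -> dict:
    """Build the detect-injection policy payload shown above."""
    return {
        "id": "detect-injection",
        "definition": "...",
        "rules": [
            {
                "type": "jailbreak",
                "expected": "fail",
                "threshold": 0.8,
            }
        ],
        "target": "input",
    }


def create_policy(session, base_url: str = BASE_URL) -> dict:
    """POST the policy to the (hypothetical) application endpoint.

    `session` is any object with a requests-style .post() method,
    e.g. requests.Session().
    """
    resp = session.post(f"{base_url}/policies", json=build_policy())
    resp.raise_for_status()
    return resp.json()
```

The payload is built by a separate function so it can be inspected or validated before it is sent over the wire.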

Next steps

  • Create the policy using the application endpoint
  • Call the Evaluate API with the messages and the policy id detect-injection
  • The API returns a status of pass or fail along with a list of policy violations; check whether the policy id detect-injection appears in that list
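The steps above can be sketched end to end. The endpoint path, response shape, and helper names below are assumptions for illustration, not the documented API; the violation check itself is plain Python:

```python
def call_evaluate(session, messages, policy_id="detect-injection",
                  base_url="https://api.example.com"):
    """Call the (hypothetical) Evaluate API with messages and a policy id."""
    resp = session.post(f"{base_url}/evaluate",
                        json={"messages": messages, "policy_id": policy_id})
    resp.raise_for_status()
    return resp.json()


def is_injection_flagged(result: dict, policy_id: str = "detect-injection") -> bool:
    """Check whether the evaluation failed and the given policy was violated.

    Assumes the response carries a "status" field ("pass" or "fail") and a
    "violations" list whose entries include a "policy_id", as described above.
    """
    if result.get("status") != "fail":
        return False
    return any(v.get("policy_id") == policy_id
               for v in result.get("violations", []))


# Illustrative response shape only -- the real field names may differ.
sample = {
    "status": "fail",
    "violations": [{"policy_id": "detect-injection", "rule": "jailbreak"}],
}
print(is_injection_flagged(sample))  # -> True
```

Keeping the violation check in its own helper lets you unit-test the pass/fail logic without making any network calls.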