Use cases
Detect direct prompt injection
Policy example to detect direct prompt injection using the jailbreak rule
To detect prompt injections, you can use the jailbreak rule. For more details, see our Rules Catalog.
Rule structure:
- type: jailbreak
- expected: fail
- threshold: 0.8
Create the policy
Here is an example policy to detect injection:
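A minimal sketch of such a policy, written as a Python dictionary. Only the rule fields (type, expected, threshold) and the policy id detect-injection come from this page; the surrounding key names `id` and `rules`, and the helper `build_injection_policy`, are assumptions for illustration and may not match the actual policy schema.

```python
import json


def build_injection_policy(policy_id: str = "detect-injection",
                           threshold: float = 0.8) -> dict:
    """Build a policy payload containing a single jailbreak rule.

    The top-level keys "id" and "rules" are assumed names, not a
    confirmed schema; the rule fields mirror the rule structure above.
    """
    return {
        "id": policy_id,
        "rules": [
            {
                "type": "jailbreak",     # rule that flags prompt injection
                "expected": "fail",      # value taken from the rule structure
                "threshold": threshold,  # value taken from the rule structure
            }
        ],
    }


policy = build_injection_policy()
print(json.dumps(policy, indent=2))
```

Keeping the payload in one helper makes it easy to adjust the threshold or policy id before sending it to the application endpoint.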
Next steps
- Create the policy by using the application endpoint.
- Call the Evaluate API with the messages and the policy id detect-injection.
- The API output contains a status, either fail or pass, and the list of policy violations. You can check whether the policy id detect-injection appears in that list.
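The final check can be sketched as follows, assuming the Evaluate API response has already been parsed into a dictionary. The status values fail and pass come from this page; the key names `violations` and `policy_id`, and the function `injection_detected`, are hypothetical and should be adapted to the real response format.

```python
def injection_detected(response: dict, policy_id: str = "detect-injection") -> bool:
    """Return True if the evaluation failed and the given policy was violated.

    Assumes a response shaped like
    {"status": "fail", "violations": [{"policy_id": "..."}]};
    the keys "violations" and "policy_id" are assumed names.
    """
    if response.get("status") != "fail":
        return False
    violations = response.get("violations", [])
    return any(v.get("policy_id") == policy_id for v in violations)


# Hand-written sample responses for illustration, not real API output.
flagged = {"status": "fail", "violations": [{"policy_id": "detect-injection"}]}
clean = {"status": "pass", "violations": []}

print(injection_detected(flagged))
print(injection_detected(clean))
```

Checking both the status and the violation list avoids treating a failure caused by a different policy as a prompt injection.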