jailbreak. For more details, see our Rules Catalog.
Rule structure:
- type: jailbreak
- expected: fail
- threshold: 0.8
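
As a rough illustration, the three rule fields listed above could be held in a small Python dictionary and serialized to JSON. Only the field names and values (type, expected, threshold) come from this page. The JSON object shape and the variable names `rule` and `rule_json` are assumptions made for the sketch, not the documented request format.

```python
import json

# Rule fields taken from the list above. Representing them as a JSON
# object is an assumption, not the documented wire format.
rule = {
    "type": "jailbreak",
    "expected": "fail",
    "threshold": 0.8,
}

# Serialize the rule so it could be embedded in a policy definition.
rule_json = json.dumps(rule, indent=2)
print(rule_json)
```

Keeping the rule as plain data makes it easy to review and to reuse in more than one policy, whatever the final policy format turns out to be.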
Create the policy
Here is an example policy to detect injection:
Next steps
- Create the policy by using the application endpoint
- Call the Evaluate API with the messages and the policy id detect-injection
- The API output would be a status, either fail or pass, and the list of policy violations. You could check whether the policy id detect-injection is there.
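
The check described in the last step might look like the sketch below. The response shape is hypothetical: this page only says that the output has a status of fail or pass and a list of policy violations, so the key names `status`, `violations`, and `policy_id`, as well as the helper `policy_violated`, are assumptions chosen for illustration. Adjust them to match the actual Evaluate API response.

```python
from typing import Any


def policy_violated(response: dict[str, Any], policy_id: str) -> bool:
    """Return True if the response failed and lists policy_id among its violations."""
    # A passing evaluation is treated as having no violation to report.
    if response.get("status") != "fail":
        return False
    violations = response.get("violations", [])
    return any(v.get("policy_id") == policy_id for v in violations)


# Hypothetical response, hand-written for illustration only.
sample_response = {
    "status": "fail",
    "violations": [{"policy_id": "detect-injection"}],
}

print(policy_violated(sample_response, "detect-injection"))
```

Checking the status first keeps the helper safe to call on passing responses, where the violations list may be empty or absent.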
