Model revisions let you pin a specific revision of a model for a particular guardrail or policy.

You can specify the model revision by passing `model_revision` in the rule, e.g. `jailbreak@47ffb2e`.

If no revision is specified, the default revision for the model is used.
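A minimal sketch of how a rule might pin a revision, assuming a dictionary-style rule definition; the `type` field and overall rule schema are illustrative assumptions, and only the `model_revision` value format (`name@revision`) comes from the text above:

```python
# Hypothetical rule definitions (the schema is an assumption for illustration).

pinned_rule = {
    "type": "jailbreak",
    "model_revision": "jailbreak@47ffb2e",  # pin this exact model revision
}

default_rule = {
    "type": "jailbreak",
    # no model_revision given: the default revision for the model is used
}
```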

Supported models

| Rule type | Latest model revision | Language support | Task description | Metric and model card report |
| --- | --- | --- | --- | --- |
| Jailbreak | `jailbreak@47ffb2e` | en, pt, es | Detects attempts to bypass AI safety measures | JailbreakGuard-10K |

Revisions

| Model | Revision | Benchmark |
| --- | --- | --- |
| Jailbreak | `jailbreak@47ffb2e` (latest) | F1: 0.95 @ JailbreakGuard-10K |

Model metrics

jailbreak@47ffb2e

| Metric | Score | Dataset |
| --- | --- | --- |
| Accuracy | 0.95 | JailbreakGuard-10K |
| Precision | 0.93 | JailbreakGuard-10K |
| Recall | 0.97 | JailbreakGuard-10K |
| F1 score | 0.95 | JailbreakGuard-10K |
| AUC-ROC | 0.98 | JailbreakGuard-10K |
| False positive rate | 0.03 | JailbreakGuard-10K |
| True negative rate | 0.97 | JailbreakGuard-10K |
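For reference, the F1 score is the harmonic mean of precision and recall, so the reported values can be cross-checked directly from the table; a quick sketch using only the numbers above:

```python
# F1 = 2 * P * R / (P + R), using the precision and recall from the table.
precision, recall = 0.93, 0.97
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.95, consistent with the reported F1 score
```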

The jailbreak model was benchmarked on the JailbreakGuard-10K dataset, which consists of 10,000 carefully curated examples of potential jailbreak attempts and safe interactions in English, Portuguese, and Spanish. The dataset covers a wide range of advanced jailbreak techniques to help validate performance across varied scenarios.