`model_revision` in the rule, e.g., `jailbreak@47ffb2e`.
If `model_revision` is not specified, the rule uses the default revision for the model.
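To illustrate the `<model>@<revision>` format and the fallback to a default, here is a minimal Python sketch that resolves a `model_revision` value. The function `resolve_model_revision` and the `DEFAULT_REVISIONS` table are hypothetical and not part of any documented API. For illustration only, the sketch also assumes that the default revision is the latest one listed below, which the source does not state.

```python
from typing import Optional, Tuple

# Hypothetical registry of default revisions per rule type. The actual default
# is chosen by the service; "47ffb2e" is assumed here only for illustration.
DEFAULT_REVISIONS = {"jailbreak": "47ffb2e"}


def resolve_model_revision(model_revision: Optional[str], rule_type: str) -> Tuple[str, str]:
    """Return (model, revision), falling back to the default revision if none is pinned."""
    if not model_revision:
        # No pin in the rule: use the (assumed) default revision for this rule type.
        return rule_type, DEFAULT_REVISIONS[rule_type]
    # Split "jailbreak@47ffb2e" into "jailbreak" and "47ffb2e".
    model, sep, revision = model_revision.partition("@")
    if not sep or not model or not revision:
        raise ValueError(f"expected '<model>@<revision>', got {model_revision!r}")
    return model, revision
```

For example, `resolve_model_revision("jailbreak@47ffb2e", "jailbreak")` returns `("jailbreak", "47ffb2e")`, while a malformed value such as `"jailbreak"` (missing `@`) raises `ValueError` instead of silently falling back.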
Supported models
| Rule Type | Latest Model Revision | Language support | Task Description | Metric and model card report |
|---|---|---|---|---|
| Jailbreak | jailbreak@47ffb2e | en, pt, es | Detects attempts to bypass AI safety measures | JailbreakGuard-10K |
Revisions
| Model | Revision | Benchmark |
|---|---|---|
| Jailbreak | jailbreak@47ffb2e (latest) | F1: 0.95 on JailbreakGuard-10K |
Model metrics
jailbreak@47ffb2e
| Metric | Score | Dataset |
|---|---|---|
| Accuracy | 0.95 | JailbreakGuard-10K |
| Precision | 0.93 | JailbreakGuard-10K |
| Recall | 0.97 | JailbreakGuard-10K |
| F1 Score | 0.95 | JailbreakGuard-10K |
| AUC-ROC | 0.98 | JailbreakGuard-10K |
| False Positive Rate | 0.03 | JailbreakGuard-10K |
| True Negative Rate | 0.97 | JailbreakGuard-10K |
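The metrics above follow the standard binary-classification definitions. The Python sketch below shows how each one is computed, assuming that jailbreak attempts are the positive class (the source does not say so explicitly). The underlying confusion-matrix counts for JailbreakGuard-10K are not published here, so any counts passed in are hypothetical and are not meant to reproduce the table. AUC-ROC is computed with the rank-based (Mann-Whitney) formulation rather than by integrating a ROC curve.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ConfusionCounts:
    tp: int  # jailbreak attempts correctly flagged
    fp: int  # benign prompts wrongly flagged
    tn: int  # benign prompts correctly passed
    fn: int  # jailbreak attempts missed


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Threshold-based metrics derived from a single confusion matrix."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    fpr = c.fp / (c.fp + c.tn)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
        "false_positive_rate": fpr,
        # TNR (specificity) is the complement of FPR: tn / (tn + fp).
        "true_negative_rate": 1 - fpr,
    }


def auc_roc(scores_pos: List[float], scores_neg: List[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Consistency check on the reported numbers: F1 from precision 0.93 and recall 0.97.
f1_from_reported = 2 * 0.93 * 0.97 / (0.93 + 0.97)  # ~0.9496, which rounds to 0.95
```

The pairwise `auc_roc` loop is O(n·m) and is meant only to make the definition concrete; for large evaluation sets, a library routine such as scikit-learn's `roc_auc_score` is the usual choice. The last line shows that the reported F1 of 0.95 is consistent with the reported precision and recall, and the TNR of 0.97 matches 1 minus the FPR of 0.03.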
