Rule Type | Description | Key Attributes | Use Case |
---|---|---|---|
Jailbreak | Prevents malicious instructions | `threshold`: min model confidence score | Maintain AI system integrity |
Factuality | Uses an NLI model to check responses against provided factual information | `value`: factual content; `threshold`: min model confidence score | Prevent false or misleading content |
Rubric | Uses an LLM as a judge to check responses against predefined criteria | `value`: rubric criteria; `threshold`: tolerance level | Ensure compliance with complex and specific guidelines |
Classifier | Uses a zero-shot model for classification | `value`: class name; `threshold`: min confidence score | Content moderation, topic classification |
Similarity | Measures text similarity | `value`: content to measure against; `threshold`: min cosine distance | Detect similar text |
PII | Detects personally identifiable information | `value`: PII type (e.g., phone, email); `threshold`: min model confidence score | Data privacy protection |
Regex | Applies regular expression patterns | `value`: regex pattern | Identify specific text patterns |
Contains | String match to check for presence of specific keywords | `value`: keyword to check | Content filtering, keyword detection |
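
To make the roles of `value` and `threshold` concrete, here is a minimal Python sketch of how the two pattern-based rule types (Regex and Contains) and a thresholded model score might be evaluated. The `Rule` class, the `rule_matches` and `score_triggers` helpers, the lowercase type strings, and the case-insensitive keyword match are all illustrative assumptions, not the product's actual API or behavior.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """Hypothetical in-memory representation of one table row."""
    type: str                          # "regex" or "contains" (assumed identifiers)
    value: str                         # regex pattern or keyword, per the table
    threshold: Optional[float] = None  # not used by these two rule types


def rule_matches(rule: Rule, text: str) -> bool:
    """Return True if the pattern-based rule matches the text."""
    if rule.type == "regex":
        # `value` is treated as a regular expression searched anywhere in the text.
        return re.search(rule.value, text) is not None
    if rule.type == "contains":
        # Assumption: keyword matching is case-insensitive.
        return rule.value.lower() in text.lower()
    raise ValueError(f"unsupported rule type: {rule.type}")


def score_triggers(score: float, threshold: float) -> bool:
    """Assumed semantics for model-based rules: the rule fires when the
    model's confidence score is at least the configured minimum."""
    return score >= threshold


phone_rule = Rule(type="regex", value=r"\b\d{3}-\d{4}\b")
keyword_rule = Rule(type="contains", value="password")

print(rule_matches(phone_rule, "Call me at 555-1234"))
print(rule_matches(keyword_rule, "My PASSWORD is hunter2"))
print(score_triggers(0.91, 0.8))
```

The model-based rule types would replace the string checks with a call to the relevant model and pass its score through a comparison such as `score_triggers`. Whether the real system compares with `>=` or `>` is not stated in the table.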