Policy Rules Catalog

Types of Policy Rules

Rule Type	Description	Key Attributes	Use Case
Jailbreak	Detects malicious instructions (jailbreak attempts)	`threshold` is min model confidence score	Maintain AI system integrity
Factuality	Uses an NLI (natural language inference) model to check responses against provided factual information	`value` is factual content. `threshold` is min model confidence score	Prevent false or misleading content
Rubric	Uses an LLM as a judge to check responses against predefined criteria	`value` is the rubric criteria. `threshold` is tolerance level	Ensure compliance with complex and specific guidelines
Classifier	Zero-shot model for classification	`value` is class name. `threshold` is min confidence score	Content moderation, topic classification
Similarity	Measures text similarity	`value` is content to measure against. `threshold` is min cosine distance	Detect similar text
PII	Detects personally identifiable information	`value` is PII type (e.g., phone, email). `threshold` is min model confidence score	Data privacy protection
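
The `threshold` attribute on the model-backed rules above can be read as a cutoff on the model's score. The following is a minimal sketch of that reading, not the product's implementation: the `ModelRule` class, the `is_triggered` helper, and the choice of an inclusive comparison (`>=`) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelRule:
    """Hypothetical container for a model-backed rule (not a real API)."""
    rule_type: str                # e.g. "jailbreak", "classifier", "pii"
    threshold: float              # minimum model confidence score
    value: Optional[str] = None   # e.g. class name or PII type, if the rule takes one


def is_triggered(rule: ModelRule, score: float) -> bool:
    """Return True when the model's confidence reaches the rule's threshold.

    Assumption: the comparison is inclusive, since the table describes
    `threshold` as a minimum score.
    """
    return score >= rule.threshold


# Hypothetical rule: flag email addresses when the detector is at least 80% confident.
email_rule = ModelRule(rule_type="pii", threshold=0.8, value="email")
high_confidence = is_triggered(email_rule, 0.93)
low_confidence = is_triggered(email_rule, 0.50)
```

Under this reading, raising `threshold` makes a rule stricter about what it flags, and lowering it makes the rule fire on weaker evidence.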

Rule Type	Description	Key Attributes	Use Case
Regex	Applies regular expression patterns	`value` is the regex pattern	Identify specific text patterns
Contains	String match to check for presence of specific keywords	`value` is the keyword to check	Content filtering, keyword detection
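
The two rules in this table are deterministic, so their behavior can be approximated with the standard library. This is a sketch under stated assumptions: the function names are hypothetical, the Regex rule is assumed to match anywhere in the text (search rather than full match), and the Contains rule is assumed to be case-sensitive. The actual product may differ on each point.

```python
import re


def regex_rule_matches(pattern: str, text: str) -> bool:
    """Hypothetical Regex rule: True if `pattern` matches anywhere in `text`."""
    return re.search(pattern, text) is not None


def contains_rule_matches(keyword: str, text: str) -> bool:
    """Hypothetical Contains rule: True if `keyword` occurs in `text`.

    Assumption: the match is a case-sensitive substring test.
    """
    return keyword in text


message = "Call me at 555-0100 about the refund."

# `value` for a Regex rule: a pattern for a simple 3-4 digit phone fragment.
has_phone_fragment = regex_rule_matches(r"\d{3}-\d{4}", message)

# `value` for a Contains rule: a single keyword.
mentions_refund = contains_rule_matches("refund", message)
mentions_capitalized = contains_rule_matches("Refund", message)
```

Neither rule takes a `threshold`, because a pattern or keyword either matches or does not.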