Types of Policy Rules

Model-based Rules

| Rule Type | Description | Key Attributes | Use Case |
|---|---|---|---|
| Jailbreak | Prevents malicious instructions | threshold is min model confidence score | Maintain AI system integrity |
| Factuality | NLI model to check against provided factual information | value is factual content. threshold is min model confidence score | Prevent false or misleading content |
| Rubric | LLM as a judge to check responses against predefined criteria | value is the rubric criteria. threshold is tolerance level | Ensure compliance with complex and specific guidelines |
| Classifier | Zero-shot model for classification | value is class name. threshold is min confidence score | Content moderation, topic classification |
| Similarity | Measures text similarity | value is content to measure against. threshold is min cosine distance | Detect similar text |
| PII | Detects personally identifiable information | value is PII type (e.g., phone, email). threshold is min model confidence score | Data privacy protection |
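
The model-based rules above share a common shape: an optional value plus a threshold that a model score is compared against. A minimal sketch of that shape follows. The `ModelRule` class, its field names other than value and threshold, and the "score at or above threshold" comparison are assumptions made for illustration, not the product's actual schema or API, and the model that produces the score is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRule:
    """Hypothetical in-memory form of a model-based rule (not the real schema)."""

    rule_type: str               # e.g. "jailbreak", "pii", "classifier"
    threshold: float             # minimum model confidence score
    value: Optional[str] = None  # e.g. PII type or class name; unused by jailbreak

    def is_triggered(self, score: float) -> bool:
        # Assumption: the rule fires when the model's score reaches the threshold.
        return score >= self.threshold


# Example rules; the scores passed below stand in for real model output.
pii_rule = ModelRule(rule_type="pii", value="email", threshold=0.8)
jailbreak_rule = ModelRule(rule_type="jailbreak", threshold=0.9)

print(pii_rule.is_triggered(0.93))        # score above the threshold
print(jailbreak_rule.is_triggered(0.42))  # score below the threshold
```

Raising the threshold makes a rule stricter about what counts as a match, so it fires less often; lowering it has the opposite effect.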

Pattern-based Rules

| Rule Type | Description | Key Attributes | Use Case |
|---|---|---|---|
| Regex | Applies regular expression patterns | value: regex pattern | Identify specific text patterns |
| Contains | String match to check for presence of specific keywords | value: keyword to check | Content filtering, keyword detection |
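
Pattern-based rules need no model, so their behavior can be approximated with the standard library. The sketch below is illustrative only: the function names `regex_rule` and `contains_rule` are hypothetical, and the choices to search anywhere in the text and to match keywords case-insensitively are assumptions, since the tables do not specify either.

```python
import re


def regex_rule(pattern: str, text: str) -> bool:
    """Return True if the regex pattern (the rule's value) matches anywhere in text."""
    return re.search(pattern, text) is not None


def contains_rule(keyword: str, text: str) -> bool:
    """Return True if the keyword (the rule's value) appears in text.

    Assumption: matching is case-insensitive; a real implementation may differ.
    """
    return keyword.lower() in text.lower()


# A regex rule looking for a US-style phone number such as 555-123-4567.
phone_pattern = r"\b\d{3}-\d{3}-\d{4}\b"
print(regex_rule(phone_pattern, "Call 555-123-4567 now"))

# A contains rule looking for a single keyword.
print(contains_rule("refund", "I would like a Refund, please"))
```

A Contains rule is the cheaper choice when an exact keyword is enough; a Regex rule is worth the extra complexity when the target text varies in form, as phone numbers or identifiers do.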