Risk assessment
Probes Catalog
Probes available for testing AI models for vulnerabilities, grouped by category. Each entry is written as category.ProbeName.
Attack Generation
- atkgen.Tox
Anti-Virus and Spam Scanning
- av_spam_scanning.EICAR
- av_spam_scanning.GTUBE
- av_spam_scanning.GTphish
Continuation
- continuation.ContinueSlursReclaimedSlursMini
DAN (Do Anything Now) Jailbreaks
- dan.AntiDAN
- dan.AutoDANCached
- dan.ChatGPT_Developer_Mode_RANTI
- dan.ChatGPT_Developer_Mode_v2
- dan.ChatGPT_Image_Markdown
- dan.DAN_Jailbreak
- dan.DUDE
- dan.DanInTheWildMini
- dan.Dan_10_0
- dan.Dan_11_0
- dan.Dan_6_0
- dan.Dan_6_2
- dan.Dan_7_0
- dan.Dan_8_0
- dan.Dan_9_0
- dan.STAN
Do Not Answer
- donotanswer.DiscriminationExclusionToxicityHatefulOffensive
- donotanswer.HumanChatbox
- donotanswer.InformationHazard
- donotanswer.MaliciousUses
- donotanswer.MisinformationHarms
Encoding
- encoding.InjectAscii85
- encoding.InjectBase16
- encoding.InjectBase2048
- encoding.InjectBase32
- encoding.InjectBase64
- encoding.InjectBraille
- encoding.InjectEcoji
- encoding.InjectHex
- encoding.InjectMorse
- encoding.InjectNato
- encoding.InjectROT13
- encoding.InjectUU
- encoding.InjectZalgo
File Formats
- fileformats.HF_Files
Glitch
- glitch.Glitch100
Goodside
- goodside.Davidjl
- goodside.Tag
- goodside.ThreatenJSON
- goodside.WhoIsRiley
Grandma
- grandma.Slurs
- grandma.Substances
- grandma.Win10
- grandma.Win11
Latent Injection
- latentinjection.LatentInjectionFactSnippetEiffel
- latentinjection.LatentInjectionReport
- latentinjection.LatentInjectionResume
- latentinjection.LatentInjectionTranslationEnFr
- latentinjection.LatentInjectionTranslationEnZh
Leak Replay
- leakreplay.GuardianCloze
- leakreplay.GuardianComplete
- leakreplay.LiteratureCloze80
- leakreplay.LiteratureComplete80
- leakreplay.NYTCloze
- leakreplay.NYTComplete
- leakreplay.PotterCloze
- leakreplay.PotterComplete
LMRC (Language Model Risk Cards)
- lmrc.Anthropomorphisation
- lmrc.Bullying
- lmrc.Deadnaming
- lmrc.Profanity
- lmrc.QuackMedicine
- lmrc.SexualContent
- lmrc.Sexualisation
- lmrc.SlurUsage
Malware Generation
- malwaregen.Evasion
- malwaregen.Payload
- malwaregen.SubFunctions
- malwaregen.TopLevel
Misleading
- misleading.FalseAssertion50
Package Hallucination
- packagehallucination.JavaScript
- packagehallucination.Python
- packagehallucination.Ruby
- packagehallucination.Rust
Prompt Injection
- promptinject.HijackHateHumansMini
- promptinject.HijackKillHumansMini
- promptinject.HijackLongPromptMini
Real Toxicity Prompts
- realtoxicityprompts.RTPBlank
- realtoxicityprompts.RTPFlirtation
- realtoxicityprompts.RTPIdentity_Attack
- realtoxicityprompts.RTPInsult
- realtoxicityprompts.RTPProfanity
- realtoxicityprompts.RTPSevere_Toxicity
- realtoxicityprompts.RTPSexually_Explicit
- realtoxicityprompts.RTPThreat
Replay
- replay.Repeat
Snowball
- snowball.GraphConnectivityMini
- snowball.PrimesMini
- snowball.SenatorsMini
Suffix
- suffix.GCGCached
TAP (Tree of Attacks with Pruning)
- tap.TAPCached
Topic
- topic.WordnetControversial
XSS (Cross-Site Scripting)
- xss.MarkdownImageExfil
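Every identifier in the catalog above has the form category.ProbeName. As a minimal sketch, assuming that convention holds for each entry, the snippet below splits identifiers on the first dot and groups them by category. The helper name `group_probes` and the sample list are illustrative only and are not part of any tool's API.

```python
from collections import defaultdict


def group_probes(identifiers):
    """Group 'category.ProbeName' strings by their category prefix."""
    grouped = defaultdict(list)
    for identifier in identifiers:
        # Split on the first dot only: the left part is the category,
        # the right part is the probe name.
        category, _, name = identifier.partition(".")
        if not category or not name:
            raise ValueError(f"not a category.ProbeName identifier: {identifier!r}")
        grouped[category].append(name)
    return dict(grouped)


# A small sample taken from the catalog above.
sample = [
    "dan.STAN",
    "dan.DUDE",
    "encoding.InjectBase64",
    "xss.MarkdownImageExfil",
]

for category, names in sorted(group_probes(sample).items()):
    print(f"{category}: {', '.join(names)}")
```

Grouping this way makes it easy to see how many probes a category contributes before deciding whether to run the whole category or only individual probes.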
Probe Categories
- dan
- realtoxicityprompts
- tap
- replay
- misleading
- donotanswer
- test
- fileformats
- continuation
- glitch
- suffix
- packagehallucination
- snowball
- av_spam_scanning
- latentinjection
- encoding
- xss
- atkgen
- malwaregen
- grandma
- topic
- promptinject
- leakreplay
- visual_jailbreak
- lmrc
- goodside
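A probe selection is usually a mix of whole categories and individual category.ProbeName entries. The sketch below, which uses a hypothetical helper `build_probe_spec` and the category names listed above, checks each entry's category and joins the selection into a comma-separated string. The identifiers in this catalog appear to follow garak's naming, and garak's `--probes` option is understood to accept a comma-separated list of this shape; treat that as an assumption and verify it against the version you have installed.

```python
# Category names copied from the "Probe Categories" list above.
KNOWN_CATEGORIES = {
    "dan", "realtoxicityprompts", "tap", "replay", "misleading",
    "donotanswer", "test", "fileformats", "continuation", "glitch",
    "suffix", "packagehallucination", "snowball", "av_spam_scanning",
    "latentinjection", "encoding", "xss", "atkgen", "malwaregen",
    "grandma", "topic", "promptinject", "leakreplay", "visual_jailbreak",
    "lmrc", "goodside",
}


def build_probe_spec(selection):
    """Validate a probe selection and return it as a comma-separated string.

    Only the category part of each entry is checked; probe names after
    the dot are passed through unverified.
    """
    validated = []
    for entry in selection:
        category = entry.split(".", 1)[0]
        if category not in KNOWN_CATEGORIES:
            raise ValueError(f"unknown probe category: {category!r}")
        validated.append(entry)
    return ",".join(validated)


spec = build_probe_spec(["encoding", "dan.Dan_11_0", "xss.MarkdownImageExfil"])
print(spec)  # encoding,dan.Dan_11_0,xss.MarkdownImageExfil
```

Checking the category up front catches typos such as `encodings` before a long scan is started; validating individual probe names would need the full catalog rather than only the category list.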