Attack Generation

  • atkgen.Tox

Anti-Virus and Spam Scanning

  • av_spam_scanning.EICAR
  • av_spam_scanning.GTUBE
  • av_spam_scanning.GTphish

Continuation

  • continuation.ContinueSlursReclaimedSlursMini

DAN (Do Anything Now) Jailbreaks

  • dan.AntiDAN
  • dan.AutoDANCached
  • dan.ChatGPT_Developer_Mode_RANTI
  • dan.ChatGPT_Developer_Mode_v2
  • dan.ChatGPT_Image_Markdown
  • dan.DAN_Jailbreak
  • dan.DUDE
  • dan.DanInTheWildMini
  • dan.Dan_10_0
  • dan.Dan_11_0
  • dan.Dan_6_0
  • dan.Dan_6_2
  • dan.Dan_7_0
  • dan.Dan_8_0
  • dan.Dan_9_0
  • dan.STAN

Do Not Answer

  • donotanswer.DiscriminationExclusionToxicityHatefulOffensive
  • donotanswer.HumanChatbox
  • donotanswer.InformationHazard
  • donotanswer.MaliciousUses
  • donotanswer.MisinformationHarms

Encoding

  • encoding.InjectAscii85
  • encoding.InjectBase16
  • encoding.InjectBase2048
  • encoding.InjectBase32
  • encoding.InjectBase64
  • encoding.InjectBraille
  • encoding.InjectEcoji
  • encoding.InjectHex
  • encoding.InjectMorse
  • encoding.InjectNato
  • encoding.InjectROT13
  • encoding.InjectUU
  • encoding.InjectZalgo

File Formats

  • fileformats.HF_Files

Glitch

  • glitch.Glitch100

Goodside

  • goodside.Davidjl
  • goodside.Tag
  • goodside.ThreatenJSON
  • goodside.WhoIsRiley

Grandma

  • grandma.Slurs
  • grandma.Substances
  • grandma.Win10
  • grandma.Win11

Latent Injection

  • latentinjection.LatentInjectionFactSnippetEiffel
  • latentinjection.LatentInjectionReport
  • latentinjection.LatentInjectionResume
  • latentinjection.LatentInjectionTranslationEnFr
  • latentinjection.LatentInjectionTranslationEnZh

Leak Replay

  • leakreplay.GuardianCloze
  • leakreplay.GuardianComplete
  • leakreplay.LiteratureCloze80
  • leakreplay.LiteratureComplete80
  • leakreplay.NYTCloze
  • leakreplay.NYTComplete
  • leakreplay.PotterCloze
  • leakreplay.PotterComplete

LMRC (Language Model Risk Cards)

  • lmrc.Anthropomorphisation
  • lmrc.Bullying
  • lmrc.Deadnaming
  • lmrc.Profanity
  • lmrc.QuackMedicine
  • lmrc.SexualContent
  • lmrc.Sexualisation
  • lmrc.SlurUsage

Malware Generation

  • malwaregen.Evasion
  • malwaregen.Payload
  • malwaregen.SubFunctions
  • malwaregen.TopLevel

Misleading

  • misleading.FalseAssertion50

Package Hallucination

  • packagehallucination.JavaScript
  • packagehallucination.Python
  • packagehallucination.Ruby
  • packagehallucination.Rust

Prompt Injection

  • promptinject.HijackHateHumansMini
  • promptinject.HijackKillHumansMini
  • promptinject.HijackLongPromptMini

Real Toxicity Prompts

  • realtoxicityprompts.RTPBlank
  • realtoxicityprompts.RTPFlirtation
  • realtoxicityprompts.RTPIdentity_Attack
  • realtoxicityprompts.RTPInsult
  • realtoxicityprompts.RTPProfanity
  • realtoxicityprompts.RTPSevere_Toxicity
  • realtoxicityprompts.RTPSexually_Explicit
  • realtoxicityprompts.RTPThreat

Replay

  • replay.Repeat

Snowball

  • snowball.GraphConnectivityMini
  • snowball.PrimesMini
  • snowball.SenatorsMini

Suffix

  • suffix.GCGCached

TAP (Tree of Attacks with Pruning)

  • tap.TAPCached

Topic

  • topic.WordnetControversial

XSS (Cross-Site Scripting)

  • xss.MarkdownImageExfil

Probe Categories

  • dan
  • realtoxicityprompts
  • tap
  • replay
  • misleading
  • donotanswer
  • test
  • fileformats
  • continuation
  • glitch
  • suffix
  • packagehallucination
  • snowball
  • av_spam_scanning
  • latentinjection
  • encoding
  • xss
  • atkgen
  • malwaregen
  • grandma
  • topic
  • promptinject
  • leakreplay
  • visual_jailbreak
  • lmrc
  • goodside
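
Each probe name listed above has the form category.ProbeName, where the part before the dot is one of the probe categories. As a minimal sketch of that naming convention, the Python snippet below groups a few probe names by category. The function name group_by_category is hypothetical and is not part of any tool's API; it only illustrates how the dotted names decompose.

```python
from collections import defaultdict


def group_by_category(probe_names):
    """Group dotted probe names ("category.ProbeName") by their category prefix.

    Hypothetical helper for illustration only; not part of any library.
    """
    grouped = defaultdict(list)
    for name in probe_names:
        # Split on the first dot: left side is the category, right side the probe.
        category, _, probe = name.partition(".")
        if not category or not probe:
            raise ValueError(f"expected 'category.ProbeName', got {name!r}")
        grouped[category].append(probe)
    return dict(grouped)


if __name__ == "__main__":
    # A small sample taken from the listing above.
    sample = [
        "dan.DUDE",
        "dan.STAN",
        "encoding.InjectBase64",
        "xss.MarkdownImageExfil",
    ]
    for category, probes in group_by_category(sample).items():
        print(category, probes)
```

A bare category name such as dan has no dot, so this sketch rejects it; a caller that wants to accept whole categories would need to handle that case separately.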