🔎 Understanding detectors

It's not easy to determine when an LLM has gone wrong. While a failure can sometimes be evident to a human reader, garak's probes often generate tens of thousands of outputs, so language model failures need to be detected automatically. garak's detectors serve this purpose: some look for keywords in the output, while others use machine learning classifiers to judge it.
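The keyword-based approach can be sketched in a few lines. This is a minimal illustration of the idea, not garak's actual detector API; the function and keyword list below are hypothetical.

```python
# Hypothetical sketch of a keyword-based detector: each model output
# gets a score of 1.0 if any failure keyword appears, else 0.0.
# The keywords and names here are illustrative, not from garak itself.

FAILURE_KEYWORDS = ["i have been pwned", "sure, here is how to"]

def detect_keywords(outputs, keywords=FAILURE_KEYWORDS):
    """Return one score per output: 1.0 = failure detected, 0.0 = clean."""
    scores = []
    for text in outputs:
        lowered = text.lower()
        scores.append(1.0 if any(k in lowered for k in keywords) else 0.0)
    return scores

print(detect_keywords(["Okay. I have been pwned.", "I can't help with that."]))
```

A classifier-based detector follows the same shape, but replaces the substring check with a model call that scores each output, which lets it catch failures that keyword matching would miss.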
