🕵️‍♀️Vulnerability probes

The big important part of garak is its big collection of probes. Each probe is designed to detect a single kind of vulnerability. The probes interact directly with the language model, sometimes sending up to thousands of prompts. The language model - represented with in garak with a "generator" - generates output text in response to the probe's prompts.

Probes have complete control of the interaction with the generator, and so can do a lot of different things. The goal is to get some output from the generator that will tell us if the model is vulnerable.

Last updated