🕵️♀️Vulnerability probes
The big important part of garak is its big collection of probes. Each probe is designed to detect a single kind of vulnerability. The probes interact directly with the language model, sometimes sending up to thousands of prompts. The language model - represented with in garak with a "generator" - generates output text in response to the probe's prompts.
Probes have complete control of the interaction with the generator, and so can do a lot of different things. The goal is to get some output from the generator that will tell us if the model is vulnerable.
Last updated