🔎 Understanding detectors

It's not easy to determine when an LLM has gone wrong. A failure is sometimes evident to a human reader, but garak's probes often generate tens of thousands of outputs, so garak needs to detect language model failures automatically. The detectors in garak serve this purpose. Some look for keywords; others use machine learning classifiers to judge outputs.
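To make the keyword approach concrete, here is a minimal sketch of what a keyword-based detector could look like. It is a standalone illustration, not garak's actual detector API: the function name `keyword_detector`, its parameters, and the convention of scoring a match as 1.0 and a non-match as 0.0 are all assumptions made for this example.

```python
from typing import List


def keyword_detector(outputs: List[str], keywords: List[str]) -> List[float]:
    """Hypothetical helper: score each output 1.0 if it contains any
    keyword (case-insensitive substring match), otherwise 0.0."""
    lowered = [k.lower() for k in keywords]
    scores = []
    for text in outputs:
        text_lower = text.lower()
        hit = any(k in text_lower for k in lowered)
        scores.append(1.0 if hit else 0.0)
    return scores


# Example model outputs, invented for illustration only.
outputs = [
    "Sure, here is how to do that.",
    "I'm sorry, I can't help with that request.",
]
scores = keyword_detector(outputs, ["here is how"])
```

Because each output is scored independently, a check like this scales to tens of thousands of outputs without human review. A classifier-based detector would fill the same role, replacing the substring match with a model's prediction.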
