🚀 Your first scan

So you've installed garak - great! Let's run a scan to see what's up.

To keep things simple, let's try a straightforward target that could run on most machines - gpt2. Hugging Face Hub has a free and openly available version of this, so let's use that. And let's try a relatively straightforward probe, one that tries to get rude generations from the model.

First we'd like to see the list of probes available. Open a terminal if you don't have one up already, and run garak by entering the following then pressing enter:

python -m garak --list_probes

You should get a list of probes in garak, with symbols at the side of some lines. You can always read more about garak's options by running python -m garak --help (or sometimes just garak --help, depending on your installation).
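If you are unsure which of the two invocations works on your machine, a small check like the one below can tell you. This is only a sketch using the Python standard library: it looks for an importable garak package and for a garak executable on your PATH, and the function name garak_invocations is made up for this example.

```python
import importlib.util
import shutil


def garak_invocations():
    """Return the garak invocations that look usable in this environment."""
    options = []
    # `python -m garak` needs the garak package to be importable
    if importlib.util.find_spec("garak") is not None:
        options.append("python -m garak")
    # plain `garak` needs an executable of that name on the PATH
    if shutil.which("garak") is not None:
        options.append("garak")
    return options


available = garak_invocations()
print(available or "garak not found in this environment")
```

If the list comes back empty, the Python interpreter you ran this with probably isn't the one garak was installed into.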

Back to the list of probes. It might look like this.

garak LLM security probe v0.9.0.6 ( https://github.com/leondz/garak ) at 2023-07-24T11:20:16.086762
probes: art 🌟
probes: art.Tox
probes: continuation 🌟
probes: continuation.ContinueSlursReclaimedSlurs50
probes: dan 🌟
probes: dan.Ablation_Dan_11_0
probes: dan.AntiDAN
probes: dan.ChatGPT_Developer_Mode_RANTI
probes: dan.ChatGPT_Developer_Mode_v2
probes: dan.ChatGPT_Image_Markdown
probes: dan.DAN_Jailbreak
probes: dan.DUDE
probes: dan.Dan_10_0
probes: dan.Dan_11_0
probes: dan.Dan_6_0
probes: dan.Dan_6_2
probes: dan.Dan_7_0
probes: dan.Dan_8_0
probes: dan.Dan_9_0
probes: dan.STAN
probes: encoding 🌟
probes: encoding.InjectAscii85
probes: encoding.InjectBase16
probes: encoding.InjectBase2048
probes: encoding.InjectBase32
probes: encoding.InjectBase64
probes: encoding.InjectBraille
probes: encoding.InjectHex
probes: encoding.InjectMime 💤
probes: encoding.InjectMorse
probes: encoding.InjectQP 💤
probes: encoding.InjectROT13
probes: encoding.InjectUU
probes: glitch 🌟
probes: glitch.Glitch 💤
probes: glitch.Glitch100
probes: goodside 🌟
probes: goodside.ThreatenJSON
probes: goodside.WhoIsRiley
probes: goodside._Davidjl
probes: knownbadsignatures 🌟
probes: knownbadsignatures.EICAR
probes: knownbadsignatures.GTUBE
probes: knownbadsignatures.GTphish
probes: leakreplay 🌟
probes: leakreplay.LiteratureCloze 💤
probes: leakreplay.LiteratureCloze80
probes: leakreplay.LiteratureComplete 💤
probes: leakreplay.LiteratureComplete80
probes: lmrc 🌟
probes: lmrc.Anthropomorphisation
probes: lmrc.Bullying
probes: lmrc.Deadnaming
probes: lmrc.Profanity
probes: lmrc.QuackMedicine
probes: lmrc.SexualContent
probes: lmrc.Sexualisation
probes: lmrc.SlurUsage
probes: malwaregen 🌟
probes: malwaregen.Evasion
probes: malwaregen.Payload
probes: malwaregen.SubFunctions
probes: malwaregen.TopLevel
probes: misleading 🌟
probes: misleading.FalseAssertion50
probes: promptinject 🌟
probes: promptinject.HijackHateHumans 💤
probes: promptinject.HijackHateHumansMini
probes: promptinject.HijackKillHumans 💤
probes: promptinject.HijackKillHumansMini
probes: promptinject.HijackLongPrompt 💤
probes: promptinject.HijackLongPromptMini
probes: realtoxicityprompts 🌟
probes: realtoxicityprompts.RTPBlank
probes: realtoxicityprompts.RTPFlirtation
probes: realtoxicityprompts.RTPIdentity_Attack
probes: realtoxicityprompts.RTPInsult
probes: realtoxicityprompts.RTPProfanity
probes: realtoxicityprompts.RTPSevere_Toxicity
probes: realtoxicityprompts.RTPSexually_Explicit
probes: realtoxicityprompts.RTPThreat
probes: snowball 🌟
probes: snowball.GraphConnectivity 💤
probes: snowball.GraphConnectivityMini
probes: snowball.Primes 💤
probes: snowball.PrimesMini
probes: snowball.Senators 💤
probes: snowball.SenatorsMini
probes: test 🌟
probes: test.Blank 💤
probes: xss 🌟
probes: xss.MarkdownImageExfil

That's good! The stars 🌟 indicate a whole plugin; if we ask garak to run one of them, it will run all the probes in that category, except the disabled ones, marked with 💤. We can run disabled probes by naming them directly when we start a garak run, but they won't be selected automatically.
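To make the markers concrete, here is a rough sketch that sorts lines from a listing like the one above into plugins, active probes, and disabled probes. It assumes the exact line layout shown in the sample (a "probes: " prefix and the marker at the end of the line); other garak versions, or terminals that add colour codes, may need adjustments. The function name classify_probes is hypothetical and not part of garak.

```python
# A few lines copied from the sample listing above
SAMPLE = """\
probes: lmrc 🌟
probes: lmrc.Profanity
probes: encoding 🌟
probes: encoding.InjectBase64
probes: encoding.InjectMime 💤
"""


def classify_probes(listing: str) -> dict[str, list[str]]:
    """Group listing entries by marker: 🌟 plugin, 💤 disabled, none = active."""
    groups = {"plugins": [], "active": [], "disabled": []}
    for line in listing.splitlines():
        if not line.startswith("probes: "):
            continue  # skip the banner line and anything else
        entry = line[len("probes: "):].strip()
        if entry.endswith("🌟"):
            groups["plugins"].append(entry.removesuffix("🌟").strip())
        elif entry.endswith("💤"):
            groups["disabled"].append(entry.removesuffix("💤").strip())
        else:
            groups["active"].append(entry)
    return groups


result = classify_probes(SAMPLE)
print(result)
```

Going by the description above, asking for the encoding plugin would select encoding.InjectBase64 but not encoding.InjectMime, which has to be named explicitly.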

Under lmrc we can see a probe named "lmrc.Profanity". Let's try this one. LMRC stands for "Language Model Risk Cards", and the probes in here come from a framework for assessing language model deployments. But for now let's just run the profanity probe.

To recap - we'll run a Hugging Face model, called gpt2, and use the lmrc.Profanity probe. So, our command line is:

python -m garak --model_type huggingface --model_name gpt2 --probes lmrc.Profanity

Typing that in and pressing enter should start the garak run! It will download gpt2 if you don't have it already - there'll be some progress bars about that - and then it will start the scan.
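If you'd rather start the same scan from a Python script, one possible approach is to build the command as a list and hand it to subprocess. This is a sketch, not garak's own API: the helper names build_scan_command and run_scan are invented here, and only the three flags from the command above are used.

```python
import shlex
import subprocess
import sys


def build_scan_command(model_type: str, model_name: str, probes: str) -> list[str]:
    """Assemble the command line from the recap above as an argument list."""
    return [
        sys.executable, "-m", "garak",  # same as `python -m garak`
        "--model_type", model_type,
        "--model_name", model_name,
        "--probes", probes,
    ]


def run_scan(command: list[str]) -> int:
    """Run the scan, letting garak print to the terminal; return its exit code."""
    return subprocess.run(command, check=False).returncode


command = build_scan_command("huggingface", "gpt2", "lmrc.Profanity")
# Show what would be run; call run_scan(command) to actually start the scan
print("python " + shlex.join(command[1:]))
```

Using sys.executable runs garak with the same interpreter as the script, which avoids surprises when several Python installations are present.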

If all goes well, you should see a progress bar and then a number of lines each saying PASS or FAIL. Congratulations! You've run your first garak scan. The next section covers how to read these results.
