🚀 Your first scan

So you've installed garak - great! Let's run a scan to see what's up.

To keep things simple, let's use a small target that runs on most machines: gpt2. A free, openly available version is on the Hugging Face Hub, so we'll use that. For the probe, let's pick a relatively simple one that tries to get the model to produce rude output.

First we'd like to see the list of probes available. Open a terminal if you don't have one up already, and run garak by entering the following then pressing enter:

python -m garak --list_probes

You should see a list of garak's probes, with symbols next to some lines. You can always read more about garak's options by running python -m garak --help (or sometimes just garak --help, depending on your installation).

Back to the list of probes. It might look like this.

garak LLM security probe v0.9.0.6 ( https://github.com/leondz/garak ) at 2023-07-24T11:20:16.086762
probes: art 🌟
probes: art.Tox
probes: continuation 🌟
probes: continuation.ContinueSlursReclaimedSlurs50
probes: dan 🌟
probes: dan.Ablation_Dan_11_0
probes: dan.AntiDAN
probes: dan.ChatGPT_Developer_Mode_RANTI
probes: dan.ChatGPT_Developer_Mode_v2
probes: dan.ChatGPT_Image_Markdown
probes: dan.DAN_Jailbreak
probes: dan.DUDE
probes: dan.Dan_10_0
probes: dan.Dan_11_0
probes: dan.Dan_6_0
probes: dan.Dan_6_2
probes: dan.Dan_7_0
probes: dan.Dan_8_0
probes: dan.Dan_9_0
probes: dan.STAN
probes: encoding 🌟
probes: encoding.InjectAscii85
probes: encoding.InjectBase16
probes: encoding.InjectBase2048
probes: encoding.InjectBase32
probes: encoding.InjectBase64
probes: encoding.InjectBraille
probes: encoding.InjectHex
probes: encoding.InjectMime 💤
probes: encoding.InjectMorse
probes: encoding.InjectQP 💤
probes: encoding.InjectROT13
probes: encoding.InjectUU
probes: glitch 🌟
probes: glitch.Glitch 💤
probes: glitch.Glitch100
probes: goodside 🌟
probes: goodside.ThreatenJSON
probes: goodside.WhoIsRiley
probes: goodside._Davidjl
probes: knownbadsignatures 🌟
probes: knownbadsignatures.EICAR
probes: knownbadsignatures.GTUBE
probes: knownbadsignatures.GTphish
probes: leakreplay 🌟
probes: leakreplay.LiteratureCloze 💤
probes: leakreplay.LiteratureCloze80
probes: leakreplay.LiteratureComplete 💤
probes: leakreplay.LiteratureComplete80
probes: lmrc 🌟
probes: lmrc.Anthropomorphisation
probes: lmrc.Bullying
probes: lmrc.Deadnaming
probes: lmrc.Profanity
probes: lmrc.QuackMedicine
probes: lmrc.SexualContent
probes: lmrc.Sexualisation
probes: lmrc.SlurUsage
probes: malwaregen 🌟
probes: malwaregen.Evasion
probes: malwaregen.Payload
probes: malwaregen.SubFunctions
probes: malwaregen.TopLevel
probes: misleading 🌟
probes: misleading.FalseAssertion50
probes: promptinject 🌟
probes: promptinject.HijackHateHumans 💤
probes: promptinject.HijackHateHumansMini
probes: promptinject.HijackKillHumans 💤
probes: promptinject.HijackKillHumansMini
probes: promptinject.HijackLongPrompt 💤
probes: promptinject.HijackLongPromptMini
probes: realtoxicityprompts 🌟
probes: realtoxicityprompts.RTPBlank
probes: realtoxicityprompts.RTPFlirtation
probes: realtoxicityprompts.RTPIdentity_Attack
probes: realtoxicityprompts.RTPInsult
probes: realtoxicityprompts.RTPProfanity
probes: realtoxicityprompts.RTPSevere_Toxicity
probes: realtoxicityprompts.RTPSexually_Explicit
probes: realtoxicityprompts.RTPThreat
probes: snowball 🌟
probes: snowball.GraphConnectivity 💤
probes: snowball.GraphConnectivityMini
probes: snowball.Primes 💤
probes: snowball.PrimesMini
probes: snowball.Senators 💤
probes: snowball.SenatorsMini
probes: test 🌟
probes: test.Blank 💤
probes: xss 🌟
probes: xss.MarkdownImageExfil

That's good! The stars 🌟 indicate a whole plugin; if we ask garak to run one of these, it will run all the probes in that category, except the disabled ones, which are marked with 💤. We can still run a disabled probe by naming it directly when we start a garak run, but it won't be selected automatically.
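To make the two markers concrete, here is a minimal Python sketch that reads the probe listing and works out which probes naming a whole plugin would select (the enabled ones only). It assumes the plain-text "probes: name [marker]" layout shown above; real terminal output may include colour codes, or differ between garak versions, and this sketch does not handle that. The names parse_probe_listing, probes_run_by_plugin and fetch_listing are hypothetical helpers, not part of garak.

```python
from __future__ import annotations

import subprocess
import sys
from dataclasses import dataclass

PLUGIN_MARK = "\U0001F31F"    # 🌟 marks a whole plugin (probe module)
DISABLED_MARK = "\U0001F4A4"  # 💤 marks a probe that is disabled by default


@dataclass
class ProbeEntry:
    name: str        # e.g. "lmrc" for a plugin line, "lmrc.Profanity" for a probe
    is_plugin: bool  # True for whole-plugin lines marked with 🌟
    disabled: bool   # True for probes marked with 💤


def parse_probe_listing(text: str) -> list[ProbeEntry]:
    """Parse lines of the form 'probes: name [marker]' (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("probes:"):
            continue  # skip the version banner and blank lines
        rest = line[len("probes:"):].strip()
        is_plugin = PLUGIN_MARK in rest
        disabled = DISABLED_MARK in rest
        name = rest.replace(PLUGIN_MARK, "").replace(DISABLED_MARK, "").strip()
        entries.append(ProbeEntry(name, is_plugin, disabled))
    return entries


def probes_run_by_plugin(entries: list[ProbeEntry], plugin: str) -> list[str]:
    """Probes selected by naming the whole plugin: disabled ones are left out."""
    prefix = plugin + "."
    return [
        e.name
        for e in entries
        if not e.is_plugin and e.name.startswith(prefix) and not e.disabled
    ]


def fetch_listing() -> str:
    """Run the listing command; assumes garak is installed for this interpreter."""
    result = subprocess.run(
        [sys.executable, "-m", "garak", "--list_probes"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Three lines copied from the listing above.
    sample = (
        "probes: snowball \U0001F31F\n"
        "probes: snowball.Primes \U0001F4A4\n"
        "probes: snowball.PrimesMini\n"
    )
    entries = parse_probe_listing(sample)
    print(probes_run_by_plugin(entries, "snowball"))  # ['snowball.PrimesMini']
```

On those three lines, only snowball.PrimesMini is selected, because snowball.Primes is disabled and would have to be named directly.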

Under lmrc we can see a probe named "lmrc.Profanity". Let's try this one. LMRC stands for "Language Model Risk Cards", and the probes in here come from a framework for assessing language model deployments. But for now let's just run the profanity probe.

To recap - we'll run a Hugging Face model, called gpt2, and use the lmrc.Profanity probe. So, our command line is:

python -m garak --model_type huggingface --model_name gpt2 --probes lmrc.Profanity
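If you'd rather start the scan from a Python script, the command above can be assembled as an argument list. This is a minimal sketch, assuming garak is installed for the same Python interpreter; build_scan_command and run_scan are hypothetical helper names, and only the three flags used in this guide (--model_type, --model_name, --probes) are passed.

```python
import shlex
import subprocess
import sys


def build_scan_command(model_type: str, model_name: str, probes: str) -> list[str]:
    """Assemble the scan command as a list of arguments (hypothetical helper)."""
    return [
        sys.executable, "-m", "garak",  # same as "python -m garak"
        "--model_type", model_type,
        "--model_name", model_name,
        "--probes", probes,
    ]


def run_scan(model_type: str, model_name: str, probes: str) -> int:
    """Start the scan, letting garak's output stream to this terminal.

    Assumes garak is installed for this interpreter; returns garak's exit code.
    """
    cmd = build_scan_command(model_type, model_name, probes)
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    cmd = build_scan_command("huggingface", "gpt2", "lmrc.Profanity")
    # Show "python" instead of the full interpreter path for readability.
    print("python " + shlex.join(cmd[1:]))
```

Calling run_scan("huggingface", "gpt2", "lmrc.Profanity") would start the same scan as typing the command. Passing the arguments as a list, rather than as one string, avoids shell-quoting problems.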

Typing that in and pressing enter should start the garak run! It will download gpt2 if you don't have it already - there'll be some progress bars about that - and then it will start the scan.

If all goes well, you should see a progress bar and then a number of lines, each saying PASS or FAIL. Congratulations! You've run your first garak scan. The next section covers how to read these results.
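For a quick count of those lines, here is a small Python sketch. It assumes you have captured the console output as a list of text lines (for example with subprocess.run(..., capture_output=True, text=True).stdout.splitlines()) and that each result line contains the word PASS or FAIL as a standalone word, as described above; the exact line format is not assumed. tally_results is a hypothetical helper, not part of garak.

```python
import re

# Assumption: a result line contains PASS or FAIL as a whole word; progress
# bars, banners and other lines do not, so they are ignored.
RESULT_WORD = re.compile(r"\b(PASS|FAIL)\b")


def tally_results(lines):
    """Count PASS and FAIL result lines in captured output (hypothetical helper)."""
    counts = {"PASS": 0, "FAIL": 0}
    for line in lines:
        match = RESULT_WORD.search(line)
        if match:  # only the first PASS/FAIL word on a line is counted
            counts[match.group(1)] += 1
    return counts
```

This is only a convenience for a quick check; reading the individual results, as the next section describes, tells you which probe and detector each PASS or FAIL belongs to.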
