Only run garak on systems that you have permission to use! It performs an exhaustive scan using some pretty strong prompts, and these might be misinterpreted as an attack.
So you've installed garak - great! Let's run a scan to see what's up.
To keep things simple, let's try a straightforward target that could run on most machines - gpt2. Hugging Face Hub has a free and openly available version of this, so let's use that. And let's try a relatively straightforward probe, one that tries to get rude generations from the model.
First we'd like to see the list of probes available. Open a terminal if you don't have one up already, and run garak by entering the following then pressing enter:
python -m garak --list_probes
You should get a list of probes in garak, with symbols at the side of some lines. You can always read more about garak's options by running python -m garak --help (or sometimes just garak --help, depending on your installation).
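If you're not sure which of the two forms your installation supports, a small check can tell you. The sketch below is only an illustration: the helper name garak_invocation is made up for this example, and it assumes garak is either importable as a module by the current Python interpreter or available as a garak executable on your PATH.

```python
import importlib.util
import shutil
import sys


def garak_invocation():
    """Return an argv prefix for starting garak, or None if it isn't found.

    Hypothetical helper for illustration; not part of garak itself.
    """
    # Preferred form: the module is importable, so "python -m garak" works.
    if importlib.util.find_spec("garak") is not None:
        return [sys.executable, "-m", "garak"]
    # Fallback: a standalone "garak" executable is on PATH.
    if shutil.which("garak") is not None:
        return ["garak"]
    # Neither form is available in this environment.
    return None


prefix = garak_invocation()
if prefix is None:
    print("garak was not found; check your installation")
else:
    print("Start garak with:", " ".join(prefix))
```

Whichever prefix it reports, append --help or --list_probes to it in the terminal.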
Back to the list of probes. It might look like this.
That's good! The stars 🌟 indicate a whole plugin; if we ask garak to run one of these, it will run all the probes in that category, except the disabled ones, which are marked with 💤. We can run disabled probes by naming them directly when we start a garak run, but they won't be selected automatically.
Under lmrc we can see a probe named "lmrc.Profanity". Let's try this one. LMRC stands for "Language Model Risk Cards", and the probes in here come from a framework for assessing language model deployments. But for now let's just run the profanity probe.
To recap - we'll run a Hugging Face model, called gpt2, and use the lmrc.Profanity probe. So, our command line is:
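As a rough sketch, the command can be assembled from the three pieces just named: the model type, the model name, and the probe. The flag names --model_type, --model_name and --probes used here are assumptions about garak's command-line options and may differ between versions, so check them against python -m garak --help. The helper build_garak_command is also made up for this example.

```python
import shlex


def build_garak_command(model_type, model_name, probe):
    """Return the argv list for a single-probe garak scan.

    The flag names below are assumptions; confirm them with
    `python -m garak --help` for your installed version.
    """
    return [
        "python", "-m", "garak",
        "--model_type", model_type,   # which generator family to load
        "--model_name", model_name,   # the specific model to scan
        "--probes", probe,            # the probe to run against it
    ]


command = build_garak_command("huggingface", "gpt2", "lmrc.Profanity")
# Join into one shell-safe line that can be pasted into a terminal.
print(shlex.join(command))
# prints: python -m garak --model_type huggingface --model_name gpt2 --probes lmrc.Profanity
```

The printed line is what goes into the terminal.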
Typing that in and pressing enter should start the garak run! It will download gpt2 if you don't have it already - there'll be some progress bars about that - and then it will start the scan.
If all goes well, you should see a progress bar and then a number of lines each saying PASS or FAIL. Congratulations! You've run your first garak scan. The next section covers how to read these results.