The scanner uses the transformers library and a Hugging Face model built to detect prompt injection phrases. If the score returned by the model is above a defined threshold, Vigil will flag the analyzed prompt as a potential risk.

This model is prone to false positives. If this is the only detection that fires, you should manually review the results before taking any action on the submitted prompt.

Last updated