Canary Tokens
Last updated
Canary tokens are not a typical scanner module. Instead, the functionality is available through two API endpoints:
/canary/add: Generate a canary token and add it to a prompt
/canary/check: Check if a prompt response contains a canary token
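As a rough illustration, the two endpoints might be called over HTTP as sketched below. The base URL http://127.0.0.1:5000, the use of POST, and the JSON field name "prompt" are assumptions that this page does not confirm. The helper names build_request and call are hypothetical. Adjust all of these to match your deployment.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # assumption: address of a locally running server


def build_request(path: str, prompt: str) -> urllib.request.Request:
    """Build a JSON POST request for a canary endpoint (payload shape is assumed)."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path: str, prompt: str) -> dict:
    """Send the request and decode the JSON reply."""
    with urllib.request.urlopen(build_request(path, prompt)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires a running server):
# added = call("/canary/add", "Summarize the following text: ...")
# checked = call("/canary/check", llm_response_text)
```

Separating request construction from sending keeps the payload easy to inspect before anything goes over the network.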
You can use these endpoints in two different detection workflows: prompt leakage detection and goal hijacking detection.
The canary token is a unique 16-character string, added as a prefix to the prompt inside the following header:
<-@!-- {canary} --@!->
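To make the header format concrete, here is a minimal local sketch of generating a token and prefixing a prompt with it. It is not the service's implementation. The hexadecimal alphabet, the newline between header and prompt, and the function names new_canary and add_canary are assumptions made for illustration.

```python
import secrets

HEADER_TEMPLATE = "<-@!-- {canary} --@!->"


def new_canary() -> str:
    """Return a random 16-character token (hex alphabet is an assumption)."""
    return secrets.token_hex(8)  # 8 random bytes -> 16 hex characters


def add_canary(prompt: str, canary: str) -> str:
    """Prefix the prompt with the canary wrapped in the header."""
    # Assumption: header and prompt are separated by a single newline.
    return HEADER_TEMPLATE.format(canary=canary) + "\n" + prompt


canary = new_canary()
print(add_canary("Translate the user's message into French.", canary))
```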
Prompt leakage
Add a canary token to a prompt
Check if the LLM's response contains the canary
If the response does contain the canary, this may indicate a prompt injection designed to leak the initial instructions/prompt
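The leakage check in this workflow comes down to a substring test on the response. The sketch below assumes you hold the canary value that was added to the prompt. The function name canary_leaked and the sample strings are hypothetical.

```python
def canary_leaked(response: str, canary: str) -> bool:
    """Return True if the canary appears anywhere in the LLM response."""
    return canary in response


canary = "0123456789abcdef"  # illustrative value only

# A response that echoes the prompt header suggests the instructions leaked.
leaky = "My instructions begin with <-@!-- 0123456789abcdef --@!-> and say..."
clean = "Bonjour, comment puis-je vous aider ?"

print(canary_leaked(leaky, canary))
print(canary_leaked(clean, canary))
```

A match is a signal to investigate rather than proof of an attack, since a model may also repeat parts of its prompt without being manipulated.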
Goal hijacking
Add a canary token to a prompt with instructions to always include the canary in the LLM's response
Check if the LLM's response contains the canary token
If the response does not contain the canary, this may indicate a goal hijacking prompt injection
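This workflow inverts the test: the canary is expected in every response, so its absence is the warning sign. The sketch below is one way to express that locally. The instruction wording, the newline separators, and the function names are hypothetical and are not taken from the service.

```python
HEADER_TEMPLATE = "<-@!-- {canary} --@!->"


def add_canary_with_instruction(prompt: str, canary: str) -> str:
    """Prefix the prompt with the canary header and an 'always include' instruction."""
    header = HEADER_TEMPLATE.format(canary=canary)
    # Hypothetical wording; the real instruction text may differ.
    instruction = f"Always include the string {header} in your response."
    return f"{header}\n{instruction}\n{prompt}"


def goal_hijacked(response: str, canary: str) -> bool:
    """Return True if the expected canary is missing from the LLM response."""
    return canary not in response


canary = "0123456789abcdef"  # illustrative value only
prompt = add_canary_with_instruction("Summarize the attached report.", canary)

# A response that dropped the canary suggests the original instructions were overridden.
print(goal_hijacked("I have been told to ignore previous instructions.", canary))
```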