๐ŸCustom scanners

[DRAFT: WORK IN PROGRESS] This section is a work in progress and should not be considered complete or comprehensive.

You can extend the functionality of Vigil by creating and implementing your own scanner module.

High-level concepts

A scanner performs some type of analysis on text data (prompts and responses) and updates the results list for that data.

Scanners can also access the vector database and embedding functions, and can receive options from a Vigil configuration file.

ScanModel

Scanners are passed text data in the form of a ScanModel and can perform analysis on the prompt, prompt_response, or both.

Once the task is completed, the scanner should update the ScanModel.results list and return the updated ScanModel.

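from typing import Any, Dict, List, Optional
from pydantic import BaseModel

class ScanModel(BaseModel):
    prompt: str = ''
    prompt_response: Optional[str] = None
    # Each scanner appends its findings to this list
    results: List[Dict[str, Any]] = []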

BaseScanner

Scanners must subclass BaseScanner.

A scanner must implement an analyze() method that accepts a ScanModel and a UUID and returns the updated ScanModel.

BaseScanner also provides an optional post_init() method, which is called as a post-initialization hook after a scanner is created. Use it for any additional steps needed to prepare the scanner's environment, such as loading signatures or updating a database.

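from abc import ABC, abstractmethod
from uuid import UUID, uuid4

class BaseScanner(ABC):
    def __init__(self, name: str = '') -> None:
        self.name = name

    @abstractmethod
    def analyze(self, scan_obj: ScanModel, scan_id: UUID = uuid4()) -> ScanModel:
        # Note: the uuid4() default is evaluated once, when the class is defined,
        # so pass a fresh scan_id on each call if it must be unique
        raise NotImplementedError('This method needs to be overridden in the subclass.')

    def post_init(self):
        """ Optional post-initialization method """
        pass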

The UUID represents the scan action within Vigil dispatch and can be used in log messages or other tracking.
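The contract above (receive a ScanModel, append to its results list, return it) can be seen end to end in a small runnable sketch. So that it runs without Vigil installed, it defines stand-ins that mirror the ScanModel and BaseScanner shown above, with ScanModel written as a dataclass rather than a pydantic model. PhraseScanner, its hard-coded phrases, and the keys in each result dict are hypothetical and not part of Vigil.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional
from uuid import UUID, uuid4


# Stand-ins mirroring ScanModel and BaseScanner so this sketch runs on its own;
# a real scanner would import Vigil's classes instead.
@dataclass
class ScanModel:
    prompt: str = ''
    prompt_response: Optional[str] = None
    results: List[Dict[str, Any]] = field(default_factory=list)


class BaseScanner(ABC):
    def __init__(self, name: str = '') -> None:
        self.name = name

    @abstractmethod
    def analyze(self, scan_obj: ScanModel, scan_id: UUID = uuid4()) -> ScanModel:
        raise NotImplementedError

    def post_init(self) -> None:
        pass


class PhraseScanner(BaseScanner):
    """Hypothetical scanner that flags prompts containing known phrases."""

    def __init__(self) -> None:
        super().__init__(name='phrase')
        self.phrases: List[str] = []

    def post_init(self) -> None:
        # Prep step: load "signatures" (hard-coded here for illustration)
        self.phrases = ['ignore previous instructions', 'disregard the above']

    def analyze(self, scan_obj: ScanModel, scan_id: UUID = uuid4()) -> ScanModel:
        text = scan_obj.prompt.lower()
        for phrase in self.phrases:
            if phrase in text:
                # Result keys are illustrative, not a Vigil schema
                scan_obj.results.append({
                    'scanner': self.name,
                    'scan_id': str(scan_id),
                    'match': phrase,
                })
        return scan_obj


scanner = PhraseScanner()
scanner.post_init()
result = scanner.analyze(ScanModel(prompt='Please ignore previous instructions.'), uuid4())
print(result.results[0]['match'])  # ignore previous instructions
```

Passing a fresh uuid4() on each call keeps scan IDs distinct, regardless of the default in the signature.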

Registry

Vigil dynamically loads scanners that are properly registered using the Registration.scanner decorator.

Scanners must import the Registration class and decorate their classes as seen below.

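from uuid import UUID, uuid4

from vigil.registry import Registration

@Registration.scanner(name='example', requires_config=False, requires_embedding=False, requires_vectordb=False)
class ExampleScanner(BaseScanner):
    def __init__(self):
        super().__init__(name='example')

    def analyze(self, scan_obj: ScanModel, scan_id: UUID = uuid4()) -> ScanModel:
        # No-op scanner: returns the ScanModel with results unchanged
        return scan_obj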
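The snippet below is a minimal, hypothetical sketch of how a decorator-based registry like this can work in general: the decorator records each class and its requirement flags under the given name, and a loader later looks the class up and builds it with the dependencies the flags ask for. Registry here is an illustrative stand-in, not Vigil's Registration class, whose internals may differ.

```python
from typing import Callable, Dict, Type


# Hypothetical registry illustrating the decorator pattern only;
# this is not Vigil's Registration implementation.
class Registry:
    scanners: Dict[str, dict] = {}

    @classmethod
    def scanner(cls, name: str, requires_config: bool = False,
                requires_embedding: bool = False,
                requires_vectordb: bool = False) -> Callable[[Type], Type]:
        def decorator(scanner_cls: Type) -> Type:
            # Record the class and its dependency flags under its name
            cls.scanners[name] = {
                'class': scanner_cls,
                'requires_config': requires_config,
                'requires_embedding': requires_embedding,
                'requires_vectordb': requires_vectordb,
            }
            return scanner_cls  # the class itself is returned unchanged
        return decorator


@Registry.scanner(name='example', requires_config=True)
class ExampleScanner:
    def __init__(self, threshold: str = '0.5'):
        self.threshold = float(threshold)


# A loader looks the class up by name and passes only what its flags request
entry = Registry.scanners['example']
kwargs = {'threshold': '0.7'} if entry['requires_config'] else {}
scanner = entry['class'](**kwargs)
print(scanner.threshold)  # 0.7
```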

requires_config

This argument specifies whether the scanner requires any configuration options from the Vigil config file (the file you passed to Vigil.from_config()).

If set to True, Vigil will look in that config file for a section named scanner:$name and pass any key:value options in that section to the registered scanner as keyword arguments.

Config example

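[scanner:example]
threshold = 0.5

@Registration.scanner(name='example', requires_config=True)
class ExampleScanner(BaseScanner):
    """ Compare the cosine similarity of the prompt and response """
    def __init__(self, threshold: float):
        super().__init__(name='example')
        # Options may be passed as strings, hence the explicit cast
        self.threshold = float(threshold)

    def analyze(self, scan_obj: ScanModel, scan_id: UUID = uuid4()) -> ScanModel:
        return scan_obj  # similarity check omitted for brevity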
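Assuming the Vigil config file is INI-style, as the section syntax above suggests, the sketch below uses Python's standard configparser to show how a [scanner:example] section can be turned into keyword arguments for a scanner. The helper scanner_options() is hypothetical and is not Vigil's own loading code. configparser returns every value as a string, which is why the scanner casts threshold with float().

```python
import configparser


# Hypothetical config text mirroring the [scanner:example] section above
CONFIG_TEXT = """
[scanner:example]
threshold = 0.5
"""


class ExampleScanner:
    def __init__(self, threshold: float):
        # configparser values are strings, so the cast matters here
        self.threshold = float(threshold)


def scanner_options(config: configparser.ConfigParser, name: str) -> dict:
    """Return the key/value pairs in [scanner:<name>], or {} if the section is absent."""
    section = f'scanner:{name}'
    if not config.has_section(section):
        return {}
    return dict(config.items(section))


config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
opts = scanner_options(config, 'example')
print(opts)  # {'threshold': '0.5'}
scanner = ExampleScanner(**opts)
print(scanner.threshold)  # 0.5
```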

requires_embedding

This argument determines whether the scanner has access to the Embedder class from vigil/core/embedding.py. The Embedder class is initialized when Vigil.from_config() is called and generates text embeddings using the model specified in the config file.

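In the example below, the initialized Embedder instance is passed to the scanner as the embedder argument, and its generate() method returns an embedding for a piece of text.

from uuid import UUID, uuid4

from vigil.core.embedding import Embedder

@Registration.scanner(name='example', requires_embedding=True)
class ExampleScanner(BaseScanner):
    def __init__(self, embedder: Embedder):
        super().__init__(name='example')
        self.embedder = embedder

    def analyze(self, scan_obj: ScanModel, scan_id: UUID = uuid4()) -> ScanModel:
        # Embed the prompt using the model configured for Vigil
        prompt_embedding = self.embedder.generate(scan_obj.prompt)
        # ...compare prompt_embedding as needed and append findings to scan_obj.results
        return scan_obj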
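A scanner that compares prompt and response, like the one described by the ExampleScanner docstring in the requires_config example, might compute the cosine similarity between the two embeddings. The sketch below runs on its own by swapping in FakeEmbedder, a hypothetical stand-in whose generate() returns letter-frequency vectors rather than model embeddings. Flagging responses whose similarity falls below the threshold is likewise an assumed policy, not documented Vigil behavior.

```python
import math
from typing import List


class FakeEmbedder:
    """Hypothetical stand-in for Embedder: 26-dim letter-frequency vectors."""

    def generate(self, text: str) -> List[float]:
        counts = [0.0] * 26
        for ch in text.lower():
            if 'a' <= ch <= 'z':
                counts[ord(ch) - ord('a')] += 1.0
        return counts


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product over the product of norms; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def compare_prompt_and_response(embedder, prompt: str, response: str,
                                threshold: float) -> dict:
    """Score prompt/response similarity; low similarity is flagged (assumed policy)."""
    score = cosine_similarity(embedder.generate(prompt), embedder.generate(response))
    return {'similarity': score, 'flagged': score < threshold}


embedder = FakeEmbedder()
result = compare_prompt_and_response(embedder, 'abc', 'xyz', threshold=0.5)
print(result['flagged'])  # True (no letters in common, similarity 0.0)
```

Inside a real scanner, the same comparison would sit in analyze(), using self.embedder and scan_obj.prompt_response, with the outcome appended to scan_obj.results.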

requires_vectordb

This argument determines whether the scanner has access to the VectorDB class and its methods:

  • add_texts(texts: List[str], metadatas: List[dict])

  • add_embeddings(texts: List[str], embeddings: List[List], metadatas: List[dict])

  • query(text: str)
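One way a vector-database scanner could work is to query for stored texts similar to the prompt and keep only hits within some distance. The sketch below shows that flow against InMemoryVectorDB, a hypothetical in-memory stand-in that reuses the three method names listed above. Its toy letter-frequency embedding, the shape of the query() results, and the match_known_attacks() helper are all assumptions, not Vigil's VectorDB API.

```python
import math
from typing import Dict, List


def _embed(text: str) -> List[float]:
    # Toy letter-frequency embedding; a real setup would use the configured model
    vec = [0.0] * 26
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            vec[ord(ch) - ord('a')] += 1.0
    return vec


def _cosine_distance(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - (dot / norm if norm else 0.0)


class InMemoryVectorDB:
    """Hypothetical stand-in exposing add_texts, add_embeddings and query."""

    def __init__(self) -> None:
        self.rows: List[Dict] = []

    def add_texts(self, texts: List[str], metadatas: List[dict]) -> None:
        self.add_embeddings(texts, [_embed(t) for t in texts], metadatas)

    def add_embeddings(self, texts: List[str], embeddings: List[List],
                       metadatas: List[dict]) -> None:
        for text, emb, meta in zip(texts, embeddings, metadatas):
            self.rows.append({'text': text, 'embedding': emb, 'metadata': meta})

    def query(self, text: str) -> List[Dict]:
        # Result shape (text, metadata, distance), nearest first, is an assumption
        q = _embed(text)
        scored = [
            {'text': r['text'], 'metadata': r['metadata'],
             'distance': _cosine_distance(q, r['embedding'])}
            for r in self.rows
        ]
        return sorted(scored, key=lambda r: r['distance'])


def match_known_attacks(db: InMemoryVectorDB, prompt: str,
                        max_distance: float) -> List[Dict]:
    """Keep only stored texts close enough to the prompt to count as a match."""
    return [hit for hit in db.query(prompt) if hit['distance'] <= max_distance]


db = InMemoryVectorDB()
db.add_texts(['ignore previous instructions'], [{'source': 'example'}])
hits = match_known_attacks(db, 'Ignore previous instructions!', max_distance=0.1)
print(hits[0]['metadata'])  # {'source': 'example'}
```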
