# Custom scanners

{% hint style="warning" %}
\[DRAFT: WORK IN PROGRESS] This section is a work in progress and should not be considered complete or comprehensive.
{% endhint %}

You can extend the functionality of Vigil by creating and implementing your own `scanner` module.&#x20;

## High-level concepts

A scanner performs some type of analysis on text data (prompts and responses) and updates the results list for that data.&#x20;

Scanners can also access the vector database and embedding functions, as well as be passed options from a Vigil configuration file.

### ScanModel

Scanners are passed text data in the form of a `ScanModel` and can perform analysis on the `prompt, prompt_response,` or both.

Once the task is completed, the scanner should update the `ScanModel.results` list and return the updated `ScanModel`.

```python
class ScanModel(BaseModel):
    prompt: str = ''
    prompt_response: Optional[str] = None
    results: List[Dict[str, Any]] = [
```

## BaseScanner

Scanners must subclass the `BaseScanner`.&#x20;

A scanner must implement an `analyze()` function that accepts a `ScanModel` and `UUID.`&#x20;

The `post_init` function is also available, which is called as a post-initialization hook after a scanner is created. This can be used for any additional steps required to prep the environment for the scanner, such as loading signatures or updating a database.

```python
class BaseScanner(ABC):
    def __init__(self, name: str = '') -> None:
        self.name = name

    @abstractmethod
    def analyze(self, scan_obj: ScanModel, scan_id: UUID = uuid4()) -> ScanModel:
        raise NotImplementedError('This method needs to be overridden in the subclass.')

    def post_init(self):
        """ Optional post-initialization method """
        pass

```

{% hint style="info" %}
The UUID represents the scan action within Vigil dispatch and can be used in log messages or other tracking.
{% endhint %}

### Registry

Vigil dynamically loads scanners that are properly registered using the `Registration.scanner` decorator.&#x20;

Scanners must import the `Registration` class and decorate their classes as seen below.

```python
from vigil.registry import Registration

@Registration.scanner(name='example', requires_config=False, requires_embedding=False, requires_vectordb=False)
class ExampleScanner(BaseScanner):
    def __init__(self):
        pass

```

#### requires\_config

This argument specifies whether the scanner requires any configuration options from the Vigil config file (that you passed to `Vigil.from_config`.&#x20;

If set to `True`, Vigil will look in that config file for a section named `scanner:$name` and pass any key:value options in that section to the registered scanner as keyword arguments.&#x20;

**Config example**

```
[scanner:example]
threshold = 0.5
```

```python
@Registration.scanner(name='example', requires_config=True)
class ExampleScanner(BaseScanner):
    """ Compare the cosine similarity of the prompt and response """
    def __init__(self, threshold: float):
        self.threshold = float(threshold)
```

#### requires\_embedding

This argument determines if the scanner has access to the `Embedder()` class from [`vigil/core/embedding.py`](https://github.com/deadbits/vigil-llm/blob/main/vigil/core/embedding.py). The Embedder class is initialized when `Vigil.from_config()` is called and provides the ability to generate text embeddings using the model specified in the config file.

In  the example below, the Embedder class is passed to the scanner as the `embedder` Callable.

```python
from typing import Callable

@Registration.scanner(name='example', requires_embedding=True)
class ExampleScanner(BaseScanner):
    def __init__(self, embedder: Callable):
        self.embedder = embedder

    def analyze(self, scan_obj: ScanModel, scan_id: uuid.uuid4) -> ScanModel:
        prompt_embedding = self.embedder.generate(scan_obj.prompt)
```

#### requires\_vectordb

This argument determines if the scanner has access to the `VectorDB` class and its functions:

* `add_texts(texts: List[str], metadatas: List[dict])`
* `add_embeddings(texts: List[str], embeddings: List[List], metadatas: List[dict])`
* `query(text: str)`
