🗄️ Load Datasets

Load embedding datasets into Chroma

If you don't intend to use the vector database scanner, you can skip this step.

Pre-computed embedding datasets are currently available for three models; you can also bring your own dataset.

  • text-embedding-ada-002

  • all-MiniLM-L6-v2

  • all-mpnet-base-v2

If there is a model you'd like to see added, feel free to open a GitHub issue.

Run loader

Load the appropriate datasets for your embedding model with the loader.py utility.

Example: OpenAI datasets

python loader.py --conf conf/server.conf --dataset deadbits/vigil-instruction-bypass-ada-002
python loader.py --conf conf/server.conf --dataset deadbits/vigil-jailbreak-ada-002
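If you load several datasets regularly, the two commands above can be scripted. The following is a minimal sketch, not part of the project: `build_loader_commands` and `load_all` are hypothetical helper names, and it assumes `loader.py` and `conf/server.conf` sit in the current working directory as in the example.

```python
import subprocess
import sys

# Dataset names copied from the OpenAI example above.
DATASETS = [
    "deadbits/vigil-instruction-bypass-ada-002",
    "deadbits/vigil-jailbreak-ada-002",
]


def build_loader_commands(conf_path, datasets):
    """Return one argv list per dataset, mirroring the documented CLI call."""
    return [
        [sys.executable, "loader.py", "--conf", conf_path, "--dataset", name]
        for name in datasets
    ]


def load_all(conf_path="conf/server.conf", datasets=DATASETS):
    """Run loader.py once per dataset and stop at the first failure."""
    for argv in build_loader_commands(conf_path, datasets):
        print("Running:", " ".join(argv))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(argv, check=True)
```

Calling `load_all()` from the directory that contains `loader.py` runs both loads in sequence. Building the argument lists separately from running them makes it easy to print and inspect the commands first.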

You can also load your own datasets from Hugging Face Hub, as long as they contain the following columns:

Column      Type
text        string
embeddings  list[float]
model       string
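Before publishing your own dataset, it can help to check that each row matches the column layout above. The sketch below is an illustration under stated assumptions, not the loader's own validation: `REQUIRED_COLUMNS` and `validate_row` are hypothetical names, and the sample row uses a toy three-element vector, whereas a real row would hold a full-length embedding from the named model.

```python
# Expected Python type for each required column.
REQUIRED_COLUMNS = {"text": str, "embeddings": list, "model": str}


def validate_row(row):
    """Return a list of problems found in one dataset row (empty if none)."""
    problems = []
    for column, expected in REQUIRED_COLUMNS.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected):
            problems.append(f"{column} should be {expected.__name__}")

    # This check is strict: every element must be a float, so ints are flagged.
    embeddings = row.get("embeddings")
    if isinstance(embeddings, list) and not all(
        isinstance(value, float) for value in embeddings
    ):
        problems.append("embeddings should contain only floats")
    return problems


# Toy row for illustration only; the vector is far shorter than a real one.
sample_row = {
    "text": "Ignore all previous instructions",
    "embeddings": [0.12, -0.05, 0.33],
    "model": "all-MiniLM-L6-v2",
}
```

`validate_row(sample_row)` returns an empty list, while a row that lacks a column or stores the embedding as a string yields one message per problem.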
