Vector Search Usage

embcli provides a set of commands to perform vector search operations using vector stores.

This tutorial assumes that you have installed the embcli-openai plugin and have an OpenAI API key stored in a .env file:

pip install embcli-openai
cat .env
OPENAI_API_KEY=<YOUR_OPENAI_KEY>
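The .env file holds plain KEY=VALUE lines, and the -e/--env-file option shown in the help output below can point to a different file. As a rough illustration of that format, here is a minimal Python reader. read_env_file is a hypothetical helper, not part of embcli. embcli's actual loader may accept more syntax, such as quoted values or `export` prefixes.

```python
import tempfile
from pathlib import Path


def read_env_file(path: str) -> dict[str, str]:
    """Hypothetical helper: collect KEY=VALUE pairs from a simple .env file."""
    values: dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.partition("=")
        if sep:  # lines without "=" are ignored in this sketch
            values[key.strip()] = value.strip()
    return values


if __name__ == "__main__":
    # Use a temporary directory so an existing ./.env is never overwritten.
    with tempfile.TemporaryDirectory() as tmp:
        env_path = Path(tmp) / ".env"
        env_path.write_text("OPENAI_API_KEY=<YOUR_OPENAI_KEY>\n", encoding="utf-8")
        print(read_env_file(str(env_path)))  # {'OPENAI_API_KEY': '<YOUR_OPENAI_KEY>'}
```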

vector-stores command

The emb vector-stores command lists the available vector stores. Currently, only the Chroma vector store is supported.

emb vector-stores
ChromaVectorStore
    Vendor: chroma

ingest-sample command

The emb ingest-sample command embeds the documents of the specified sample corpus and indexes the resulting vectors into a vector store collection.

Note: If the model provides options for generating document-optimized embeddings, they are used implicitly during ingestion. For example, for the Gemini embedding models, the task_type=retrieval_document option is added.
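The implicit option described in the note can be pictured as a lookup keyed by vendor and purpose (ingestion or search). This is a hypothetical sketch: the vendor key "gemini", the IMPLICIT_OPTIONS table, and implicit_options are illustrative names, and only the two task_type values come from this guide.

```python
# Hypothetical sketch: extra options chosen by whether text is being ingested
# (documents) or searched (queries). Only the Gemini task_type values come
# from this guide; the vendor key and table layout are assumptions.
IMPLICIT_OPTIONS: dict[str, dict[str, dict[str, str]]] = {
    "gemini": {
        "ingest": {"task_type": "retrieval_document"},
        "search": {"task_type": "retrieval_query"},
    },
}


def implicit_options(vendor: str, purpose: str) -> dict[str, str]:
    """Return the extra options for this vendor and purpose ("ingest" or "search")."""
    if purpose not in ("ingest", "search"):
        raise ValueError(f"unknown purpose: {purpose!r}")
    # Models without a document/query distinction get no extra options.
    return dict(IMPLICIT_OPTIONS.get(vendor, {}).get(purpose, {}))


print(implicit_options("gemini", "ingest"))  # {'task_type': 'retrieval_document'}
print(implicit_options("openai", "ingest"))  # {}
```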

emb ingest-sample --help
Usage: emb ingest-sample [OPTIONS]

  Ingest example documents into the vector store.

Options:
  -e, --env-file TEXT             Path to the .env file
  -m, --model TEXT                Model id or alias to use for embedding
                                  [required]
  -p, --model-path TEXT           Path to the local model
  --vector-store TEXT             Vector store to use for storing embeddings
                                  [default: chroma]
  --persist-path TEXT             Path to persist the vector store
  -c, --collection TEXT           Collection name where to store the
                                  embeddings  [required]
  --corpus [cat-names-en|cat-names-ja|dishes-en|dishes-ja|tourist-spots-en|tourist-spots-ja|movies-en|movies-ja]
                                  Smaple corpus name to use  [default: cat-
                                  names-en]
  -o, --option <TEXT TEXT>...     key/value options for the model
  --help                          Show this message and exit.

--model option (required)

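The --model/-m option specifies the model ID or alias to use for embedding. For example, 3-small, used throughout this guide, refers to the text-embedding-3-small model.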

--model-path option

The --model-path/-p option specifies the path to a local model. It is required for plugins that run local models, such as embcli-llamacpp.

--collection option (required)

The --collection/-c option specifies the name of the collection in which to store the embeddings. If the collection does not exist, it is created.
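Ingestion creates a missing collection, whereas the search and delete-collection commands described later expect it to already exist. The following in-memory model is a hypothetical sketch of those two lookup styles; it does not talk to Chroma, and InMemoryVectorStore, get_or_create, and get are illustrative names. Raising KeyError is this sketch's choice; embcli's actual error behavior is not described in this guide.

```python
# Hypothetical in-memory model of the collection semantics in this guide.
# It mirrors "create if missing" (ingestion) versus "must already exist"
# (search, delete-collection); it is not embcli or Chroma code.
class InMemoryVectorStore:
    def __init__(self) -> None:
        # collection name -> {document ID -> embedding}
        self.collections: dict[str, dict[str, list[float]]] = {}

    def get_or_create(self, name: str) -> dict[str, list[float]]:
        """Ingestion-style lookup: create an empty collection if absent."""
        return self.collections.setdefault(name, {})

    def get(self, name: str) -> dict[str, list[float]]:
        """Search-style lookup: fail if the collection does not exist."""
        if name not in self.collections:
            raise KeyError(f"collection {name!r} does not exist")
        return self.collections[name]


store = InMemoryVectorStore()
store.get_or_create("catcafe")["1"] = [0.1, 0.2]  # created on first ingest
print(sorted(store.collections))  # ['catcafe']
```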

To index the default sample corpus (cat-names-en) into a collection named catcafe using the text-embedding-3-small model, run the following command:

emb ingest-sample -m 3-small -c catcafe

The output shows the vector store name, the collection name, and the persist path of the vector store:

Documents ingested successfully.
Vector store: chroma (collection: catcafe)
Persist path: ./chroma

--vector-store option

The --vector-store option specifies the vector store in which to store the embeddings. Currently, only chroma is supported, and it is the default.

--persist-path option

The --persist-path option specifies the path where the vector store is persisted. The default depends on the vector store; for Chroma, it is ./chroma.

To use a path other than the default, run the following command:

emb ingest-sample -m 3-small -c catcafe --persist-path /path/to/my-vector-store
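Because the default persist path depends on the vector store, resolving it amounts to a per-store lookup that an explicit --persist-path overrides. A minimal sketch, assuming a hypothetical DEFAULT_PERSIST_PATHS table that contains only the Chroma default documented here; resolve_persist_path is likewise an illustrative name.

```python
from typing import Optional

# Hypothetical table of per-store defaults. Only the Chroma entry ("./chroma")
# comes from this guide.
DEFAULT_PERSIST_PATHS = {"chroma": "./chroma"}


def resolve_persist_path(vector_store: str, persist_path: Optional[str] = None) -> str:
    """Return the explicit --persist-path if given, else the store's default."""
    if vector_store not in DEFAULT_PERSIST_PATHS:
        raise ValueError(f"unsupported vector store: {vector_store!r}")
    if persist_path is not None:
        return persist_path  # an explicit path always wins
    return DEFAULT_PERSIST_PATHS[vector_store]


print(resolve_persist_path("chroma"))  # ./chroma
print(resolve_persist_path("chroma", "/path/to/my-vector-store"))  # /path/to/my-vector-store
```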

--corpus option

The --corpus option specifies which sample corpus to ingest. The available corpora are cat-names-en, cat-names-ja, dishes-en, dishes-ja, tourist-spots-en, tourist-spots-ja, movies-en, and movies-ja; the default is cat-names-en.

To index the dishes-en sample corpus into a collection named menu, run the following command:

emb ingest-sample -m 3-small -c menu --corpus dishes-en

--option option

To pass additional options to the model, use the --option/-o option, which takes a key and a value and can be repeated. The options are model-specific, so refer to the output of the emb models command for the options available for a specific model.
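The help text shows `-o, --option <TEXT TEXT>...`, meaning each -o takes exactly two values (a key and a value) and the flag may appear several times. The sketch below reproduces that shape with the standard library's argparse as a stand-in; it is not embcli's parser, and parse_model_options is a hypothetical name. The task_type key appears only because this guide mentions it for Gemini; the keys actually accepted depend on the model.

```python
import argparse


def parse_model_options(argv: list[str]) -> dict[str, str]:
    """Collect repeated `-o KEY VALUE` pairs into a dict (argparse stand-in)."""
    parser = argparse.ArgumentParser(prog="option-sketch")
    # nargs=2 takes exactly two values per -o; action="append" allows repeats.
    parser.add_argument("-o", "--option", nargs=2, action="append",
                        metavar=("KEY", "VALUE"))
    args = parser.parse_args(argv)
    # With no -o at all, args.option is None.
    return dict(args.option or [])


print(parse_model_options(["-o", "task_type", "retrieval_query"]))
# {'task_type': 'retrieval_query'}
```

In this sketch, a key given twice keeps the last value, because later pairs overwrite earlier ones when the dict is built.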

ingest command

Like emb ingest-sample, the emb ingest command indexes a corpus into a vector store collection, but it reads the documents from your own corpus file.

Note: If the model provides options for generating document-optimized embeddings, they are used implicitly during ingestion. For example, for the Gemini embedding models, the task_type=retrieval_document option is added.

emb ingest --help
Usage: emb ingest [OPTIONS]

  Ingest documents into the vector store. Ingestion-specific embeddings are
  used if the model provides options for generating search documents-optimized
  embeddings.

Options:
  -e, --env-file TEXT          Path to the .env file
  -m, --model TEXT             Model id or alias to use for embedding
                               [required]
  -p, --model-path TEXT        Path to the local model
  --vector-store TEXT          Vector store to use for storing embeddings
                               [default: chroma]
  --persist-path TEXT          Path to persist the vector store
  -c, --collection TEXT        Collection name where to store the embeddings
                               [required]
  -f, --file PATH              File containing text to embed  [required]
  --input-format [csv]         Input format of the file  [default: csv]
  --batch-size INTEGER         Batch size for embedding  [default: 100]
  -o, --option <TEXT TEXT>...  key/value options for the model
  --help                       Show this message and exit.

--file option (required)

The --file/-f option specifies the file containing the text to embed. The file should be a CSV file with two columns: the first column is the document ID, and the second is the text to embed. See the provided sample corpus files for reference.
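A corpus file in this two-column layout can be produced with Python's csv module, which quotes texts that contain commas or double quotes so they stay in the second column. This is a sketch under assumptions: it writes no header row and uses UTF-8, so check the provided sample corpus files to confirm both before ingesting. write_corpus and read_corpus are hypothetical helpers, the documents are made up, and the demo writes my-corpus.csv in the current directory.

```python
import csv


def write_corpus(path: str, documents: list[tuple[str, str]]) -> None:
    """Write (document ID, text) rows. Assumption: no header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        # csv.writer quotes fields containing commas or quotes automatically.
        csv.writer(f).writerows(documents)


def read_corpus(path: str) -> list[tuple[str, str]]:
    """Read the rows back as (document ID, text) pairs, skipping blank lines."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row[0], row[1]) for row in csv.reader(f) if row]


# Made-up example documents.
write_corpus("my-corpus.csv", [
    ("1", "Mochi: a calm white cat who loves naps."),
    ("2", 'Kuro: black, curious, and answers to "Kuro-chan".'),
])
print(read_corpus("my-corpus.csv")[1])  # the commas and quotes survive the round trip
```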

To index a file named my-corpus.csv into a collection named mycollection using the text-embedding-3-small model, run the following command:

emb ingest -m 3-small -c mycollection -f my-corpus.csv

The output shows the vector store name, the collection name, and the persist path of the vector store:

Documents ingested successfully.
Vector store: chroma (collection: mycollection)
Persist path: ./chroma

--input-format option

The --input-format option specifies the format of the input file. Currently, only csv is supported, and it is the default.

--batch-size option

The --batch-size option specifies how many documents are embedded per batch; larger batches reduce the number of API calls. The default is 100. To use a batch size of 50, run the following command:

emb ingest -m 3-small -c mycollection -f my-corpus.csv --batch-size 50
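To see why the batch size matters, here is a sketch of splitting documents into consecutive batches, assuming (as the option description suggests) that each batch becomes one embedding request. batched is a hypothetical helper; Python 3.12 and later also ship itertools.batched, which yields tuples instead of slices.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


documents = [f"document {i}" for i in range(250)]
print([len(batch) for batch in batched(documents, 100)])  # [100, 100, 50]
print(len(list(batched(documents, 50))))  # 5
```

With 250 documents, the default batch size of 100 means three requests, while --batch-size 50 means five. Smaller batches trade more calls for smaller requests, which can help when a provider limits request size.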

Other options

For the other options, which are shared with ingest-sample, see the ingest-sample command section above.

search command

The emb search command searches a collection for the documents most similar to a query.

Note: If the model provides options for generating query-optimized embeddings, they are used implicitly during search. For example, for the Gemini embedding models, the task_type=retrieval_query option is added.

emb search --help
Usage: emb search [OPTIONS]

  Search for documents in the vector store for the query. Query-specific
  embedding is used if the model provides options for generating search query-
  optimized embeddings.

Options:
  -e, --env-file TEXT          Path to the .env file
  -m, --model TEXT             Model id or alias to use for embedding
                               [required]
  -p, --model-path TEXT        Path to the local model
  --vector-store TEXT          Vector store to use for storing embeddings
                               [default: chroma]
  --persist-path TEXT          Path to persist the vector store
  -c, --collection TEXT        Collection name where the embeddings are stored
                               [required]
  -q, --query TEXT             Query text to search for
  --image PATH                 Image file to search for
  -k, --top-k INTEGER          Number of top results to return  [default: 5]
  -o, --option <TEXT TEXT>...  key/value options for the model
  --help                       Show this message and exit.

--model option (required)

The --model/-m option specifies the model to use for embedding the query.

Important: Use the same model that was used to index the collection, unless you deliberately want to use a different one. Otherwise, the results may be inaccurate, because embeddings produced by different models are generally not comparable.

--model-path option

The --model-path/-p option specifies the path to a local model. It is required for plugins that run local models, such as embcli-llamacpp.

--collection option (required)

The --collection/-c option specifies the name of the collection where the embeddings are stored. The collection must already exist.

--query option

The --query/-q option specifies the query text to search for. The query text is embedded using the specified model.

Assuming that the menu collection was indexed with the text-embedding-3-small model, run the following command to search it:

emb search -m 3-small -c menu -q "May I have some sweets?🍨"

The output lists the top results with their similarity scores, ordered from highest to lowest:

Found 5 results:
Score: 0.44441199833903255, Document ID: 66, Text: Maple Taffy (Canada): Simple sweet treat hot maple syrup poured onto fresh snow. Cools hardens slightly rolled onto stick. Fun frozen treat.
Score: 0.43905687912956637, Document ID: 100, Text: Trdelník (Czech Republic): Spit cake rolled dough wrapped stick coated sugar nuts baked open fire. Sweet aromatic spiral pastry street stalls.
Score: 0.4234538039730697, Document ID: 35, Text: Baklava (Middle East/Balkans): Rich sweet pastry layers filo dough filled chopped nuts sweetened syrup honey. Flaky crunchy intensely sweet dessert.
Score: 0.42317725372218157, Document ID: 23, Text: Dim Sum (China): Variety of bite sized portions typically served in steamer baskets or small plates with tea. Includes steamed buns dumplings rice rolls savory sweet items. Ideal for social brunch.
Score: 0.4212782104868374, Document ID: 79, Text: Belgian Waffles (Belgium): Known for lightness crisp exterior. Brussels waffles larger rectangular Liege waffles denser caramelized pearl sugar. Served various sweet toppings.
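Conceptually, a search embeds the query, scores it against each stored embedding, and returns the k best matches in descending order of score, which is what the listing above shows. The sketch below uses cosine similarity over toy vectors. That is an illustrative choice: this guide does not specify how embcli computes the displayed score. cosine_similarity and top_k are hypothetical helpers. The length check also shows one concrete way mixing models can fail. For example, OpenAI's text-embedding-3-small returns 1536-dimensional vectors by default, while text-embedding-3-large returns 3072.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        # Embeddings from different models often differ in length (and meaning).
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 5) -> list[tuple[float, str]]:
    """Return up to k (score, document ID) pairs, highest score first."""
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in docs.items())
    return heapq.nlargest(k, scored)


# Toy 2-dimensional "embeddings"; real ones have hundreds or thousands of dimensions.
docs = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
for score, doc_id in top_k([1.0, 0.2], docs, k=2):
    print(f"Score: {score:.3f}, Document ID: {doc_id}")
# prints a (0.981), then b (0.832)
```

heapq.nlargest keeps only k candidates while scanning, which mirrors the --top-k option. Real vector stores usually rely on approximate nearest-neighbor indexes rather than scoring every document.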

--image option

The --image option specifies an image file to use as the query. The image is embedded using the specified model, so this option is intended for multimodal models that accept image inputs. See Multimodal Usage for more details on multimodal models.

--top-k option

The --top-k/-k option specifies the number of top results to return. The default is 5. To return the top 10 results, run the following command:

emb search -m 3-small -c menu -q "May I have some sweets?🍨" -k 10

--vector-store option

The --vector-store option specifies the vector store where the embeddings are stored. Currently, only chroma is supported.

--persist-path option

The --persist-path option specifies the path of the vector store to search. The default depends on the vector store; for Chroma, it is ./chroma.

To search the menu collection in a vector store at a non-default path, run the following command:

emb search -m 3-small -c menu -q "May I have some sweets?🍨" \
--persist-path /path/to/my-vector-store

--option option

To pass additional options to the model, use the --option/-o option, which takes a key and a value and can be repeated. The options are model-specific, so refer to the output of the emb models command for the options available for a specific model.

collections command

The emb collections command lists the collections in a vector store.

emb collections --help
Usage: emb collections [OPTIONS]

  List collections in the vector store.

Options:
  -e, --env-file TEXT  Path to the .env file
  --vector-store TEXT  Vector store to use for storing embeddings  [default:
                       chroma]
  --persist-path TEXT  Path to persist the vector store
  --help               Show this message and exit.

To list the collections in the vector store at the default path, run the following command:

emb collections

The output lists the collection names:

Collections:
- menu
- catcafe

--vector-store option

The --vector-store option specifies the vector store whose collections are listed. Currently, only chroma is supported.

--persist-path option

The --persist-path option specifies the path to the vector store. To list the collections in a vector store at a non-default path, run the following command:

emb collections --persist-path /path/to/my-vector-store

delete-collection command

The emb delete-collection command deletes a collection from a vector store.

emb delete-collection --help
Usage: emb delete-collection [OPTIONS]

  Delete a collection from the vector store.

Options:
  -e, --env-file TEXT    Path to the .env file
  --vector-store TEXT    Vector store to use for storing embeddings  [default:
                         chroma]
  --persist-path TEXT    Path to persist the vector store
  -c, --collection TEXT  Collection name to delete  [required]
  --help                 Show this message and exit.

--collection option (required)

The --collection/-c option specifies the name of the collection to delete. The collection must already exist.

To delete a collection named mycollection from the vector store at the default path, run the following command:

emb delete-collection -c mycollection

--vector-store option

The --vector-store option specifies the vector store that contains the collection. Currently, only chroma is supported.

--persist-path option

The --persist-path option specifies the path to the vector store. The default depends on the vector store; for Chroma, it is ./chroma.