Vector Search Usage¶
embcli provides a set of commands to perform vector search operations using vector stores.
We assume you have installed the embcli-openai plugin and have an OpenAI API key to go through this tutorial.
pip install embcli-openai
cat .env
OPENAI_API_KEY=<YOUR_OPENAI_KEY>
vector-stores command¶
The emb vector-stores command lists the available vector stores. Currently, only the Chroma vector store is supported.
emb vector-stores
ChromaVectorStore
Vendor: chroma
ingest-sample command¶
The emb ingest-sample command indexes vectors from the specified sample corpus into a vector store collection.
Note: if the model provides options for generating document-optimized embeddings, those are used implicitly for ingestion. For example, for the Gemini embedding models, the task_type=retrieval_document option is added.
emb ingest-sample --help
Usage: emb ingest-sample [OPTIONS]
Ingest example documents into the vector store.
Options:
-e, --env-file TEXT Path to the .env file
-m, --model TEXT Model id or alias to use for embedding
[required]
-p, --model-path TEXT Path to the local model
--vector-store TEXT Vector store to use for storing embeddings
[default: chroma]
--persist-path TEXT Path to persist the vector store
-c, --collection TEXT Collection name where to store the
embeddings [required]
--corpus [cat-names-en|cat-names-ja|dishes-en|dishes-ja|tourist-spots-en|tourist-spots-ja|movies-en|movies-ja]
Smaple corpus name to use [default: cat-
names-en]
-o, --option <TEXT TEXT>... key/value options for the model
--help Show this message and exit.
--model option (required)¶
The --model/-m option specifies the model to use for embedding.
--model-path option¶
The --model-path/-p option specifies the path to the local model. This is required for plugins that support local models, such as embcli-llamacpp.
--collection option (required)¶
The --collection/-c option specifies the name of the collection in which to store the embeddings. If the collection does not exist, it is created.
To index a sample corpus (the default is cat-names-en) in the catcafe collection using the text-embedding-3-small model, run the following command:
emb ingest-sample -m 3-small -c catcafe
The output shows the name of the vector store, the collection, and the persist path of the vector store.
Documents ingested successfully.
Vector store: chroma (collection: catcafe)
Persist path: ./chroma
--vector-store option¶
The --vector-store option specifies the vector store to use for storing embeddings. Currently, only chroma is supported.
--persist-path option¶
The --persist-path option specifies the path where the vector store is persisted. The default differs for each vector store; for Chroma, it is ./chroma.
To use a path other than the default, run the following command:
emb ingest-sample -m 3-small -c catcafe --persist-path /path/to/my-vector-store
--corpus option¶
The --corpus option specifies the sample corpus to use. The available options are cat-names-en, cat-names-ja, dishes-en, dishes-ja, tourist-spots-en, tourist-spots-ja, movies-en, and movies-ja. The default is cat-names-en.
cat-names-en: 100 English cat names synthetically generated by an AI model.
cat-names-ja: Japanese translation of cat-names-en.
dishes-en: 100 English dish names synthetically generated by an AI model.
dishes-ja: Japanese translation of dishes-en.
tourist-spots-en: 100 English tourist spots synthetically generated by an AI model.
tourist-spots-ja: Japanese translation of tourist-spots-en.
movies-en: 100 English movie titles synthetically generated by an AI model.
movies-ja: Japanese translation of movies-en.
To index the sample corpus dishes-en in a vector store collection named menu, run the following command:
emb ingest-sample -m 3-small -c menu --corpus dishes-en
--option option¶
To pass additional options to the model, use the --option/-o option. The options are model-specific, so refer to the emb models command for the options available for a specific model.
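The flags described above can be combined in a script. The following is a minimal sketch, not part of embcli itself: build_ingest_sample_cmd is a hypothetical helper, and some_option/some_value are placeholder names rather than real model options. It assumes each model option is passed as its own -o KEY VALUE pair, which is how the `<TEXT TEXT>...` signature in the help output reads.

```python
import shlex


def build_ingest_sample_cmd(model, collection, corpus=None,
                            persist_path=None, options=None):
    """Assemble an `emb ingest-sample` argument list from the documented flags.

    Hypothetical helper for illustration; it only builds the command and
    does not run it.
    """
    cmd = ["emb", "ingest-sample", "-m", model, "-c", collection]
    if corpus is not None:
        cmd += ["--corpus", corpus]
    if persist_path is not None:
        cmd += ["--persist-path", persist_path]
    # Assumption: every model option is given as a separate `-o KEY VALUE` pair.
    for key, value in (options or {}).items():
        cmd += ["-o", key, str(value)]
    return cmd


# `some_option` / `some_value` are placeholders; see `emb models` for real ones.
cmd = build_ingest_sample_cmd(
    "3-small", "menu", corpus="dishes-en",
    options={"some_option": "some_value"},
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once embcli and the relevant plugin are installed.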
ingest command¶
Similar to emb ingest-sample, the emb ingest command indexes vectors from a corpus into a vector store collection, but it lets you use your own corpus file.
Note: if the model provides options for generating document-optimized embeddings, those are used implicitly for ingestion. For example, for the Gemini embedding models, the task_type=retrieval_document option is added.
emb ingest --help
Usage: emb ingest [OPTIONS]
Ingest documents into the vector store. Ingestion-specific embeddings are
used if the model provides options for generating search documents-optimized
embeddings.
Options:
-e, --env-file TEXT Path to the .env file
-m, --model TEXT Model id or alias to use for embedding
[required]
-p, --model-path TEXT Path to the local model
--vector-store TEXT Vector store to use for storing embeddings
[default: chroma]
--persist-path TEXT Path to persist the vector store
-c, --collection TEXT Collection name where to store the embeddings
[required]
-f, --file PATH File containing text to embed [required]
--input-format [csv] Input format of the file [default: csv]
--batch-size INTEGER Batch size for embedding [default: 100]
-o, --option <TEXT TEXT>... key/value options for the model
--help Show this message and exit.
--file option (required)¶
The --file/-f option specifies the file containing the text to embed. The file should be in CSV format with two columns: the first column is the document ID and the second column is the text to embed.
See the provided sample corpus files for reference.
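As an illustration of the two-column layout, the following sketch writes a small corpus file with Python's csv module, which takes care of quoting commas and quotation marks inside the text. The write_corpus helper and the sample documents are made up for this example. Whether a header row is expected is not stated here, so this sketch writes none; compare the result against the sample corpus files before ingesting.

```python
import csv


def write_corpus(path, documents):
    """Write (document ID, text) pairs as a two-column CSV file.

    Hypothetical helper; no header row is written (an assumption).
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for doc_id, text in documents:
            # First column: document ID, second column: text to embed.
            writer.writerow([doc_id, text])


# Made-up documents for illustration only.
documents = [
    (1, "Maple Taffy (Canada): hot maple syrup poured onto fresh snow."),
    (2, 'Baklava: layers of filo dough, chopped nuts, and "sweet" syrup.'),
]
write_corpus("my-corpus.csv", documents)
```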
To index a file named my-corpus.csv in a vector store collection named mycollection using the text-embedding-3-small model, run the following command:
emb ingest -m 3-small -c mycollection -f my-corpus.csv
The output shows the name of the vector store, the collection, and the persist path of the vector store.
Documents ingested successfully.
Vector store: chroma (collection: mycollection)
Persist path: ./chroma
--input-format option¶
The --input-format option specifies the input format of the file. Currently, only csv is supported.
--batch-size option¶
The --batch-size option specifies how many texts are embedded per batch; larger batches reduce the number of API calls. The default is 100. To use a batch size of 50, run the following command:
emb ingest -m 3-small -c mycollection -f my-corpus.csv --batch-size 50
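To get a feel for how the batch size relates to the number of API calls, the arithmetic can be sketched as below. This assumes one embedding request per batch, which is an assumption about the implementation rather than something stated here; num_batches and the corpus size of 250 are made up for the example.

```python
import math


def num_batches(num_documents, batch_size=100):
    """Number of batches needed to cover all documents.

    Assumption: one embedding request is issued per batch.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_documents / batch_size)


# Hypothetical corpus of 250 documents.
for size in (100, 50):
    print(f"batch size {size}: {num_batches(250, size)} batches")
```

Under this assumption, 250 documents take 3 batches at the default size and 5 batches at a size of 50.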
Other options¶
For the other options, which are shared with ingest-sample, refer to the emb ingest-sample command section.
search command¶
The emb search command searches a collection for the documents most similar to a query.
Note: if the model provides options for generating query-optimized embeddings, those are used implicitly for searching. For example, for the Gemini embedding models, the task_type=retrieval_query option is added.
emb search --help
Usage: emb search [OPTIONS]
Search for documents in the vector store for the query. Query-specific
embedding is used if the model provides options for generating search query-
optimized embeddings.
Options:
-e, --env-file TEXT Path to the .env file
-m, --model TEXT Model id or alias to use for embedding
[required]
-p, --model-path TEXT Path to the local model
--vector-store TEXT Vector store to use for storing embeddings
[default: chroma]
--persist-path TEXT Path to persist the vector store
-c, --collection TEXT Collection name where the embeddings are stored
[required]
-q, --query TEXT Query text to search for
--image PATH Image file to search for
-k, --top-k INTEGER Number of top results to return [default: 5]
-o, --option <TEXT TEXT>... key/value options for the model
--help Show this message and exit.
--model option (required)¶
The --model/-m option specifies the model to use for embedding the query text.
Important: Make sure you use the same model that was used to index the collection, unless you intentionally use a different one. Otherwise, the results may not be accurate.
--model-path option¶
The --model-path/-p option specifies the path to the local model. This is required for plugins that support local models, such as embcli-llamacpp.
--collection option (required)¶
The --collection/-c option specifies the name of the collection where the embeddings are stored. The collection should already exist.
--query option¶
The --query/-q option specifies the query text to search for. The query text is embedded using the specified model.
Assuming the menu collection was indexed with the text-embedding-3-small model, run the following command to search for documents in it:
emb search -m 3-small -c menu -q "May I have some sweets?🍨"
The output will show the top results with their similarity scores:
Found 5 results:
Score: 0.44441199833903255, Document ID: 66, Text: Maple Taffy (Canada): Simple sweet treat hot maple syrup poured onto fresh snow. Cools hardens slightly rolled onto stick. Fun frozen treat.
Score: 0.43905687912956637, Document ID: 100, Text: Trdelník (Czech Republic): Spit cake rolled dough wrapped stick coated sugar nuts baked open fire. Sweet aromatic spiral pastry street stalls.
Score: 0.4234538039730697, Document ID: 35, Text: Baklava (Middle East/Balkans): Rich sweet pastry layers filo dough filled chopped nuts sweetened syrup honey. Flaky crunchy intensely sweet dessert.
Score: 0.42317725372218157, Document ID: 23, Text: Dim Sum (China): Variety of bite sized portions typically served in steamer baskets or small plates with tea. Includes steamed buns dumplings rice rolls savory sweet items. Ideal for social brunch.
Score: 0.4212782104868374, Document ID: 79, Text: Belgian Waffles (Belgium): Known for lightness crisp exterior. Brussels waffles larger rectangular Liege waffles denser caramelized pearl sugar. Served various sweet toppings.
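If you want to post-process these results in a script, the lines can be parsed with a regular expression. The sketch below is based only on the output format shown above ("Score: ..., Document ID: ..., Text: ..."); it is not an embcli API, parse_results is a hypothetical helper, and it assumes document IDs contain no commas.

```python
import re

# Pattern derived from the sample output above; an assumption, not a spec.
RESULT_LINE = re.compile(
    r"^Score: (?P<score>[-+0-9.eE]+), "
    r"Document ID: (?P<doc_id>[^,]+), "
    r"Text: (?P<text>.*)$"
)


def parse_results(output):
    """Turn `emb search` output lines into a list of dictionaries."""
    results = []
    for line in output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:  # skips lines such as "Found 5 results:"
            results.append({
                "score": float(match["score"]),
                "doc_id": match["doc_id"],
                "text": match["text"],
            })
    return results


sample_output = """Found 5 results:
Score: 0.44441199833903255, Document ID: 66, Text: Maple Taffy (Canada): Simple sweet treat hot maple syrup poured onto fresh snow. Cools hardens slightly rolled onto stick. Fun frozen treat.
Score: 0.4234538039730697, Document ID: 35, Text: Baklava (Middle East/Balkans): Rich sweet pastry layers filo dough filled chopped nuts sweetened syrup honey. Flaky crunchy intensely sweet dessert.
"""
for result in parse_results(sample_output):
    print(result["doc_id"], round(result["score"], 3))
```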
--image option¶
The --image option specifies the image file to search for. The image is embedded using the specified model. This is useful for multimodal models that support image inputs. See Multimodal Usage for more details on multimodal models.
--top-k option¶
The --top-k/-k option specifies the number of top results to return. The default is 5. To return the top 10 results, run the following command:
emb search -m 3-small -c menu -q "May I have some sweets?🍨" -k 10
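Conceptually, a top-k search ranks the stored vectors by similarity to the query vector and keeps the k best. The following is a toy sketch of that idea using cosine similarity in pure Python. The similarity metric actually used by embcli and Chroma is not stated here, so cosine similarity is an assumption, and the vectors and IDs are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Return the k (score, doc_id) pairs with the highest similarity."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up 2-dimensional "embeddings" for illustration only.
doc_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
for score, doc_id in top_k([1.0, 0.0], doc_vecs, k=2):
    print(doc_id, round(score, 4))
```

A real vector store uses index structures to avoid scoring every document, but the ranking it returns follows the same principle.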
--vector-store option¶
The --vector-store option specifies the vector store to use. Currently, only chroma is supported.
--persist-path option¶
The --persist-path option specifies the path to the vector store to be searched. The default differs for each vector store; for Chroma, it is ./chroma.
To search the menu collection in a path other than the default, run the following command:
emb search -m 3-small -c menu -q "May I have some sweets?🍨" \
--persist-path /path/to/my-vector-store
--option option¶
To pass additional options to the model, use the --option/-o option. The options are model-specific, so refer to the emb models command for the options available for a specific model.
collections command¶
The emb collections command lists the collections in a vector store.
emb collections --help
Usage: emb collections [OPTIONS]
List collections in the vector store.
Options:
-e, --env-file TEXT Path to the .env file
--vector-store TEXT Vector store to use for storing embeddings [default:
chroma]
--persist-path TEXT Path to persist the vector store
--help Show this message and exit.
To list the collections in the default vector store, run the following command:
emb collections
The output will show the list of collection names:
Collections:
- menu
- catcafe
--vector-store option¶
The --vector-store option specifies the vector store to use. Currently, only chroma is supported.
--persist-path option¶
The --persist-path option specifies the path to the vector store. To list the collections in a path other than the default, run the following command:
emb collections --persist-path /path/to/my-vector-store
delete-collection command¶
The emb delete-collection command deletes a collection from a vector store.
emb delete-collection --help
Usage: emb delete-collection [OPTIONS]
Delete a collection from the vector store.
Options:
-e, --env-file TEXT Path to the .env file
--vector-store TEXT Vector store to use for storing embeddings [default:
chroma]
--persist-path TEXT Path to persist the vector store
-c, --collection TEXT Collection name to delete [required]
--help Show this message and exit.
--collection option (required)¶
The --collection/-c option specifies the name of the collection to delete. The collection should already exist.
To delete a collection named mycollection from a vector store, run the following command:
emb delete-collection -c mycollection
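Because the collection should already exist, a script may want to check the emb collections output before deleting. The sketch below parses the "- name" list format shown in the collections section. parse_collections is a hypothetical helper, and the sample output is copied from that section instead of being captured from a real run.

```python
def parse_collections(output):
    """Extract collection names from `emb collections` output."""
    names = []
    for line in output.splitlines():
        line = line.strip()
        # Collection names are printed as "- name" list items.
        if line.startswith("- "):
            names.append(line[2:].strip())
    return names


# Sample output as shown in the collections section.
sample_output = """Collections:
- menu
- catcafe
"""

names = parse_collections(sample_output)
target = "mycollection"
if target in names:
    print(f"emb delete-collection -c {target}")
else:
    print(f"Collection {target!r} not found; nothing to delete.")
```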
--vector-store option¶
The --vector-store option specifies the vector store to use. Currently, only chroma is supported.
--persist-path option¶
The --persist-path option specifies the path to the vector store.