Vector Search Usage
embcli provides a set of commands to perform vector search operations using vector stores.
This tutorial assumes you have installed the embcli-openai plugin and have an OpenAI API key.
pip install embcli-openai
cat .env
OPENAI_API_KEY=<YOUR_OPENAI_KEY>
vector-stores command
emb vector-stores command lists available vector stores.
emb vector-stores
LanceDBVectorStore
Vendor: lancedb
See Vector Store Plugins for the full list of available vector stores.
ingest-sample command
emb ingest-sample command indexes vectors from the specified sample corpus into a vector store collection.
Note: if the model provides options for generating document-optimized embeddings, those are used implicitly for ingestion.
For example, for the Gemini embedding models, the task_type=retrieval_document option is added.
emb ingest-sample --help
Usage: emb ingest-sample [OPTIONS]

  Ingest example documents into the vector store.

Options:
  -e, --env-file TEXT             Path to the .env file
  -m, --model TEXT                Model id or alias to use for embedding
                                  [required]
  -p, --model-path TEXT           Path to the local model
  --vector-store TEXT             Vector store to use for storing embeddings
                                  [default: lancedb]
  --persist-path TEXT             Path to persist the vector store
  -c, --collection TEXT           Collection name where to store the
                                  embeddings [required]
  --corpus [cat-names-en|cat-names-ja|dishes-en|dishes-ja|tourist-spots-en|tourist-spots-ja|movies-en|movies-ja]
                                  Sample corpus name to use
                                  [default: cat-names-en]
  -o, --option <TEXT TEXT>...     key/value options for the model
  --help                          Show this message and exit.
--model option (required)
--model/-m option specifies the model to use for embedding.
--model-path option
--model-path/-p option specifies the path to the local model. This is required for plugins that support local models, such as embcli-llamacpp.
--collection option (required)
--collection/-c option specifies the collection name where to store the embeddings. If the collection does not exist, it will be created.
To index the default sample corpus (cat-names-en) into a collection named catcafe using the text-embedding-3-small model, run the following command:
emb ingest-sample -m 3-small -c catcafe
The output shows the vector store name, the collection name, and the persist path of the vector store.
Documents ingested successfully.
Vector store: lancedb (collection: catcafe)
Persist path: ./lancedb
--vector-store option
--vector-store option specifies the vector store to use for storing embeddings.
--persist-path option
--persist-path option specifies the path to persist the vector store. The default is different for each vector store. For LanceDB, the default is ./lancedb.
To use a different path from the default, run the following command:
emb ingest-sample -m 3-small -c catcafe --persist-path /path/to/my-vector-store
--corpus option
--corpus option specifies the sample corpus name to use. The available options are cat-names-en, cat-names-ja, dishes-en, dishes-ja, tourist-spots-en, tourist-spots-ja, movies-en, and movies-ja. The default is cat-names-en.
- cat-names-en: 100 English cat names synthetically generated by an AI model.
- cat-names-ja: Japanese translation of cat-names-en.
- dishes-en: 100 English dish names synthetically generated by an AI model.
- dishes-ja: Japanese translation of dishes-en.
- tourist-spots-en: 100 English tourist spots synthetically generated by an AI model.
- tourist-spots-ja: Japanese translation of tourist-spots-en.
- movies-en: 100 English movie titles synthetically generated by an AI model.
- movies-ja: Japanese translation of movies-en.
To index the sample corpus dishes-en in a vector store collection named menu, run the following command:
emb ingest-sample -m 3-small -c menu --corpus dishes-en
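When ingestion is driven from a script, the command line above can be assembled programmatically. The following is a minimal sketch and not part of embcli: `build_ingest_sample_cmd` is a hypothetical helper that uses only the flags shown in the help output and checks the corpus name against the listed choices. It builds the argument list without running anything.

```python
import shlex

# Corpus names copied from the `emb ingest-sample --help` output above.
SAMPLE_CORPORA = (
    "cat-names-en", "cat-names-ja", "dishes-en", "dishes-ja",
    "tourist-spots-en", "tourist-spots-ja", "movies-en", "movies-ja",
)


def build_ingest_sample_cmd(model, collection, corpus=None, persist_path=None):
    """Hypothetical helper: build the argument list for `emb ingest-sample`."""
    if corpus is not None and corpus not in SAMPLE_CORPORA:
        raise ValueError(f"unknown sample corpus: {corpus!r}")
    cmd = ["emb", "ingest-sample", "-m", model, "-c", collection]
    if corpus is not None:
        cmd += ["--corpus", corpus]
    if persist_path is not None:
        cmd += ["--persist-path", persist_path]
    return cmd


if __name__ == "__main__":
    cmd = build_ingest_sample_cmd("3-small", "menu", corpus="dishes-en")
    # Prints: emb ingest-sample -m 3-small -c menu --corpus dishes-en
    print(shlex.join(cmd))
    # To run it for real (requires embcli and an API key), pass `cmd`
    # to subprocess.run(cmd, check=True).
```

Passing the list to `subprocess.run` instead of a shell string avoids quoting problems with collection names or paths that contain spaces.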
--option option
To pass additional options to the model, use the --option/-o option. The options are model-specific, so refer to the emb models command for the options available for a specific model.
ingest command
Similar to emb ingest-sample, emb ingest command indexes vectors from a corpus into a vector store collection but allows you to use your own corpus file.
Note: if the model provides options for generating document-optimized embeddings, those are used implicitly for ingestion.
For example, for the Gemini embedding models, the task_type=retrieval_document option is added.
emb ingest --help
Usage: emb ingest [OPTIONS]

  Ingest documents into the vector store. Ingestion-specific embeddings are
  used if the model provides options for generating search documents-optimized
  embeddings.

Options:
  -e, --env-file TEXT          Path to the .env file
  -m, --model TEXT             Model id or alias to use for embedding
                               [required]
  -p, --model-path TEXT        Path to the local model
  --vector-store TEXT          Vector store to use for storing embeddings
                               [default: lancedb]
  --persist-path TEXT          Path to persist the vector store
  -c, --collection TEXT        Collection name where to store the embeddings
                               [required]
  -f, --file PATH              File containing text to embed [required]
  --input-format [csv]         Input format of the file [default: csv]
  --batch-size INTEGER         Batch size for embedding [default: 100]
  -o, --option <TEXT TEXT>...  key/value options for the model
  --help                       Show this message and exit.
--file option (required)
--file/-f option specifies the file containing text to embed. The file should be in CSV format with two columns. The first column is the document ID and the second column is the text to embed.
See provided sample corpus files for reference.
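A corpus file in the two-column layout described above can be produced with Python's standard csv module. This is a minimal sketch under stated assumptions: `write_corpus` is a hypothetical helper, and the example documents are made up. This page does not say whether a header row is expected, so the sketch writes none; compare with the provided sample corpus files before relying on it.

```python
import csv


def write_corpus(path, documents):
    """Write (doc_id, text) pairs as a two-column CSV: ID first, text second."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)  # quotes fields that contain commas or quotes
        for doc_id, text in documents:
            writer.writerow([doc_id, text])


if __name__ == "__main__":
    # Made-up example documents, for illustration only.
    docs = [
        (1, "Espresso: small, strong coffee served in a demitasse."),
        (2, 'Matcha latte: green tea powder whisked with "steamed" milk.'),
    ]
    write_corpus("my-corpus.csv", docs)
```

Using `csv.writer` rather than joining strings by hand keeps texts that contain commas or quotation marks in a single column.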
To index a CSV file named my-corpus.csv in a vector store collection named mycollection using the text-embedding-3-small model, run the following command:
emb ingest -m 3-small -c mycollection -f my-corpus.csv
The output shows the vector store name, the collection name, and the persist path of the vector store.
Documents ingested successfully.
Vector store: lancedb (collection: mycollection)
Persist path: ./lancedb
--input-format option
--input-format option specifies the input format of the file. Currently, it only supports csv.
--batch-size option
--batch-size option specifies the batch size for embedding to reduce the number of API calls. The default is 100. To use a batch size of 50, run the following command:
emb ingest -m 3-small -c mycollection -f my-corpus.csv --batch-size 50
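The effect of the batch size on the number of embedding requests can be illustrated with a short sketch. It shows the general batching idea only and is not embcli's implementation; `batches` is a hypothetical helper, and the sketch assumes one API call per batch.

```python
def batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    texts = [f"document {i}" for i in range(250)]
    for size in (100, 50):
        # Assumption: each batch corresponds to one embedding API call.
        n_calls = sum(1 for _ in batches(texts, size))
        print(f"batch size {size}: {n_calls} calls")
    # Prints:
    # batch size 100: 3 calls
    # batch size 50: 5 calls
```

A larger batch size means fewer requests, while a smaller one keeps each request smaller, which can help if a provider limits request size.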
Other options
For the options shared with ingest-sample, refer to the emb ingest-sample command section above.
search command
emb search command searches a collection for the documents most similar to a query.
Note: if the model provides options for generating query-optimized embeddings, those are used implicitly for searching.
For example, for the Gemini embedding models, the task_type=retrieval_query option is added.
emb search --help
Usage: emb search [OPTIONS]

  Search for documents in the vector store for the query. Query-specific
  embedding is used if the model provides options for generating search
  query-optimized embeddings.

Options:
  -e, --env-file TEXT          Path to the .env file
  -m, --model TEXT             Model id or alias to use for embedding
                               [required]
  -p, --model-path TEXT        Path to the local model
  --vector-store TEXT          Vector store to use for storing embeddings
                               [default: lancedb]
  --persist-path TEXT          Path to persist the vector store
  -c, --collection TEXT        Collection name where the embeddings are stored
                               [required]
  -q, --query TEXT             Query text to search for
  --image PATH                 Image file to search for
  -k, --top-k INTEGER          Number of top results to return [default: 5]
  -o, --option <TEXT TEXT>...  key/value options for the model
  --help                       Show this message and exit.
--model option (required)
--model/-m option specifies the model to use for embedding the query text.
Important: Use the same model that was used to index the collection, unless you intentionally want a different one. Otherwise, the results may not be accurate.
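One way to avoid mixing models is to note which model indexed each collection and check that note before searching. The sketch below is an illustration only: nothing on this page says embcli tracks this itself, so `record_model`, `check_model`, and the JSON registry file are all hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def _load(registry_path):
    """Read the registry file, or return an empty mapping if it is missing."""
    path = Path(registry_path)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def record_model(registry_path, collection, model):
    """Remember which model was used to index `collection`."""
    registry = _load(registry_path)
    registry[collection] = model
    Path(registry_path).write_text(json.dumps(registry, indent=2), encoding="utf-8")


def check_model(registry_path, collection, model):
    """Return True only if `model` matches the one recorded for `collection`."""
    return _load(registry_path).get(collection) == model


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        registry = os.path.join(tmp, "models.json")  # hypothetical file name
        record_model(registry, "menu", "3-small")
        print(check_model(registry, "menu", "3-small"))  # True
        print(check_model(registry, "menu", "3-large"))  # False
```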
--model-path option
--model-path/-p option specifies the path to the local model. This is required for plugins that support local models, such as embcli-llamacpp.
--collection option (required)
--collection/-c option specifies the collection name where the embeddings are stored. The collection should already exist.
--query option
--query/-q option specifies the query text to search for. The query text will be embedded using the specified model.
Assuming you have a menu collection indexed with the text-embedding-3-small model, run the following command to search it:
emb search -m 3-small -c menu -q "May I have some sweets?🍨"
The output will show the top results with their similarity scores:
Found 5 results:
Score: 0.44441199833903255, Document ID: 66, Text: Maple Taffy (Canada): Simple sweet treat hot maple syrup poured onto fresh snow. Cools hardens slightly rolled onto stick. Fun frozen treat.
Score: 0.43905687912956637, Document ID: 100, Text: Trdelník (Czech Republic): Spit cake rolled dough wrapped stick coated sugar nuts baked open fire. Sweet aromatic spiral pastry street stalls.
Score: 0.4234538039730697, Document ID: 35, Text: Baklava (Middle East/Balkans): Rich sweet pastry layers filo dough filled chopped nuts sweetened syrup honey. Flaky crunchy intensely sweet dessert.
Score: 0.42317725372218157, Document ID: 23, Text: Dim Sum (China): Variety of bite sized portions typically served in steamer baskets or small plates with tea. Includes steamed buns dumplings rice rolls savory sweet items. Ideal for social brunch.
Score: 0.4212782104868374, Document ID: 79, Text: Belgian Waffles (Belgium): Known for lightness crisp exterior. Brussels waffles larger rectangular Liege waffles denser caramelized pearl sugar. Served various sweet toppings.
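If the results need to be processed further, the printed lines can be parsed into structured records. This is a minimal sketch that assumes the output always follows the "Score: ..., Document ID: ..., Text: ..." layout shown above and that document IDs contain no commas; `parse_search_output` is a hypothetical helper, and the output format is not a documented interface.

```python
import re

# Pattern inferred from the example output above; not a documented format.
RESULT_LINE = re.compile(
    r"^Score: (?P<score>[-+0-9.eE]+), Document ID: (?P<doc_id>[^,]+), Text: (?P<text>.*)$"
)


def parse_search_output(output):
    """Return a list of (score, doc_id, text) tuples from `emb search` output."""
    results = []
    for line in output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:  # skips the "Found N results:" header and blank lines
            results.append(
                (float(match["score"]), match["doc_id"], match["text"])
            )
    return results


if __name__ == "__main__":
    sample = (
        "Found 1 results:\n"
        "Score: 0.44441199833903255, Document ID: 66, "
        "Text: Maple Taffy (Canada): Simple sweet treat hot maple syrup "
        "poured onto fresh snow.\n"
    )
    for score, doc_id, text in parse_search_output(sample):
        print(f"{doc_id}\t{score:.3f}\t{text}")
```

Document IDs are kept as strings because this page does not state that they are always numeric.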
--image option
--image option specifies the image file to search for. The image will be embedded using the specified model. This is useful for multimodal models that support image inputs. See Multimodal Usage for more details on multimodal models.
--top-k option
--top-k/-k option specifies the number of top results to return. The default is 5. To return top 10 results, run the following command:
emb search -m 3-small -c menu -q "May I have some sweets?🍨" -k 10
--vector-store option
--vector-store option specifies the vector store to use for storing embeddings.
--persist-path option
--persist-path option specifies the path to the vector store to be searched. The default is different for each vector store. For LanceDB, the default is ./lancedb.
To search in menu collection in a different path from the default, run the following command:
emb search -m 3-small -c menu -q "May I have some sweets?🍨" \
--persist-path /path/to/my-vector-store
--option option
To pass additional options to the model, use the --option/-o option. The options are model-specific, so refer to the emb models command for the options available for a specific model.
collections command
emb collections command lists the collections in a vector store.
emb collections --help
Usage: emb collections [OPTIONS]

  List collections in the vector store.

Options:
  -e, --env-file TEXT  Path to the .env file
  --vector-store TEXT  Vector store to use for storing embeddings
                       [default: lancedb]
  --persist-path TEXT  Path to persist the vector store
  --help               Show this message and exit.
To list collections in the default vector store, run the following command:
emb collections
The output will show the list of collection names:
Collections:
- menu
- catcafe
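A script that needs the collection names, for example to check that a collection exists before searching it, can parse this listing. The sketch below assumes the output keeps the "- name" layout shown above; `parse_collections` is a hypothetical helper, and the output format is not a documented interface.

```python
def parse_collections(output):
    """Extract collection names from `emb collections` output."""
    names = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("- "):  # skips the "Collections:" header
            names.append(line[2:].strip())
    return names


if __name__ == "__main__":
    sample = "Collections:\n- menu\n- catcafe\n"
    print(parse_collections(sample))  # ['menu', 'catcafe']
```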
--vector-store option
--vector-store option specifies the vector store to use for storing embeddings.
--persist-path option
--persist-path option specifies the path to the vector store. To list collections in a different path from the default, run the following command:
emb collections --persist-path /path/to/my-vector-store
delete-collection command
emb delete-collection command deletes a collection from a vector store.
emb delete-collection --help
Usage: emb delete-collection [OPTIONS]

  Delete a collection from the vector store.

Options:
  -e, --env-file TEXT    Path to the .env file
  --vector-store TEXT    Vector store to use for storing embeddings
                         [default: lancedb]
  --persist-path TEXT    Path to persist the vector store
  -c, --collection TEXT  Collection name to delete [required]
  --help                 Show this message and exit.
--collection option (required)
--collection/-c option specifies the collection name to delete. The collection should already exist.
To delete a collection named mycollection in a vector store, run the following command:
emb delete-collection -c mycollection
--vector-store option
--vector-store option specifies the vector store to use for storing embeddings.
--persist-path option
--persist-path option specifies the path to the vector store.