Basic Usage¶
This tutorial assumes you have installed the embcli-openai plugin and have an OpenAI API key.
pip install embcli-openai
cat .env
OPENAI_API_KEY=<YOUR_OPENAI_KEY>
models command¶
The emb models command lists all available models in the current environment.
emb models
The output shows, for each available model, its name, aliases, and supported model options.
# Assuming you have installed `embcli-openai` and `embcli-gemini` plugins
OpenAIEmbeddingModel
Vendor: openai
Models:
* text-embedding-3-small (aliases: 3-small)
* text-embedding-3-large (aliases: 3-large)
* text-embedding-ada-002 (aliases: ada-002)
Model Options:
* dimensions (int) - The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
GeminiEmbeddingModel
Vendor: gemini
Models:
* gemini-embedding-exp-03-07 (aliases: exp-03-07)
* text-embedding-004 (aliases: text-004)
* embedding-001 (aliases: )
Model Options:
* task_type (str) - The type of task for the embedding. Supported task types: 'semantic_similarity', 'classification', 'clustering', 'retrieval_document', 'retrieval_query', 'question_answering', 'fact_verification', 'code_retrieval_query'
embed command¶
The emb embed command generates embeddings for the provided text or file content using the specified model.
emb embed --help
Usage: emb embed [OPTIONS] [TEXT]
Generate embeddings for the provided text or file content.
Options:
-e, --env-file TEXT Path to the .env file
-m, --model TEXT Model id or alias to use for embedding
[required]
-p, --model-path TEXT Path to the local model
-f, --file PATH File containing text to embed
--image PATH Image file to embed
-o, --option <TEXT TEXT>... key/value options for the model
--help Show this message and exit.
--model option (required)¶
The --model/-m option specifies the model to use for embedding.
To generate an embedding for an input text with the text-embedding-3-small model, run this command:
emb embed -m text-embedding-3-small "Have you taken a coffee break?☕"
The output will be a JSON array of floats or ints representing the embedding vector for the input text.
Note: The output will be an int array if the model supports quantized output, such as int8 or binary. Otherwise, it will be a float array.
[-0.07306485623121262, -0.02141696587204933, -0.021973779425024986, -0.030774157494306564, -0.028927164152264595, -0.020126787945628166, 0.031263068318367004, 0.03911278396844864, -0.025681346654891968, 0.005500235594809055, 0.033544644713401794, -0.011625189334154129, 0.007747862488031387, -0.009350400418043137, ...]
You can also use an alias for the model. For example, text-embedding-3-small has the alias 3-small:
emb embed -m 3-small "Have you taken a coffee break?☕"
--model-path option¶
The --model-path/-p option specifies the path to a local model. This is required for plugins that support local models, such as embcli-llamacpp.
To generate an embedding for an input text by running a GGUF-converted model:
emb embed -m llamacpp -p ./all-MiniLM-L6-v2.F16.gguf \
"Owls can rotate their necks 270 degrees without injury🦉"
--file option¶
To generate an embedding for the text in a file, use the --file/-f option:
# Assuming you have a file named `coffee.txt` in the current directory
cat coffee.txt
How to make a cup of coffee ☕
To boil water, pour it into a kettle or saucepan and heat it on the stove or with an electric kettle. Wait until you see large, steady bubbles rising and breaking on the surface. Once it reaches a rolling boil, the water is ready to use.
emb embed -m 3-small -f coffee.txt
--image option¶
To generate an embedding for an image, use the --image option. The image file should be in a format supported by the model (e.g., JPEG, PNG). See Multimodal Usage for more details on multimodal models.
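For example (a sketch only: the model alias and image path below are placeholders, so substitute a multimodal model and image available in your environment):
# Hypothetical multimodal model alias and image file
emb embed -m <multimodal-model-alias> --image ./coffee.png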
--option option¶
To pass additional options to the model, use the --option/-o option. The options are model-specific, so refer to the emb models command for the options available for a specific model.
OpenAIEmbeddingModel supports the dimensions option to specify the number of output dimensions:
emb embed -m 3-small -o dimensions 512 "Have you taken a coffee break?☕"
simscore command¶
The emb simscore command calculates the similarity score or distance between the embeddings of two inputs.
emb simscore --help
Usage: emb simscore [OPTIONS] [TEXT1] [TEXT2]
Calculate similarity score between two inputs.
Options:
-e, --env-file TEXT Path to the .env file
-m, --model TEXT Model id or alias to use for embedding
[required]
-p, --model-path TEXT Path to the local model
-s, --similarity [dot|cosine|euclidean|manhattan]
Similarity function to use [default:
cosine]
-f1, --file1 PATH First file containing text to compare
-f2, --file2 PATH Second file containing text to compare
--image1 PATH First image file to compare
--image2 PATH Second image file to compare
-o, --option <TEXT TEXT>... key/value options for the model
--help Show this message and exit.
--model option (required)¶
The --model/-m option specifies the model to use for embedding.
To calculate the similarity score between two input texts using the text-embedding-3-small model, run this command:
emb simscore -m 3-small "I have a cat" "私は猫を飼っています"
The output will be a float value calculated by the specified metric (default is cosine similarity):
0.5505237095494223
--model-path option¶
The --model-path/-p option specifies the path to a local model. This is required for plugins that support local models, such as embcli-llamacpp.
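For example, reusing the GGUF model from the embed section above (a sketch, assuming the embcli-llamacpp plugin is installed and the model file exists locally):
emb simscore -m llamacpp -p ./all-MiniLM-L6-v2.F16.gguf \
"I have a cat" "私は猫を飼っています"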
--similarity option¶
The --similarity/-s option specifies the similarity function to use. The available options are dot, cosine, euclidean, and manhattan. If the two vectors are normalized, dot and cosine yield the same value. The default is cosine.
To calculate the euclidean distance between two input texts, run this command:
emb simscore -m 3-small -s euclidean "I have a cat" "私は猫を飼っています"
--file1 and --file2 options¶
To calculate the similarity score between the embeddings of two files, use the --file1/-f1 and --file2/-f2 options:
# Assuming you have two files named `coffee.txt` and `caffe.txt` in the current directory
cat coffee.txt
How to make a cup of coffee ☕
To boil water, pour it into a kettle or saucepan and heat it on the stove or with an electric kettle. Wait until you see large, steady bubbles rising and breaking on the surface. Once it reaches a rolling boil, the water is ready to use.
cat caffe.txt
Come preparare una tazza di caffè ☕
Per far bollire l'acqua, versala in un bollitore o in un pentolino e riscaldala sul fornello o con un bollitore elettrico. Aspetta finché non vedi grandi bolle stabili salire e rompersi sulla superficie. Una volta che l'acqua raggiunge un'ebollizione vigorosa, è pronta per essere usata.
emb simscore -m 3-small -f1 coffee.txt -f2 caffe.txt
--image1 and --image2 options¶
To calculate the similarity score between two images, or between an image and a text, use the --image1 and --image2 options. The image files should be in a format supported by the model (e.g., JPEG, PNG). See Multimodal Usage for more details on multimodal models.
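For example (a sketch only: the model alias and image paths below are placeholders, so substitute a multimodal model and images available in your environment):
# Hypothetical multimodal model alias and image files
emb simscore -m <multimodal-model-alias> --image1 ./coffee.png --image2 ./espresso.png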
--option option¶
To pass additional options to the model, use the --option/-o option. The options are model-specific, so refer to the emb models command for the options available for a specific model.
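For example, the dimensions option shown in the embed section also applies here (assuming an OpenAI text-embedding-3 model):
emb simscore -m 3-small -o dimensions 512 "I have a cat" "私は猫を飼っています"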
Common options¶
The following options are common to all commands.
--env-file option¶
emb implicitly loads environment variables from the .env file in the current directory. You can also specify a different .env file using the --env-file/-e option:
cat /path/to/your/.env
OPENAI_API_KEY=<YOUR_OPENAI_KEY>
emb embed -m 3-small -e /path/to/your/.env "Have you taken a coffee break?☕"