Every important product detail -- captured in a single embedding

Vody's multimodal embeddings use image and text to represent your products, enabling sophisticated similarity search, de-duplication, and more.

Model descriptions

Our model incorporates breakthrough advancements in open-domain object identification using text prompts. This dramatically improves upon traditional methods that depend on predetermined output labels.

Multimodal product embedding

Ingests multimodal data, such as product name, image, and description, and generates a 768-dimensional vector that can be applied to datasets of unlimited size.

Named-entity recognition

Outputs the correct label name for a product, based on the image, description, and other associated text.

Material & finish classification

Uses multimodal understanding to classify materials such as brick, tile, ceramic, and wood.

Lifestage classification

Intelligently groups products according to whether they’re suited for infants, teens, or adults, among other demographic categories.

Room classification

Uses multimodal understanding to classify products according to the room (such as kitchen or living room) they are typically used in.

COMING SOON
Multimodal LLM (MMLLM) fine-tuned for e-commerce

Fast, efficient language generation model tailored for e-commerce that generates text from input text and images.

Use Cases

Enhanced ad targeting

Enhanced recommendations

Enhanced personalization

Product Compatibility

Enhanced product search

Product Similarity

Product tagging

Color trend/preference analysis

Inventory Deduplication

Advantages
Quick onboarding
Minimal setup and maintenance
Easy scalability
Automatic load balancing
Advantages
Quick onboarding
Ability to customize deployment, configuration, and scaling according to client needs
Compatibility with existing infrastructure
Ability to operate in air-gapped environments
Advantages
Balances the benefits of both cloud and on-premises deployments
Added flexibility for customers with varying use cases
Easy transition from one deployment model to another as requirements evolve

Model Features

Recommending related products

Vody's model combines visual and textual inputs to build a holistic understanding of each product. This allows the model to incorporate complex concepts like color perception to accurately classify the colors of objects.

Object detection

Our model incorporates breakthrough advancements in open-domain object identification using text prompts. This dramatically improves upon traditional methods that depend on predetermined output labels. For example, the model can ignore the background color of an image and instead focus on the color of the relevant object, such as a table or chair.

Image augmentation

We created synthetic training data to reflect various lighting conditions. This enhances the model's ability to recognize color changes caused by different environments.

Multimodal understanding

Vody's model combines images, text, structured data, and user click patterns (“clickstream” data) to build a holistic understanding of each product, with our embedding model being the first ever to incorporate structured data and clickstream data. Given that 60% of shoppers are frustrated by irrelevant results and 80% say that accuracy and convenience are the most important aspects of online shopping, this improved understanding of products will substantially improve the quality of users’ search experience and drive both customer acquisition and retention.

Flexibility and customization

Our embedding models enable textual and visual search (both separately and in tandem with each other) and content-based recommendation out of the box. In addition, our embedding models can be fine-tuned on a variety of different e-commerce tasks, including product compositionality, cart filling, and product categorization, to fit whatever your company needs.
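As a rough illustration of how embedding-based search and content-based recommendation can work, the sketch below ranks catalog items by cosine similarity to a query vector. It is a minimal example under stated assumptions, not Vody's implementation: the function names, product IDs, and toy 4-dimensional vectors are all hypothetical stand-ins for real 768-dimensional embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k_similar(query, catalog, k=3):
    """Return the k (product_id, score) pairs most similar to the query."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy vectors; real embeddings would have 768 dimensions.
catalog = {
    "oak-dining-table": [0.9, 0.1, 0.0, 0.2],
    "walnut-dining-table": [0.8, 0.2, 0.1, 0.2],
    "ceramic-mug": [0.0, 0.9, 0.4, 0.1],
}
query = [0.85, 0.15, 0.05, 0.2]  # e.g. the embedding of a search query or a viewed product
results = top_k_similar(query, catalog, k=2)
```

The same ranking serves both use cases: pass the embedding of a text or image query for search, or the embedding of a product the shopper is viewing for recommendations. For large catalogs, an approximate nearest-neighbor index would usually replace the linear scan shown here.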

Domain-specific understanding

By training our model specifically on e-commerce-related data, we are able to dramatically improve our models’ understanding of e-commerce-related items across a variety of different product domains (furniture, pets, clothing, groceries, etc.).

Robust performance

Augmenting training data to reflect various lighting conditions, rotations, translations, etc. makes the model resistant to small perturbations in the input product image, meaning you can tailor our embeddings to your products rather than your products to the embeddings.
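To make the idea of augmentation concrete, here is a small sketch of the kinds of transformations described: a lighting change, a translation, and a rotation. It is an assumption-laden toy, not Vody's training pipeline: images are represented as nested lists of grayscale pixel values, and every function name is hypothetical.

```python
def adjust_brightness(image, factor):
    """Scale every pixel by `factor`, clamping results to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]

def translate_right(image, shift, fill=0):
    """Shift each row `shift` pixels right (shift >= 0), padding with `fill`."""
    width = len(image[0])
    shift = min(shift, width)
    return [[fill] * shift + row[: width - shift] for row in image]

def rotate_90_clockwise(image):
    """Rotate the image a quarter turn clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def augment(image):
    """Produce a few perturbed copies of one training image."""
    return [
        adjust_brightness(image, 0.6),   # dimmer lighting
        adjust_brightness(image, 1.5),   # brighter lighting
        translate_right(image, 1),       # small horizontal shift
        rotate_90_clockwise(image),      # rotation
    ]

image = [
    [100, 200],
    [50, 150],
]
variants = augment(image)
```

Each variant keeps the original label, so the model sees the same product under different conditions and learns to produce similar outputs for all of them.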

Multimodal understanding

Vody's generative models combine images and text as part of a multimodal LLM, offering ~20% better performance than traditional LLMs at ~200x smaller size.

Fast implementation and flexibility

Similar to the emergent capabilities found in traditional LLMs, our e-commerce-specific multimodal LLMs can perform several e-commerce tasks out of the box, including product description generation, product titling, product Q&A via chat, and much more. We greatly improve zero-shot performance on unseen tasks by incorporating instruction tuning on our world-class proprietary e-commerce datasets.

Domain-specific understanding

By training our model specifically on e-commerce-related data, we are able to dramatically improve our models’ understanding of e-commerce-related items across a variety of different product domains (furniture, pets, clothing, groceries, etc.) as well as their ability to generate text more specific to e-commerce applications. For example, our models are far better at writing about products in a way that sells well to interested shoppers.

No prompt engineering required

Unlike others in the LLM space, Vody provides a list of best-practice prompts in order to optimize performance across a variety of different generative tasks. Our research team works tirelessly to discover the best prompts so that you don’t have to.

Experimentation and Evaluation

1. Collect a labeled dataset for the task you want to test the model on. For our color model, this may look like a dataset of images and product titles mapped to the color of the requested item in the images.

2. Use an automatic metric to compare the model’s predictions to the correct answers. Accuracy will work well if your labels are relatively balanced (i.e., if you are checking for 10 patterns, each pattern should comprise close to 10% of the overall dataset). If they aren’t, we recommend using F1 instead.
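The sketch below shows why the choice of metric matters on an imbalanced dataset. It is a minimal illustration, assuming macro-averaged F1 (the text above does not say which F1 variant is meant), and the labels and predictions are made-up toy data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption here)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Imbalanced toy data: 9 red items, 1 blue item.
y_true = ["red"] * 9 + ["blue"]
# A degenerate model that always answers "red".
y_pred = ["red"] * 10

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Here accuracy comes out at 0.9 even though the model never identifies a blue item, while macro F1 falls below 0.5 because the blue class scores zero. That gap is the reason to prefer F1 when labels are unbalanced.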

Experimentation and Evaluation

Automatic. These can be further grouped into similarity-based evaluation metrics (L2 distance, cosine similarity) and top-k evaluation metrics (compare the k most similar items the model returns against a labeled list of the top k items from a dataset and see how many match).

Human Evaluation. Take the approach from top-k evaluation metrics but have humans evaluate it in ways more relevant to your business. For example, if you’re using embeddings for search, you can return the top 10 results for a search and measure the number of clicks or purchases resulting from that list of 10.
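As a sketch of the automatic metrics mentioned above, the following computes an L2 distance between two vectors and a simple top-k overlap score. The function names and the product IDs are hypothetical, and the overlap definition (hits divided by k) is one reasonable reading of "see how many match", not a prescribed formula.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k_overlap(predicted_ids, labeled_ids, k):
    """Fraction of the first k predicted IDs found among the first k labeled IDs."""
    predicted_top = predicted_ids[:k]
    labeled_top = set(labeled_ids[:k])
    hits = sum(pid in labeled_top for pid in predicted_top)
    return hits / k

# Hypothetical example: model ranking vs. a human-labeled ranking for one query.
predicted = ["sofa-12", "sofa-07", "chair-03", "lamp-21"]
labeled = ["sofa-07", "sofa-12", "sofa-30", "sofa-44"]

overlap_at_2 = top_k_overlap(predicted, labeled, k=2)
overlap_at_4 = top_k_overlap(predicted, labeled, k=4)
```

Averaging the overlap over many queries gives a single number to track between model versions. The same harness can feed human evaluation by logging which of the top k results shoppers actually click.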

Experimentation and Evaluation

Manual Evaluation. Have a human provide ratings (e.g., 1-5 stars) for each generation from the model.

Reward Model. Based on a labeled dataset of human ratings for a model’s generations, train a regression model to predict what a human rating for a generation would be. Use this model to label generative outputs at scale.
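The reward-model idea can be sketched with a deliberately tiny regression. Everything below is an illustrative assumption: a real reward model would typically be a learned model over the generated text itself, whereas this toy fits a one-variable least-squares line to a single hypothetical feature (the fraction of product attributes a generated description mentions) and made-up human ratings.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_rating(feature, slope, intercept):
    """Predict a rating and clamp it to the 1-5 star scale."""
    raw = slope * feature + intercept
    return min(5.0, max(1.0, raw))

# Hypothetical labeled data: feature value per generation, and its human rating.
attribute_coverage = [0.2, 0.4, 0.6, 0.8, 1.0]
human_ratings = [1, 2, 3, 4, 5]

slope, intercept = fit_linear(attribute_coverage, human_ratings)

# Score new, unlabeled generations at scale.
new_generations = [0.5, 0.9]
predicted = [predict_rating(f, slope, intercept) for f in new_generations]
```

Once the fitted model agrees well enough with held-out human ratings, it can label large batches of generations without a human in the loop. Periodic spot checks against fresh human ratings help catch drift.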
