Model picture
Our model incorporates breakthrough advancements in open-domain object identification using text prompts. This dramatically improves upon traditional methods that depend on predetermined output labels.
Our models leverage product photos, text prompts, and more. That means they can spot a particular object (like a couch or a shirt) in a crowded photo and provide detailed information from the image.
Classifies 329 colors and roughly 50 different patterns.
Outputs the correct label name for a product, based on the image, description, and other associated text.
Uses multimodal understanding to classify materials such as brick, tile, ceramic, and wood.
Intelligently groups products according to whether they’re suited for infants, teens, or adults, among other demographic categories.
Uses multimodal understanding to classify products according to which room (such as kitchen, living room) they are typically used in.
Fast, efficient language generation model tailored for e-commerce, conditioned on input text and images.
Vody's model combines visual and textual inputs to build a holistic understanding of each product. This allows the model to incorporate complex concepts like color perception to accurately classify the colors of objects.
For example, the model can ignore the background color of an image and instead focus on the color of the relevant object, such as a table or chair.
We created synthetic training data to reflect various lighting conditions. This enhances the model's ability to recognize color changes caused by different environments.
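To illustrate the idea of lighting-based synthetic data, here is a minimal sketch. The function names, tint values, and pixel representation are hypothetical illustrations, not Vody's actual pipeline; it simulates warm or cool ambient light by scaling each RGB channel, which shifts an object's apparent color the way a tungsten bulb or overcast sky would.

```python
def apply_lighting(rgb, tint):
    """Scale an (r, g, b) pixel by per-channel tint factors, clamping to 0-255."""
    return tuple(min(255, max(0, round(c * t))) for c, t in zip(rgb, tint))

# Example tints an augmentation pass might sample from (illustrative values).
LIGHTING_CONDITIONS = {
    "neutral":  (1.00, 1.00, 1.00),
    "tungsten": (1.15, 1.00, 0.80),  # warm indoor light boosts red, cuts blue
    "overcast": (0.90, 0.95, 1.10),  # cool daylight shifts toward blue
}

def augment_image(pixels, tint):
    """Apply one lighting condition to a whole image (a list of RGB tuples)."""
    return [apply_lighting(p, tint) for p in pixels]
```

Training on copies of each image under several such tints teaches the classifier that a beige couch under tungsten light is still beige, not orange.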
Vody's generative models combine images and text in a single multimodal LLM, offering ~20% better performance than traditional LLMs at ~1/200th the size.
Similar to the emergent capabilities found in traditional LLMs, our e-commerce-specific multimodal LLMs can perform several e-commerce tasks out of the box, including product description generation, product titling, product Q&A via chat, and much more. We greatly improve zero-shot performance on unseen tasks by instruction tuning on our world-class proprietary e-commerce datasets.
By training our model specifically on e-commerce-related data, we dramatically improve our models' understanding of e-commerce items across a variety of product domains (furniture, pets, clothing, groceries, etc.), as well as their ability to generate text specific to e-commerce applications. For example, our models are far better at writing about products in a way that sells to interested shoppers.
Unlike others in the LLM space, Vody provides a list of best-practice prompts in order to optimize performance across a variety of different generative tasks. Our research team works tirelessly to discover the best prompts so that you don’t have to.
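As a purely illustrative sketch of what task-specific prompt templates can look like, the snippet below defines two hypothetical templates and a helper that fills them in. These template strings and field names are invented for illustration; they are not Vody's actual best-practice prompts.

```python
# Hypothetical prompt templates; real best-practice prompts would come
# from the provided list, not from this sketch.
PROMPT_TEMPLATES = {
    "product_description": (
        "Write a persuasive product description for the item shown. "
        "Title: {title}. Highlight material, color, and intended use."
    ),
    "product_qa": (
        "Answer the shopper's question using only the product details. "
        "Product: {title}. Question: {question}"
    ),
}

def build_prompt(task, **fields):
    """Fill a task template with product-specific fields."""
    return PROMPT_TEMPLATES[task].format(**fields)
```

Keeping prompts in a shared, versioned registry like this makes it easy to swap in improved wording across all generative tasks at once.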
1. Collect a labeled dataset for the task you want to test the model on. For our color model, this may look like a dataset of images and product titles mapped to the color of the requested item in each image.
2. Use an automatic metric to compare the model's predictions to the correct answers. Accuracy will work well if your labels are relatively balanced (i.e., if you are checking for 10 patterns, each pattern should comprise close to 10% of the overall dataset). If they aren't, we recommend using F1 instead.
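The two metrics above can be sketched in a few lines of plain Python. The example labels are illustrative; in practice `y_true` and `y_pred` would come from your labeled dataset and your model's predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Average per-class F1, which weights rare labels equally with common ones."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for label in labels:
        tp = sum(t == p == label for t, p in zip(y_true, y_pred))
        fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

Because macro F1 averages over classes rather than examples, a model that ignores a rare pattern is penalized even if its overall accuracy looks high.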
Manual Evaluation. Have a human provide ratings (e.g., 1-5 stars) for each generation from the model.
Reward Model. Based on a labeled dataset of human ratings for a model’s generations, train a regression model to predict what a human rating for a generation would be. Use this model to label generative outputs at scale.
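A toy sketch of the reward-model idea: fit a regression model on (feature, human rating) pairs, then use it to score new generations at scale. The single word-count feature and the two rated examples here are hypothetical stand-ins; a real reward model would regress from learned text embeddings over a much larger rating dataset.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b, in closed form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx

def predict_rating(model, text):
    """Score a generation with the fitted model (word count as the feature)."""
    a, b = model
    return a * len(text.split()) + b

# Hypothetical human-rated generations (ratings on a 1-5 scale).
rated = [("short blurb", 2.0), ("a fuller description with more detail", 4.0)]
model = fit_linear([len(t.split()) for t, _ in rated], [r for _, r in rated])
```

Once trained, `predict_rating` can label thousands of generations per second, replacing the slow manual-rating loop for routine monitoring.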