Similarity Search
Similarity search is a technique used to find items that are similar or related to a given item or query.
The goal of similarity search is to (1) search for similar error cases across all images in a dataset, and (2) find similar images and objects of interest.
1. Vector embeddings
Vector embeddings represent data points as high-dimensional vectors. Similarity search with vector embeddings involves finding the vectors closest to a query vector in this high-dimensional space, using a similarity measure such as cosine similarity.
If you're curious, this blog post provides a more detailed explanation of embedding search.
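To make the idea concrete, here is a minimal sketch of nearest-neighbour lookup with cosine similarity. It is only an illustration, not how the Tenyks platform is implemented: the function names, the item IDs and the tiny 3-dimensional vectors are all made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query, embeddings):
    """Return the ID of the embedding most similar to the query."""
    return max(embeddings, key=lambda k: cosine_similarity(query, embeddings[k]))


# Hypothetical toy embeddings; real ones have hundreds of dimensions.
embeddings = {
    "img_001": [0.9, 0.1, 0.0],
    "img_002": [0.0, 1.0, 0.2],
    "img_003": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]
best = nearest(query, embeddings)
```

Cosine similarity compares the direction of two vectors and ignores their length, which is why it is a common choice for embedding search.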
Tenyks provides the embeddings!
After you upload a dataset to the Tenyks platform, we automagically 🪄 generate embeddings for your data, so that you can do similarity search using vector embeddings right away!
2. Similarity search on Tenyks
At its core, similarity search is a way to discover relevant information based on likeness rather than exact matches.
On the Tenyks platform you can use three kinds of search modalities:
Modality | Definition |
---|---|
Text Search | Finding relevant textual information, metadata, annotations or descriptions associated with images/objects. The query is run over all the image embeddings in your dataset. |
Image Search | Identifying entire images that are visually similar to a query image. The search is also executed over all the image embeddings. |
Object Search | Finding specific object instances that look similar to a given query object (i.e., a bounding box). The query is run over all annotation and prediction embeddings. |
If you're wondering when to use each search modality, here's a tip:
Modality | Use Case |
---|---|
Text Search | Understand dataset contents/labels, provide context for a query |
Image Search | Group similar scenes, compositions or overall image contents |
Object Search | Analyze specific object categories or fine-grained visual similarity between object instances |
Each modality allows you to retrieve the 200 most relevant results per query.
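Conceptually, a capped result list like this is a top-k retrieval: score every candidate against the query and keep only the k best. The sketch below shows one way to do that in plain Python. It is an assumption-laden illustration, not the platform's implementation: `top_k`, `MAX_RESULTS` and the synthetic 2-dimensional embeddings are hypothetical names and data.

```python
import heapq
import math

MAX_RESULTS = 200  # per-query cap mentioned above


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, embeddings, k=MAX_RESULTS):
    """Return up to k item IDs, ordered from most to least similar."""
    scored = (
        (cosine_similarity(query, vec), item_id)
        for item_id, vec in embeddings.items()
    )
    # nlargest keeps only the k best scores instead of sorting everything.
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


# Hypothetical dataset of 300 items; similarity to the query falls as i grows.
embeddings = {f"img_{i:03d}": [1.0, i / 100.0] for i in range(300)}
results = top_k([1.0, 0.0], embeddings)
```

With 300 candidates, `results` holds only the 200 closest items, so anything beyond the cap is never returned.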
3. Text Search
To do text search, type a natural-language query (i.e., regular spoken language) in the search bar of the Data Explorer (Figure 1).
In text search, Tenyks runs the query over all the image embeddings and returns the top 200 results.
For instance, the previous video shows a dataset that contains a class "vehicle" to group cars, trucks, etc. However, there is no specific label for pickup trucks 🛻. As shown in Figure 1, our text query for "pickup truck" returns images containing pickup trucks! 🔥
🤔 What about creating a tag name for "trucks" with different tag values such as "pickup", "delivery", "mixer"? See the Tagging section for more information.
4. Image Search
To search for similar images (see Figure 2):
- Go to the Data Explorer
- Click on the "Select" button (you can select up to 10 images)
- Click on "Find Similar"
5. Object Search
To do similarity search at the object level (see Figure 3):
- Go to the Data Explorer
- Click on the "Select" button (you can select up to 10 objects)
- Click on "Find Similar"
Learn how tagging allows you to group similar items to speed up dataset analysis.