Similarity Search

Similarity search is a technique used to find items that are similar or related to a given item or query.

The goal of similarity search is to (1) search for similar error cases across all images in a dataset, and (2) find similar images and objects of interest.

1. Vector embeddings

Vector embeddings represent data points as high-dimensional vectors. Similarity search with vector embeddings involves finding the closest vectors to a query vector in this high-dimensional space, using a similarity metric such as cosine similarity.
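If you're curious about the mechanics, here is a minimal Python sketch (illustrative only, not Tenyks code) of nearest-neighbor search with cosine similarity over a set of embeddings; the data is randomly generated for the example:

```python
import numpy as np

def top_k_similar(query: np.ndarray, embeddings: np.ndarray, k: int = 200) -> np.ndarray:
    """Return the indices of the k embeddings most similar to the query,
    ranked by cosine similarity (higher = more similar)."""
    # Normalizing both sides makes the dot product equal to cosine similarity.
    query = query / np.linalg.norm(query)
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = embeddings @ query        # cosine similarity of every item vs. the query
    return np.argsort(-scores)[:k]     # indices of the k closest items

# Toy data: 1,000 items with 512-dimensional embeddings.
rng = np.random.default_rng(0)
item_embeddings = rng.normal(size=(1000, 512))
query_embedding = rng.normal(size=512)
print(top_k_similar(query_embedding, item_embeddings, k=5))
```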

If you're curious, this blog post provides a more detailed explanation of embedding search.

πŸ˜ƒ Tenyks provides the embeddings!

After you upload a dataset on the Tenyks platform, we automagically πŸͺ„ generate embeddings for your data, so that you can do similarity search using vector embeddings right away!

2. Similarity search on Tenyks

At its core, similarity search is a way to discover relevant information based on likeness rather than exact matches.

On the Tenyks platform you can use three kinds of search modalities:

| Modality | Definition |
| --- | --- |
| Text Search | Finding relevant textual information, metadata, annotations, or descriptions associated with images/objects. The query is run over all the image embeddings in your dataset. |
| Image Search | Identifying entire images that are visually similar to a query image. The search is also executed over all the image embeddings. |
| Object Search | Finding specific object instances that look like a given query object (i.e., a bounding box). The query is run over all annotation and prediction embeddings. |

If you're wondering when to use each search modality, here's a tip:

| Modality | Use Case |
| --- | --- |
| Text Search | Understand dataset contents/labels, provide context for a query |
| Image Search | Group similar scenes, compositions, or overall image contents |
| Object Search | Analyze specific object categories or fine-grained visual similarity between object instances |

Each modality allows you to retrieve the 200 most relevant results per query.

3. Text Search

To run a text search, go to the Data Explorer and type a query in natural language (i.e., regular spoken language) in the search bar (Figure 1).

In text search, Tenyks runs the query over all the image embeddings and returns the top 200 results.
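Under the hood, this kind of text-to-image search relies on a model that embeds text and images into a shared vector space. The sketch below illustrates the idea with the open-source clip-ViT-B-32 model from the sentence-transformers library and hypothetical image paths; it shows the general technique, not the model Tenyks uses internally:

```python
import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

# A CLIP-style model embeds text and images into the same vector space,
# so a natural-language query can be scored directly against image embeddings.
model = SentenceTransformer("clip-ViT-B-32")

# Hypothetical image paths standing in for a dataset.
images = [Image.open(p) for p in ["img_001.jpg", "img_002.jpg", "img_003.jpg"]]
image_embeddings = model.encode(images)              # shape: (num_images, dim)
query_embedding = model.encode(["pickup truck"])[0]  # shape: (dim,)

# Rank images by cosine similarity to the text query.
scores = (image_embeddings @ query_embedding) / (
    np.linalg.norm(image_embeddings, axis=1) * np.linalg.norm(query_embedding)
)
print(np.argsort(-scores))  # most similar images first
```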

Figure 1. Text search for the query "pickup truck"

For instance, the video above shows a dataset that contains a vehicle class that groups cars, trucks, etc. However, there is no specific label for pickup trucks πŸ›». As shown in Figure 1, our text query for "pickup truck" returns images containing pickup trucks! πŸ”₯

πŸ€” What about creating a tag name for trucks with different tag values such as pickup, delivery, mixer? See the Tagging section for more information.

4. Image Search

To search for similar images (see Figure 2):

  1. Go to the Data Explorer
  2. Click on the "Select" button (you can select up to 10 images)
  3. Click on "Find Similar"
Figure 2. Selecting images with trucks as the query, returns similar images with trucks
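Conceptually, a multi-image query works by combining the embeddings of the selected images and ranking the rest of the dataset against the combined query. The sketch below averages the selected embeddings; the averaging step is an assumption made for illustration, not necessarily how Tenyks aggregates multiple selections:

```python
import numpy as np

def find_similar_images(selected_idx, image_embeddings, k=200):
    """Rank dataset images against a multi-image query.
    Assumption for illustration: the selected images are combined by
    averaging their embeddings."""
    query = image_embeddings[selected_idx].mean(axis=0)
    query = query / np.linalg.norm(query)
    normed = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    ranked = np.argsort(-(normed @ query))
    selected = set(selected_idx)
    # Drop the query images themselves from the results.
    return [i for i in ranked if i not in selected][:k]

# Toy data: 500 images with 512-dimensional embeddings; 3 images selected as the query.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 512))
print(find_similar_images([3, 7, 42], embeddings, k=5))
```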

5. Object Search

To do similarity search at the object level (see Figure 3):

  1. Go to the Data Explorer
  2. Click on the "Select" button (you can select up to 10 objects)
  3. Click on "Find Similar"
Figure 3. Selecting objects of the class "Buses", in particular, School Buses, returns similar objects
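At the object level, each annotation or prediction can be embedded from its bounding-box crop and ranked in the same way as whole images. The sketch below uses hypothetical file paths, box coordinates, and an open-source CLIP-style model for illustration; on Tenyks, annotation and prediction embeddings are computed automatically when you upload a dataset:

```python
import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")

def embed_object(image_path, box):
    """Embed one object by cropping its bounding box (left, upper, right, lower).
    Illustrative only: Tenyks computes annotation/prediction embeddings internally."""
    crop = Image.open(image_path).crop(box)
    return model.encode([crop])[0]

# Hypothetical query: a school-bus bounding box on one image.
query = embed_object("bus_scene.jpg", (120, 80, 420, 300))

# Hypothetical object embeddings for two annotated objects in the dataset.
object_embeddings = np.stack([
    embed_object("scene_a.jpg", (10, 20, 110, 140)),
    embed_object("scene_b.jpg", (200, 50, 360, 210)),
])

# Rank objects by cosine similarity to the query object.
scores = (object_embeddings @ query) / (
    np.linalg.norm(object_embeddings, axis=1) * np.linalg.norm(query)
)
print(np.argsort(-scores)[:200])  # top 200 most similar objects
```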


What’s Next

Learn how tagging allows you to group similar items to speed up dataset analysis.