3. ⬆️ Uploading and processing your first dataset

This step assumes that you have completed the previous two steps:

1. Signing up for a Tenyks account
2. Configuring your data source

Let's create a dataset! 🗂️

In Tenyks, a dataset is composed of:

🖼️ A) Images and annotations (in COCO format).
📊 B) Model predictions (in COCO format).
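For readers new to COCO, the following is a minimal sketch of what the two kinds of files might contain. The ground-truth structure follows the standard COCO object-detection layout (`images`, `categories`, `annotations`), and the predictions follow the COCO results layout, a flat list of detections that each carry a `score`. The image name, category, and box values are made up for illustration, and whether Tenyks expects any additional fields is not covered here. The helper `referenced_ids_are_valid` is a hypothetical name, not part of any SDK.

```python
import json

# Ground truth: standard COCO detection layout.
annotations = {
    "images": [
        {"id": 1, "file_name": "board_001.jpg", "width": 640, "height": 480},
    ],
    "categories": [
        {"id": 1, "name": "capacitor"},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [100.0, 120.0, 50.0, 40.0],  # [x, y, width, height] in pixels
            "area": 50.0 * 40.0,
            "iscrowd": 0,
        },
    ],
}

# Predictions: COCO results layout, one entry per detection, with a confidence score.
predictions = [
    {"image_id": 1, "category_id": 1, "bbox": [98.0, 118.0, 54.0, 43.0], "score": 0.91},
]


def referenced_ids_are_valid(coco, preds):
    """Return True if every annotation and prediction points at a known image and category."""
    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    entries = coco["annotations"] + preds
    return all(
        e["image_id"] in image_ids and e["category_id"] in category_ids
        for e in entries
    )


# Serialized forms, ready to be written to e.g. annotations.json and a predictions file.
annotations_json = json.dumps(annotations, indent=2)
predictions_json = json.dumps(predictions, indent=2)
```

Running a consistency check like this before uploading can catch the most common mismatch: a predictions file whose `image_id` or `category_id` values don't line up with the annotations file.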

Note: In our case, we'll upload three sets of predictions, each one from a different model. Hence, we'll have three separate prediction files (e.g., paligemma_predictions.json).

Important: Annotations and predictions are optional. Without them, you can still use some of Tenyks' best features (e.g., object search 🔎, embedding viewer 🧠).

We set our AWS credentials based on the user we described in Section 2.1.1.

my_aws_credentials = AWSCredentials(
    aws_access_key_id="YOUR_AWS_ACCESS_KEY_ID",  # Replace with your access key id
    aws_secret_access_key="YOUR_AWS_SECRET_ACCESS_KEY",  # Replace with your secret access key
    region_name="us-east-2",
)
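To keep secrets out of notebooks and version control, one option is to read the credentials from environment variables instead of typing them inline. The sketch below assumes the conventional AWS variable names (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`) are set in your shell; `aws_credentials_kwargs` is a hypothetical helper, and the keyword names it returns simply mirror the `AWSCredentials` call shown above.

```python
import os


def aws_credentials_kwargs(env=os.environ):
    """Collect AWS credential settings from environment variables.

    Raises KeyError naming the first missing variable, so a misconfigured
    shell fails early instead of passing empty credentials along.
    """
    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        # Fall back to the region used in this tutorial if none is set.
        "region_name": env.get("AWS_DEFAULT_REGION", "us-east-2"),
    }


# Usage with the SDK class from this tutorial (not executed here):
# my_aws_credentials = AWSCredentials(**aws_credentials_kwargs())
```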

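Next, we define the locations of the images directory and the annotations file:

images_location = AWSLocation(
    s3_uri="S3_URI_TO_IMAGES_DIRECTORY",  # Replace with the S3 URI of your images directory
    credentials=my_aws_credentials,
)

annotations_location = AWSLocation(
    s3_uri="S3_URI_TO_ANNOTATIONS_FILE",  # Replace with the S3 URI of your annotations file
    credentials=my_aws_credentials,
)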
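The `s3_uri` values are placeholders, and a leftover placeholder or a malformed URI is an easy mistake to make. As a rough sketch, the hypothetical helper below checks that a string has the usual `s3://bucket/key` shape before it is handed to `AWSLocation`. The bucket and paths in the usage comment are invented for illustration.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/key' into (bucket, key); raise ValueError if malformed."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {s3_uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Example with made-up values:
# split_s3_uri("s3://my-bucket/circuit_board/images/")
# split_s3_uri("s3://my-bucket/circuit_board/annotations.json")
```

This only validates the shape of the string; it does not check that the bucket exists or that the credentials can read it.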

Let's create the dataset, upload the annotations, and ingest it:

# 1. Create a new dataset with the "images_location" we defined previously
dataset_from_cloud = tenyks.create_dataset(
    "NEW_DATASET_KEY", images_location=images_location)

# 2. Upload annotations
dataset_from_cloud.upload_annotations_from_cloud(annotations_location)

# 3. Ingest the dataset
dataset_from_cloud.ingest()

Note: If you wish to upload a dataset from your local computer, follow these steps instead.

dataset = tenyks.create_dataset("NEW_DATASET_KEY")
dataset.upload_images("/path/to/images_directory")
dataset.upload_annotations("/path/to/annotations_file")
dataset.ingest()
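Before uploading from a local machine, it can help to confirm that every image listed in the COCO annotations file is actually present in the images directory. The following is a minimal sketch under the assumption that the annotations file uses the standard COCO `images` list with a `file_name` per entry; `missing_image_files` is a hypothetical helper, not an SDK function.

```python
import json
from pathlib import Path


def missing_image_files(images_dir, annotations_file):
    """Return the file names listed in a COCO annotations file that are absent from images_dir."""
    with open(annotations_file, encoding="utf-8") as f:
        coco = json.load(f)
    # Names of the regular files directly inside the images directory.
    present = {p.name for p in Path(images_dir).iterdir() if p.is_file()}
    return sorted(
        img["file_name"]
        for img in coco["images"]
        # Compare by base name in case file_name carries a relative path.
        if Path(img["file_name"]).name not in present
    )


# Usage with the paths from the snippet above (not executed here):
# missing = missing_image_files("/path/to/images_directory", "/path/to/annotations_file")
# if missing:
#     print(f"{len(missing)} annotated images are missing, e.g. {missing[:5]}")
```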

We can check the status of our dataset with get_dataset(). As soon as the status is DONE, we can continue.

dataset = tenyks.get_dataset("circuit_board")  # Use the dataset key you passed to create_dataset()
dataset
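Rather than re-running the cell by hand, the check can be wrapped in a small polling loop. The sketch below is deliberately SDK-agnostic: it takes any zero-argument function that returns the current status string and waits until that string is `"DONE"`. How the status is read from the object returned by `get_dataset()` is an assumption here (the usage comment guesses a `.status` attribute), and since other status values, including failure states, are not described in this section, the loop stops only on `DONE` or on timeout.

```python
import time


def wait_until_done(get_status, timeout_s=1800.0, poll_interval_s=30.0, sleep=time.sleep):
    """Call get_status() repeatedly until it returns "DONE" or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "DONE":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Dataset still in status {status!r} after {timeout_s} s")
        sleep(poll_interval_s)


# Hypothetical usage; `.status` is an assumed attribute name:
# wait_until_done(lambda: tenyks.get_dataset("circuit_board").status)
```

The `sleep` parameter exists only so the loop can be exercised without real waiting; in normal use the default is fine.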