3. ⬆️ Uploading and processing your first dataset

This step assumes that you completed the previous two steps:

  1. Signing up for a Tenyks account
  2. Configuring your data source

Let's create a dataset! 🗂️

In Tenyks, a dataset is composed of:

  • 🖼️ A) Images and annotations (in COCO format).
  • 📊 B) Model predictions (in COCO format).
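To make "COCO format" concrete, here is a minimal sketch of an annotations file for one image with one bounding box, plus a small consistency check. The file name, ids, and category are invented for illustration, and the check is our own helper, not part of the Tenyks SDK. It only covers the standard COCO detection fields, so Tenyks may require or ignore additional ones.

```python
import json

# Minimal COCO-style annotations for one image with one bounding box.
# File name, ids, and category are made up for illustration.
coco = {
    "images": [
        {"id": 1, "file_name": "board_001.jpg", "width": 640, "height": 480}
    ],
    "categories": [{"id": 1, "name": "capacitor"}],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [100.0, 120.0, 50.0, 40.0],  # [x, y, width, height] in pixels
            "area": 2000.0,
            "iscrowd": 0,
        }
    ],
}


def check_coco(data):
    """Return a list of problems found in a COCO-style dict (empty if none)."""
    problems = []
    image_ids = {img["id"] for img in data.get("images", [])}
    category_ids = {cat["id"] for cat in data.get("categories", [])}
    for ann in data.get("annotations", []):
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id")
        if len(ann["bbox"]) != 4:
            problems.append(f"annotation {ann['id']}: bbox needs 4 values")
    return problems


problems = check_coco(coco)
# JSON text that could be saved as an annotations file
annotations_json = json.dumps(coco, indent=2)
```

Running a check like this before uploading catches dangling image or category ids early, which is usually easier than debugging a failed ingestion.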

Note: In our case, we'll upload three sets of predictions, each from a different model, so we'll have three prediction files (e.g., paligemma_predictions.json).

Important: You don't actually need annotations/predictions for your images; you can still use some of Tenyks' best features (e.g., object search 🔎, embedding viewer 🧠).

We set our AWS credentials based on the user we described in Section 2.1.1.

my_aws_credentials = AWSCredentials(
    aws_access_key_id="YOUR_AWS_ACCESS_KEY_ID",  # Replace with your access key id
    aws_secret_access_key="YOUR_AWS_SECRET_ACCESS_KEY",  # Replace with your secret access key
    region_name="us-east-2",
)
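Hard-coding keys in a notebook or script makes them easy to leak. As a hedged alternative, the sketch below reads the same three fields from the standard AWS environment variables. The helper name `aws_credentials_from_env` is ours, not part of the Tenyks SDK, and the `us-east-2` fallback is only an assumption taken from the example above.

```python
import os


def aws_credentials_from_env(env=None):
    """Collect AWS credential fields from environment variables."""
    env = os.environ if env is None else env
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        # Falling back to us-east-2 is an assumption; use your bucket's region.
        "region_name": env.get("AWS_DEFAULT_REGION", "us-east-2"),
    }
```

Since the returned keys match the keyword arguments used above, the result could then be passed along as `AWSCredentials(**aws_credentials_from_env())`.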

We define the locations of the images and the annotations:

images_location = AWSLocation(  
    s3_uri="S3_URI_TO_IMAGES_DIRECTORY",  
    credentials=my_aws_credentials,  
)

annotations_location = AWSLocation(  
    s3_uri="S3_URI_TO_ANNOTATIONS_FILE",  
    credentials=my_aws_credentials,  
)
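A forgotten placeholder or a mistyped URI is an easy mistake at this step. The sketch below is a small sanity check, assuming the `s3_uri` values follow the usual `s3://bucket/key` form. `split_s3_uri` is a hypothetical helper of ours, and the bucket and paths in the usage lines are invented.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/key' into (bucket, key); raise ValueError if malformed."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {s3_uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Example bucket and paths (made up for illustration)
bucket, key = split_s3_uri("s3://my-bucket/datasets/circuit_board/images/")
```

Calling it on a leftover placeholder such as "S3_URI_TO_IMAGES_DIRECTORY" raises a ValueError, so the mistake surfaces before any upload starts.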

Let's create and ingest the dataset:

# 1. Create a new dataset with the "images_location" we defined previously
dataset_from_cloud = tenyks.create_dataset(
    "NEW_DATASET_KEY", images_location=images_location)

# 2. Upload annotations
dataset_from_cloud.upload_annotations_from_cloud(annotations_location)

# 3. Ingest the dataset
dataset_from_cloud.ingest()

Note: If you wish to upload a dataset from your local computer, follow these steps instead:

dataset = tenyks.create_dataset("NEW_DATASET_KEY")
dataset.upload_images("/path/to/images_directory")
dataset.upload_annotations("/path/to/annotations_file")
dataset.ingest()
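Before uploading from a local machine, it can help to confirm that every image listed in the annotations file actually exists in the images directory. The sketch below is one way to do that. `missing_images` is our own helper, and matching on the base file name is an assumption, since the way Tenyks pairs annotations with uploaded images may differ.

```python
import json
from pathlib import Path


def missing_images(annotations_file, images_dir):
    """Return COCO file_name entries that have no matching file in images_dir."""
    with open(annotations_file, encoding="utf-8") as f:
        coco = json.load(f)
    # Names of the regular files directly inside the images directory
    present = {p.name for p in Path(images_dir).iterdir() if p.is_file()}
    return sorted(
        img["file_name"]
        for img in coco.get("images", [])
        if Path(img["file_name"]).name not in present
    )
```

An empty list from `missing_images("/path/to/annotations_file", "/path/to/images_directory")` means every annotated image was found locally.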

We can verify the status of our dataset with get_dataset():

  • As soon as the status is DONE, we can continue.

dataset = tenyks.get_dataset("circuit_board")  # Replace "circuit_board" with your dataset key
dataset
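Rather than re-running the cell by hand, the status check can be wrapped in a small polling loop. The sketch below is generic: it takes any callable that returns the current status and waits until that value equals "DONE". How the status is read from the object returned by get_dataset() is not shown above, so the `.status` attribute in the usage note is an assumption, as is the status comparing equal to the string "DONE".

```python
import time


def wait_until_done(get_status, poll_seconds=10.0, timeout_seconds=1800.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns 'DONE' or the timeout elapses."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status == "DONE":
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"Dataset still in status {status!r} after {timeout_seconds} s"
            )
        sleep(poll_seconds)
```

A call might look like `wait_until_done(lambda: tenyks.get_dataset("circuit_board").status)`, adjusted to however your SDK version exposes the status.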