2. 📂 Configuring your data source
2.1 Basics
You can upload datasets from two primary sources:
- From cloud storage: Amazon S3, Azure or Google Cloud Storage
- From your local computer
You can replicate this full walkthrough by downloading the dataset yourself. Let's use the following code to download it from Google Drive. 💻
```python
!pip install gdown -q

import os
import gdown
import zipfile

def download_and_unzip_google_drive_file(file_id, output_dir='/content'):
    # Define the path where the ZIP file will be temporarily stored
    zip_file_path = os.path.join(output_dir, 'file.zip')
    # Download the file from Google Drive by its file ID
    gdown.download(id=file_id, output=zip_file_path, quiet=False)
    # Unzip the downloaded file
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        zip_ref.extractall(output_dir)
    # Remove the ZIP file after extraction
    os.remove(zip_file_path)
    print(f'Files extracted to {output_dir}')

download_and_unzip_google_drive_file("1Otn6XYrqw7RdriUWQC-0aKnXHxFuoVK1", output_dir="/content")
```
2.2 Amazon S3
For brevity, we'll only outline how to set up Amazon S3 for uploading a dataset.
- Detailed instructions are also available for setting up Amazon S3, Azure, and Google Cloud Storage.
2.2.1 Setting up and configuring your S3 bucket
To connect your data to Tenyks, you'll set up an S3 bucket following this guide. 📁
At a high level, you'll be creating a bucket and a user. 👤
The bucket will contain a few folders from which Tenyks (via your user credentials) will have access to the dataset:
```
{your_tenyks_data_bucket}/{your_dataset_name}/{images_directory_name}/image_n.png
{your_tenyks_data_bucket}/{your_dataset_name}/predictions_model_A.json
{your_tenyks_data_bucket}/{your_dataset_name}/predictions_model_B.json
{your_tenyks_data_bucket}/{your_dataset_name}/predictions_model_C.json
{your_tenyks_data_bucket}/{your_dataset_name}/annotations.json
```
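To give a concrete picture of that layout, here is a minimal sketch of uploading a dataset into it with boto3. The bucket name, dataset name, and local paths are placeholder assumptions (not values prescribed by Tenyks), and the sketch assumes your AWS credentials are already configured locally:

```python
import boto3
from pathlib import Path

# Placeholder names -- replace with your own bucket and dataset names
BUCKET = "my-tenyks-data"
DATASET = "my_dataset"

s3 = boto3.client("s3")

# Upload every image into {bucket}/{dataset}/images/
for image_path in Path("/content/images").glob("*.png"):
    s3.upload_file(str(image_path), BUCKET, f"{DATASET}/images/{image_path.name}")

# Upload the annotations and per-model prediction files next to the images folder
for json_name in ["annotations.json", "predictions_model_A.json"]:
    s3.upload_file(f"/content/{json_name}", BUCKET, f"{DATASET}/{json_name}")
```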
The user is expected to have an aws_access_key_id and an aws_secret_access_key, which will be used to retrieve the dataset.
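As a quick sanity check, you could verify that those credentials can actually read the bucket with a short boto3 listing. This is a sketch using the same placeholder names as above; the key values are assumptions you'd replace with the ones generated for your Tenyks user:

```python
import boto3

# Placeholder credentials -- use the access keys generated for your Tenyks user
s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_AWS_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_AWS_SECRET_ACCESS_KEY",
)

# List the dataset's objects to confirm the user can read them
response = s3.list_objects_v2(Bucket="my-tenyks-data", Prefix="my_dataset/")
for obj in response.get("Contents", []):
    print(obj["Key"])
```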