
2. 📂 Configuring your data source

2.1 Basics

You can upload datasets from two primary sources:

  1. From cloud storage: Amazon S3, Azure or Google Cloud Storage
  2. From your local computer

You can replicate this full walkthrough by downloading the dataset yourself; it's hosted on Google Drive.

Let's use the following code to download it. 💻

!pip install gdown -q
import os
import gdown
import zipfile

def download_and_unzip_google_drive_file(file_id, output_dir='/content'):
    # Path where the ZIP archive will be temporarily stored
    zip_file_path = os.path.join(output_dir, 'file.zip')

    # Download the archive from Google Drive by its file ID
    # (gdown expects a bare ID via the `id` argument, not a URL)
    gdown.download(id=file_id, output=zip_file_path, quiet=False)

    # Unzip the downloaded archive into the output directory
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        zip_ref.extractall(output_dir)

    # Remove the ZIP file after extraction
    os.remove(zip_file_path)

    print(f'Files extracted to {output_dir}')

download_and_unzip_google_drive_file("1Otn6XYrqw7RdriUWQC-0aKnXHxFuoVK1", output_dir="/content")

2.2 Amazon S3

For brevity, we'll only outline how to set up Amazon S3 for uploading a dataset.

2.2.1 Setting up and configuring your S3 bucket

To connect your data to Tenyks, you'll set up an S3 bucket following this guide. 📁

At a high level, you'll be creating a bucket and a user. 👤
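If you prefer to script these two steps rather than click through the AWS console, here is a minimal boto3 sketch (the bucket and user names are placeholders; note you'll also need to grant the user read access to the bucket):

import boto3

# Placeholder names -- replace with your own
bucket_name = "your-tenyks-data-bucket"
user_name = "tenyks-reader"

# Create the bucket (us-east-1 requires no location constraint)
s3 = boto3.client("s3")
s3.create_bucket(Bucket=bucket_name)

# Create the user and an access key pair for it
iam = boto3.client("iam")
iam.create_user(UserName=user_name)
access_key = iam.create_access_key(UserName=user_name)["AccessKey"]
print(access_key["AccessKeyId"], access_key["SecretAccessKey"])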

The bucket will contain the dataset in the following layout, which Tenyks accesses via your user's credentials (see the upload sketch after this list):

  • {your_tenyks_data_bucket}/{your_dataset_name}/{images_directory_name}/image_n.png
  • {your_tenyks_data_bucket}/{your_dataset_name}/predictions_model_A.json
  • {your_tenyks_data_bucket}/{your_dataset_name}/predictions_model_B.json
  • {your_tenyks_data_bucket}/{your_dataset_name}/predictions_model_C.json
  • {your_tenyks_data_bucket}/{your_dataset_name}/annotations.json
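As a rough sketch of what uploading into this layout might look like (assuming boto3, local files under /content, and placeholder bucket and dataset names):

import boto3
from pathlib import Path

s3 = boto3.client("s3")
bucket = "your-tenyks-data-bucket"  # placeholder
dataset = "your_dataset_name"       # placeholder

# Upload every image into the images directory
for image_path in Path("/content/images").glob("*.png"):
    s3.upload_file(str(image_path), bucket, f"{dataset}/images/{image_path.name}")

# Upload the annotations and per-model prediction files
for json_name in ["annotations.json", "predictions_model_A.json",
                  "predictions_model_B.json", "predictions_model_C.json"]:
    s3.upload_file(f"/content/{json_name}", bucket, f"{dataset}/{json_name}")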

The user is expected to have an aws_access_key_id and an aws_secret_access_key, which Tenyks will use to retrieve the dataset.
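
As a quick sanity check (a minimal sketch; the credential values and names below are placeholders), you can confirm the key pair can read the dataset:

import boto3

# Placeholder credentials -- use the key pair created for your user
session = boto3.Session(
    aws_access_key_id="AKIA...",
    aws_secret_access_key="your-secret-key",
)
s3 = session.client("s3")

# List the dataset objects to confirm the user can read them
response = s3.list_objects_v2(
    Bucket="your-tenyks-data-bucket", Prefix="your_dataset_name/"
)
for obj in response.get("Contents", []):
    print(obj["Key"])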