2. 📂 Configuring your data source
2.1 Basics
You can upload datasets from two primary sources:
- From cloud storage: Amazon S3, Azure, or Google Cloud Storage
- From your local computer
You can replicate this full walkthrough by downloading the dataset with the following code. 💻
```python
!pip install gdown -q

import os
import zipfile

import gdown

def download_and_unzip_google_drive_file(drive_file_id, output_dir='/content'):
    # Path where the zip file will be temporarily stored
    zip_file_path = os.path.join(output_dir, 'file.zip')
    # Download the file from Google Drive by its file ID
    gdown.download(id=drive_file_id, output=zip_file_path, quiet=False)
    # Unzip the downloaded archive into output_dir
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        zip_ref.extractall(output_dir)
    # Remove the zip file after extraction
    os.remove(zip_file_path)
    print(f'Files extracted to {output_dir}')

download_and_unzip_google_drive_file("1Otn6XYrqw7RdriUWQC-0aKnXHxFuoVK1", output_dir="/content")
```
2.2 Amazon S3
For brevity, we'll only outline how to set up Amazon S3 to upload a dataset.
- Detailed instructions are also available for setting up Amazon S3, Azure, and Google Cloud Storage.
2.2.1 Setting up and configuring your S3 bucket
To connect your data to Tenyks, you'll set up an S3 bucket following this guide. 📁
At a high level, you'll create a bucket and a user. 👤 If you prefer to script this step, see the sketch below.
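Here is a minimal sketch of that setup using boto3, assuming you run it with admin credentials; the bucket name, user name, and policy name are placeholders, and the exact permissions Tenyks needs may differ from this read-only example.

```python
import json

import boto3

s3 = boto3.client("s3")
iam = boto3.client("iam")

# Create the bucket that will hold the dataset (hypothetical name; outside
# us-east-1 you must also pass a CreateBucketConfiguration)
s3.create_bucket(Bucket="your-tenyks-data-bucket")

# Create a dedicated user whose credentials Tenyks will use (hypothetical name)
iam.create_user(UserName="tenyks-reader")

# Grant that user read access to the bucket with an inline policy
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::your-tenyks-data-bucket",
            "arn:aws:s3:::your-tenyks-data-bucket/*",
        ],
    }],
}
iam.put_user_policy(
    UserName="tenyks-reader",
    PolicyName="tenyks-read-access",
    PolicyDocument=json.dumps(policy),
)

# Generate the access key pair for that user
key = iam.create_access_key(UserName="tenyks-reader")["AccessKey"]
print(key["AccessKeyId"], key["SecretAccessKey"])
```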
The bucket will contain a few folders from which Tenyks (via your user credentials) will have access to the dataset:
```
{your_tenyks_data_bucket}/{your_dataset_name}/{images_directory_name}/image_n.png
{your_tenyks_data_bucket}/{your_dataset_name}/predictions_model_A.json
{your_tenyks_data_bucket}/{your_dataset_name}/predictions_model_B.json
{your_tenyks_data_bucket}/{your_dataset_name}/predictions_model_C.json
{your_tenyks_data_bucket}/{your_dataset_name}/annotations.json
```
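As a rough illustration of this layout, the sketch below uploads files into the expected prefixes with boto3; the bucket name, dataset name, and local file paths are all placeholders.

```python
import boto3

s3 = boto3.client("s3")
bucket = "your-tenyks-data-bucket"  # hypothetical bucket name
dataset = "your_dataset_name"       # hypothetical dataset name

# Images go under an images directory inside the dataset prefix
s3.upload_file("local/images/image_1.png", bucket, f"{dataset}/images/image_1.png")

# Prediction and annotation files sit at the top level of the dataset prefix
s3.upload_file("local/predictions_model_A.json", bucket, f"{dataset}/predictions_model_A.json")
s3.upload_file("local/annotations.json", bucket, f"{dataset}/annotations.json")
```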
The user is expected to have an `aws_access_key_id` and an `aws_secret_access_key`, which will be used to retrieve the dataset.
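As a quick sanity check, you can confirm those credentials actually grant access before handing them over. The sketch below assumes boto3; the key values, bucket name, and dataset prefix are placeholders.

```python
import boto3

# Build a session from the user's key pair (placeholder values)
session = boto3.Session(
    aws_access_key_id="AKIA...",   # the user's aws_access_key_id
    aws_secret_access_key="...",   # the user's aws_secret_access_key
)
s3 = session.client("s3")

# List the dataset prefix to confirm the user can read the bucket
resp = s3.list_objects_v2(Bucket="your-tenyks-data-bucket", Prefix="your_dataset_name/")
for obj in resp.get("Contents", []):
    print(obj["Key"])
```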