
3. ⬆️ Uploading and processing your first video

This step assumes that you completed the previous two steps:

  1. Signing up for a Tenyks account
  2. Configuring your video source

Let's verify that we are in the correct workspace:

workspace_key = "gabriel_data_workspace_976264f4"
tenyks.set_workspace(workspace_key)

""" Output
2024-09-26 17:08:27,748 - Tenyks - INFO - Workspace set to 'gabriel_data_workspace_976264f4'.
INFO:Tenyks:Workspace set to 'gabriel_data_workspace_976264f4'.
"""

We create a dataset:

  • In Tenyks, a dataset is composed of images and annotations.
  • For videos, we don't require annotations: the system will extract images (i.e., frames) from the video itself during the upload phase.

dataset = tenyks.create_dataset(
    "paris_train_station"
)

We set our AWS credentials based on the user we described in Section 2.1.1.

my_credentials = AWSCredentials(  
    aws_access_key_id="7788767LGJ6EXW6T63",  
    aws_secret_access_key="jhghjhjjh31l3fHBXeU3Spy5NygCPPe6OkCjPx",  
    region_name="us-east-2",  
)
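
Hardcoding secrets in a notebook is risky. As a minimal alternative sketch, you can build the same AWSCredentials object from environment variables (the variable names below are a common AWS convention, not a Tenyks SDK requirement):

import os

# Sketch: read the keys from environment variables instead of hardcoding them.
# The environment variable names are an assumption, not something the SDK requires.
my_credentials = AWSCredentials(
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    region_name=os.environ.get("AWS_DEFAULT_REGION", "us-east-2"),
)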

In s3_uri, make sure you only specify the path up to the folder that contains the video.

  • For instance, for a video at s3://mytenyksbucket/datasets/surveillance.mp4, we only require: s3://mytenyksbucket/datasets/

aws_video_location = AWSLocation(  
    type="aws_s3",  
    s3_uri="s3://mytenyksbucket/paris_train_station/",  
    credentials=my_credentials,  
)
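
Before ingesting, you can optionally confirm that the prefix actually contains your video. Here is a minimal sketch using boto3 (not part of the Tenyks SDK); it assumes the AWSCredentials fields are accessible as attributes and reuses the bucket and prefix from the s3_uri above:

import boto3

# List the objects under the prefix to confirm the video file is there.
# Assumes the AWSCredentials object exposes its fields as attributes.
s3 = boto3.client(
    "s3",
    aws_access_key_id=my_credentials.aws_access_key_id,
    aws_secret_access_key=my_credentials.aws_secret_access_key,
    region_name=my_credentials.region_name,
)
response = s3.list_objects_v2(Bucket="mytenyksbucket", Prefix="paris_train_station/")
for obj in response.get("Contents", []):
    print(obj["Key"])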

Now, let's ingest our video:

  • subsampling frequency: How many frames per second (fps) are sampled from the video.
  • frames to subsample: The total number of frames to keep after sampling.
  • ['person', 'train']: The categories (i.e., prompts) that you want the object detection algorithm to look for within the video frames.
  • confidence threshold: Numerical value that sets the minimum confidence level for considering a detected object as valid. A confidence threshold of 0.005 means that if the object detection algorithm predicts an object (e.g., a person) with a confidence score lower than 0.005, that detection is ignored.

Note that the product of subsampling frequency and frames to subsample must be equal to or less than the total length of your video in seconds.
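
As a quick sanity check, you can verify this constraint before calling the ingestion method. The sketch below uses the same values we pass in the next cell; video_duration_seconds is a hypothetical placeholder you would replace with the actual length of your video.

subsampling_frequency = 1      # same value passed to the ingestion call below
frames_to_subsample = 100      # same value passed to the ingestion call below
video_duration_seconds = 120   # hypothetical: replace with your video's length in seconds

# The product must not exceed the length of the video in seconds.
assert subsampling_frequency * frames_to_subsample <= video_duration_seconds, (
    "Lower the subsampling frequency or the number of frames to subsample."
)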

dataset.upload_videos_from_cloud_and_ingest(
    aws_video_location,   # aws credentials and s3_uri
    1,                    # subsampling frequency
    100,                  # frames to subsample
    ['person', 'train'],  # prompts for object detection
    0.005                 # object detection confidence threshold
)

We can verify the status of our dataset with get_dataset():

  • As soon as the status is DONE, we can continue.

dataset = tenyks.get_dataset("paris_train_station")  
dataset

""" Output
Dataset(client=<tenyks_sdk.sdk.client.Client object at 0x79a6e7ec4490>, workspace_name='gabriel_data_workspace_976264f4', key='paris_train_station', name='paris_train_station', owner='789f7e4c-2a89-45ef-86f2-abf7633332b1', owner_email='[email protected]', created_at=datetime.datetime(2024, 9, 26, 17, 21, 44), images_location=AWSLocation(type='aws_s3', s3_uri='s3://tenyks-prod-storage/gabriel_data_workspace_976264f4/paris_train_station/images/', credentials=None), metadata_location=AWSLocation(type='aws_s3', s3_uri='s3://tenyks-prod-storage/gabriel_data_workspace_976264f4/paris_train_station/metadata/', credentials=None), categories=[Category(name='suggested_annotation', color='#1F77B4', id=0)], models=[], status='DONE', n_images=100, iou_threshold=0.5)
"""