Our Mission
The switch to remote work, reduced physical activity, and social isolation have created new challenges. With rising stress levels and job losses worldwide, affordable, safe, and effective ways to maintain physical and mental health are needed now more than ever.
Yoga is a great way to achieve this. However, since the beginning of the pandemic, fitness centers and yoga studios around the world have been fully or partially unavailable due to distancing restrictions. Practitioners' ability to receive guidance from professional instructors has been significantly reduced. Navigating the ocean of virtual sessions published by fitness centers and independent instructors is challenging, especially for beginners and people with injuries or other health issues.
Our goal is to make the growing amount of online video content for at-home yoga practice personalized, accessible, and searchable. Our work will empower instructors, their students, and virtual fitness platforms to build personalized, balanced exercise programs with AI-powered video summarization tools. We also strive to provide video recommendations that best match each user's needs.
Key figures: a multi-billion-dollar yoga industry worldwide, millions of online yoga videos, millions of American yoga practitioners, and a large body of videos created by yoga teachers.
The How?
“Stay fit, stay safe” is an innovative deep learning AI solution deployed on the Azure cloud. It elevates virtual yoga practice and makes it more accessible by making video content searchable, analyzing the health benefits of each session, and curating personalized recommendations, all while reducing costs.
To enable this experience, we developed a platform for multi-modal yoga video summarization powered by computer vision AI. It incorporates a user-friendly UI, a content analysis pipeline running behind the scenes, and a recommendation engine that connects the two. We invested significant effort in providing privacy guarantees.
Architecture
We built a platform that enables discovery of yoga videos at scale. We use Azure Cloud infrastructure to host an asynchronous pipeline; this backend runs video summarization. The pipeline is built for high scalability and modularity, and we host our inference models as endpoints. It consists of the multiple stages listed under Model below; a simplified sketch of how they chain together follows.
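To make the data flow concrete, here is a minimal sketch of how the four stages (detailed under Model) could be chained in an asynchronous Python service. All function names and stub responses are hypothetical stand-ins for the Azure-hosted endpoints, not the production code.

```python
import asyncio
from typing import List, Optional

async def detect_humans(frames: List[bytes]) -> List[Optional[dict]]:
    # Stage 1: call the hosted Faster R-CNN endpoint; one bounding box per frame.
    return [{"box": (0, 0, 100, 100)} for _ in frames]  # stub response

async def estimate_joints(detections: List[Optional[dict]]) -> List[list]:
    # Stage 2: call the hosted pose estimation endpoint; skeleton joints per human.
    return [[(0.0, 0.0)] * 17 for _ in detections]  # stub response

async def classify_poses(joints: List[list]) -> List[str]:
    # Stage 3: call the hosted LightGBM endpoint; one pose label per frame.
    return ["mountain_pose" for _ in joints]  # stub response

def temporal_correction(labels: List[str]) -> List[str]:
    # Stage 4: smooth per-frame labels and drop infeasible classifications.
    return labels  # stub

async def summarize(frames: List[bytes]) -> List[str]:
    detections = await detect_humans(frames)
    joints = await estimate_joints(detections)
    labels = await classify_poses(joints)
    return temporal_correction(labels)

if __name__ == "__main__":
    print(asyncio.run(summarize([b"frame_0", b"frame_1"])))
```

Because each stage is hosted as an independent endpoint, stages can be scaled and swapped individually, which is what the modularity above refers to.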
Data
The classifier was trained and tested on over 14k images collected from open-source datasets and augmented with frames of poses contributed by project participants. We invested significant effort in cleaning the data and correcting labels, and we augmented particularly challenging classes with curated data.
Yoga instructors gave us access to over 100 hours of video lessons. We collaborated with domain experts to capture the health benefits and contraindications of various yoga poses.
Model
- Stage 1: Human Detection
- Stage 2: Human Joints Detection
- Stage 3: Yoga Pose Classifier
- Stage 4: Temporal Correction
We detect the human in each frame of the video using a Faster R-CNN object detector.
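As an illustration, person detection in a frame could look like the sketch below, using torchvision's pretrained Faster R-CNN; the "DEFAULT" weights and the 0.8 confidence threshold are illustrative assumptions, not necessarily the production settings.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# Pretrained Faster R-CNN from torchvision (the `weights` argument requires
# torchvision >= 0.13; older releases use `pretrained=True` instead).
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

def detect_person(image):
    """Return the highest-scoring person box (x1, y1, x2, y2), or None."""
    with torch.no_grad():
        out = detector([to_tensor(image)])[0]
    # Detections come back sorted by score; COCO class 1 is "person".
    for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
        if label.item() == 1 and score.item() > 0.8:
            return box.tolist()
    return None
```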
The frames with human detections are then sent to a human pose estimation model. We use a Deep High-Resolution Network (HRNet) model to estimate skeleton joint locations.
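The pipeline uses HRNet for this stage; as an accessible stand-in for a sketch, the snippet below uses torchvision's Keypoint R-CNN, which predicts the same 17 COCO skeleton joints (nose, shoulders, elbows, wrists, hips, knees, ankles, and so on).

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# Keypoint R-CNN stands in here for HRNet; both output 17 COCO joints.
kp_model = torchvision.models.detection.keypointrcnn_resnet50_fpn(weights="DEFAULT")
kp_model.eval()

def extract_joints(image):
    """Return a (17, 3) array of (x, y, visibility) for the top person, or None."""
    with torch.no_grad():
        out = kp_model([to_tensor(image)])[0]
    if len(out["keypoints"]) == 0:
        return None
    return out["keypoints"][0].numpy()  # highest-scoring detection first
```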
The extracted joint locations are further augmented with engineered features that capture the relative positions between joints. Finally, we classify this tabular data into one of 71 yoga poses using a hyper-parameter-optimized gradient-boosted model (LightGBM).
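A sketch of what such engineered features and the classifier could look like; the exact feature set and the tuned hyper-parameters are not reproduced here, so the values below are placeholders.

```python
import numpy as np
import lightgbm as lgb

def joint_features(joints: np.ndarray) -> np.ndarray:
    """joints: (17, 2) array of (x, y) coordinates for one frame.
    Returns normalized coordinates plus pairwise distances and relative
    offsets between joints, flattened into one feature vector."""
    xy = (joints - joints.mean(axis=0)) / (joints.std() + 1e-8)  # scale/shift invariance
    diffs = xy[:, None, :] - xy[None, :, :]      # (17, 17, 2) relative positions
    iu = np.triu_indices(len(xy), k=1)           # each unique joint pair once
    dists = np.linalg.norm(diffs, axis=-1)[iu]   # pairwise distances
    return np.concatenate([xy.ravel(), dists, diffs[iu].ravel()])

# Multiclass gradient-boosted trees; the 71 pose classes are inferred from the
# labels at fit time. The hyper-parameters shown are placeholders, not the
# values found by the actual optimization.
clf = lgb.LGBMClassifier(objective="multiclass", n_estimators=300)
# clf.fit(X_train, y_train)                      # rows built with joint_features
# pose_id = clf.predict(joint_features(joints)[None, :])[0]
```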
We then combine the yoga pose classifications across frames and carry out a temporal correction step, replacing or omitting classifications that are deemed infeasible.
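The exact correction rules are not spelled out above; one simple way to implement this kind of temporal correction is a sliding-window majority vote over the per-frame labels, sketched below under that assumption.

```python
from collections import Counter
from typing import List

def smooth_labels(labels: List[str], window: int = 5) -> List[str]:
    """Majority vote over a sliding window of per-frame pose labels."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        votes = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(votes).most_common(1)[0][0])
    return smoothed

# A one-frame "tree" blip between "warrior_1" frames is infeasible at 1 fps
# and gets voted out:
print(smooth_labels(["warrior_1", "warrior_1", "tree", "warrior_1", "warrior_1"]))
```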
The animation above details the four-stage process: we identify the human in the image (green box), extract the pose (green skeleton), identify the yoga pose, and repeat the process for frames extracted at 1 fps.
Performance
Performance of the yoga pose classification model was evaluated on hold-out labeled test images.
- AutoGluon (MXNet-based) was used for AutoML, with 145 candidate models run
- Test micro-F1 score of ~0.87 on the consolidated data, with a Train/Val/Test split of 8857/2215/2768 images. The features used were normalized joint coordinates, joint distances, and joint relative positions. The model also reached an accuracy of 0.73 on a manually labeled 60-minute session video
- The final model is LightGBM (a gradient-boosted tree ensemble)
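For reference, the micro-F1 metric can be computed with scikit-learn as below (the labels are toy stand-ins, not our data); for single-label multiclass problems, micro-F1 aggregates over all predictions and coincides with overall accuracy.

```python
from sklearn.metrics import f1_score

# Toy stand-in labels; the real evaluation used the 2,768-image hold-out set.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]
print(f1_score(y_true, y_pred, average="micro"))  # 5/6 correct -> ~0.83
```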
F.A.Q
Frequently Asked Questions
- What do I need to get started?
  A computer with a browser and a web camera is all you need!
- Can I upload my videos for analysis?
  Yes! Our online video upload tool will be available shortly. You will be able to review your sessions in the video gallery on your account page.
- Is this available on a mobile phone?
  We are currently adapting our UI to run seamlessly on iOS and Android. Hang tight!
Testimonials
We appreciate the help of the domain experts and stakeholders who supported us with guidance and data while we developed the solution. Here are some of their thoughts about the product.
Our Data Science Team
Meet the team behind this project