Aakash Kumar: UCF Research and Contributions to Computer Vision

Aakash Kumar is a fifth-year Ph.D. student at the Center for Research in Computer Vision (CRCV) at the University of Central Florida (UCF), working under the supervision of Professor Yogesh Singh Rawat. His research interests lie broadly in the fields of deep learning and computer vision. Notably, he received the Doctoral Research Support Award from UCF in April 2025. Kumar is currently seeking full-time positions starting in January 2026. His work spans various areas, including video action detection, self-supervised learning, and 3D object tracking.

Key Research Areas

Kumar's research encompasses several critical areas within computer vision, addressing challenges in video understanding, domain adaptation, and multi-object tracking. His work often focuses on leveraging semi-supervised and self-supervised learning techniques to improve the performance and efficiency of computer vision models.

Video Action Detection

Video action detection is a core focus of Kumar's research. He has explored various approaches to improve the accuracy and efficiency of detecting actions in videos, including semi-supervised learning and the analysis of limitations and challenges in this field.

  • End-to-end Semi-Supervised Learning: Kumar developed the first end-to-end semi-supervised approach for video action detection. The model learns from both labeled and unlabeled data, reducing the need for large labeled datasets: mistakes made on the labeled set guide the refinement of pseudo labels generated for the unlabeled set, enhancing spatio-temporal localization.
  • High-Pass Filtering for Enhanced Pseudo Labels: To further refine the spatio-temporal localization of actions, Kumar utilized high-pass filtering techniques to enhance pseudo labels. This method helps to focus on the most informative parts of the video, improving the model's ability to accurately identify and localize actions.
  • Analysis of Limitations and Challenges: Kumar has also contributed to the field by analyzing the limitations and challenges in video action detection. This work helps to identify areas where further research is needed and provides insights into the difficulties of this task.
  • Semi-Supervised Active Learning: Kumar also combined semi-supervised training with active learning for video action detection, directing annotation effort toward the most informative video samples.
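The pseudo-label refinement idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's method: the Laplacian kernel, the blend weight `alpha`, and the confidence threshold are illustrative stand-ins, and edges wrap around for brevity.

```python
import numpy as np

def high_pass_refine(mask, alpha=0.5):
    """Sharpen a soft spatio-temporal pseudo-label mask with a simple
    4-neighbour Laplacian high-pass filter (illustrative kernel only;
    np.roll wraps at the edges, which is fine for a sketch)."""
    lap = (-4 * mask
           + np.roll(mask, 1, axis=0) + np.roll(mask, -1, axis=0)
           + np.roll(mask, 1, axis=1) + np.roll(mask, -1, axis=1))
    # Boost high-frequency structure (action boundaries), keep values in [0, 1]
    return np.clip(mask + alpha * lap, 0.0, 1.0)

def pseudo_labels(teacher_scores, threshold=0.7):
    """Binarize refined teacher scores into hard pseudo labels
    for the student to train on."""
    refined = high_pass_refine(teacher_scores)
    return (refined >= threshold).astype(np.float32)
```

In a full pipeline, a teacher model would produce `teacher_scores` per frame of an unlabeled clip, and the resulting pseudo labels would supervise the student's localization loss alongside the labeled data.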

Publications:

  • End-to-end semi-supervised learning for video action detection (CVPR 2022)
  • Video action detection: Analysing limitations and challenges (CVPRW 2022: Vision Dataset Understanding Workshop)
  • Semi-supervised active learning for video action detection (AAAI 2024)
  • Stable Mean Teacher for Semi-supervised Video Action Detection (AAAI 2025)

Self-Supervised Learning

Self-supervised learning is another significant area of Kumar's research. He has investigated the impact of pre-training in self-supervised learning for videos and developed methods for self-supervised video representation learning.

  • Impact of Pre-training: Kumar conducted the first exhaustive study on the impact of pre-training in self-supervised learning for videos. This research provides valuable insights into how pre-training can improve the performance of video models and helps to guide the development of new self-supervised learning techniques.
  • Self-Supervised Video Representation Learning: Kumar has also worked on benchmarking self-supervised video representation learning. This work involves evaluating different self-supervised learning methods on video data and identifying the most effective techniques for learning useful representations.
  • Vision-Language Models (VLMs): Kumar developed the first VLM-based approach for dense multimodal video detection that requires no labels.
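Many of the self-supervised methods benchmarked in this line of work optimize a contrastive objective over pairs of augmented views of the same clip. A minimal NumPy sketch of the widely used InfoNCE loss, illustrative rather than any specific paper's formulation:

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """InfoNCE over a batch of paired clip embeddings: row i of z1 and
    row i of z2 are two augmented views of the same clip, so the
    positives lie on the diagonal of the similarity matrix."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                       # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # cross-entropy toward the diagonal
```

When the paired embeddings agree (each clip closest to its own second view), the loss is near zero; mismatched pairs drive it up, which is what pushes the encoder toward useful representations.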

Publications:

  • Benchmarking self-supervised video representation learning (NeurIPSW 2023 - 4th Self-Supervised Learning Workshop)
  • Self Supervised Learning for Multiple Object Tracking in 3D Point Clouds (2022 IEEE/RSJ International Conference on Intelligent Robots and Systems)

3D Multi-Object Tracking

Kumar's work extends to 3D multi-object tracking, where he has developed methods to improve the accuracy and efficiency of tracking objects in 3D point clouds.

  • Point Cloud based Deep Affinity Network (PC-DAN): Kumar developed PC-DAN, a deep affinity network that uses point cloud data to track multiple objects in 3D. This method improves the accuracy of tracking by leveraging the spatial information provided by point clouds.
  • Sparse Points to Dense Clouds: Kumar has worked on enhancing 3D detection with limited LiDAR data, using sparse points to generate dense clouds. This approach helps to improve the performance of 3D detection models in scenarios where LiDAR data is scarce.
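The association step at the heart of 3D multi-object tracking can be sketched as follows. PC-DAN learns the affinity between detections from point-cloud features; this toy version substitutes Euclidean distance between 3D box centroids for the learned affinity and matches greedily, avoiding any external dependency:

```python
import numpy as np

def associate(prev_centers, curr_centers, max_dist=2.0):
    """Greedily match 3D detections across consecutive frames.
    Each center is an (x, y, z) centroid; pairs farther apart than
    max_dist are left unmatched (new tracks / lost tracks)."""
    # Pairwise distance matrix: rows = previous frame, cols = current frame
    cost = np.linalg.norm(prev_centers[:, None, :] - curr_centers[None, :, :], axis=-1)
    pairs = sorted(((i, j) for i in range(cost.shape[0])
                           for j in range(cost.shape[1])),
                   key=lambda ij: cost[ij])
    matches, used_prev, used_curr = [], set(), set()
    for i, j in pairs:
        if i in used_prev or j in used_curr or cost[i, j] > max_dist:
            continue
        matches.append((i, j))
        used_prev.add(i)
        used_curr.add(j)
    return matches
```

A production tracker would replace the distance matrix with learned affinities and greedy matching with an optimal assignment (e.g. the Hungarian algorithm), but the data flow is the same.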

Publications:

  • PC-DAN: Point Cloud based Deep Affinity Network for 3D Multi-Object Tracking
  • Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data

Weakly Supervised Spatio-Temporal Video Grounding

Kumar's research also explores weakly supervised spatio-temporal video grounding, focusing on methods that require less manual annotation.

  • Contextual Self-paced Learning: He developed a contextual self-paced learning approach for weakly supervised spatio-temporal video grounding. This method leverages the context of the video to guide the learning process, improving the accuracy of grounding with limited supervision.
  • Spatial and Temporal Progressive Learning (STPro): Kumar introduced STPro, a method that uses spatial and temporal progressive learning for weakly supervised spatio-temporal grounding. This approach progressively learns from different parts of the video, improving the model's ability to accurately ground objects in space and time.
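The self-paced idea, in its classic hard-weighting form, can be sketched in a few lines; the pace parameter and its schedule below are illustrative, not the values used in these papers:

```python
import numpy as np

def self_paced_weights(losses, lam):
    """Hard self-paced weighting: admit a sample into training only when
    its current loss is below the pace parameter lam. Raising lam over
    the course of training lets progressively harder samples in."""
    return (np.asarray(losses) < lam).astype(np.float32)
```

A typical schedule starts with a small `lam` (only easy, confidently grounded clips contribute) and grows it each epoch, so the model earns its way to the ambiguous examples instead of being confused by them early on.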

Publications:

  • Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding (ICLR 2025)
  • STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding (CVPR 2025)

Other Notable Contributions

Beyond the core areas mentioned above, Kumar has also made contributions to other areas of computer vision and related fields.

  • Deepfake Detection: Kumar has worked on detecting deepfakes using metric learning. This research is important for identifying and preventing the spread of fake videos.
  • Forgery Classification: He has also explored forgery classification via unsupervised domain adaptation. This work aims to improve the ability of models to detect forgeries in different domains without requiring labeled data.
  • Bird Species Classification: Kumar has worked on bird species classification using transfer learning with multistage training. This research demonstrates the effectiveness of transfer learning for image classification tasks.
  • Solar Potential Analysis: He has also explored the use of satellite imagery for solar potential analysis of rooftops. This work has potential applications in renewable energy and urban planning.
  • Gabriellav2: Kumar contributed to the development of Gabriellav2, which aims towards better generalization in surveillance videos for action detection.
  • IceBreaker: He worked on solving the cold start problem for video recommendation engines.
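The metric-learning idea behind the deepfake-detection work above can be sketched with a standard triplet loss; the margin and the toy embeddings are illustrative, not the paper's configuration:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Triplet loss: pull embeddings of the same class (e.g. two real
    faces) together, and push real vs. fake embeddings at least
    `margin` apart in the learned metric space."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

Once the embedding space is trained this way, classifying a face as real or fake reduces to a nearest-neighbor or threshold test on distances, which tends to generalize better across manipulation types than a plain classifier.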

Publications:

  • Detecting Deepfakes with Metric Learning (8th International Workshop on Biometrics and Forensics, 2020)
  • Syn2Real: Forgery Classification via Unsupervised Domain Adaptation (IEEE Winter Conference on Applications of Computer Vision (WACV) Workshops, 2020)
  • Bird species classification using transfer learning with multistage training (Workshop on Computer Vision Applications, 2018)
  • Solar potential analysis of rooftops using satellite imagery
  • Gabriellav2: Towards better generalization in surveillance videos for action detection (Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2022)
  • IceBreaker: Solving cold start problem for video recommendation engines (IEEE International Symposium on Multimedia (ISM), pp. 217-222, 2018)

Industry Experience

Beyond academia, Kumar has industry experience at Decision Science Technology in Bellevue, WA, and with the Visual Shopping Team in Palo Alto, CA.

tags: #aakash #kumar #ucf #research
