Trending repositories for topic video-classification
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
An unofficial implementation of TubeViT in "Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning"
[CVPR 2021] TDN: Temporal Difference Networks for Efficient Action Recognition
Tutorial for video classification/action recognition using 3D CNN and CNN+RNN on UCF101
Implementation of TimeSformer from Facebook AI, a pure attention-based solution for video classification
Video classification tools using 3D ResNet
[ICLR 2022] TAda! Temporally-Adaptive Convolutions for Video Understanding. This codebase provides solutions for video classification, video representation learning and temporal detection.
Papers, code and datasets about deep learning and multi-modal learning for video analysis
This project demonstrates how deep learning methods can be applied to Sports Data Analytics.
Keras 3 Implementation of Video Swin Transformers for 3D Video Modeling
CricShot10 is a video action recognition dataset consisting of 10 cricket batting shots. The dataset was built from YouTube videos.
Developed the ViViT model for medical video classification, enhancing 3D organ image analysis using transformer-based architectures.
Official source code for the paper: "Reading Between the Frames Multi-Modal Non-Verbal Depression Detection in Videos"
Easiest way of fine-tuning HuggingFace video classification models
Video classification on UCF101 using CNN and RNN, built on the PyTorch framework.
Deepfakes Video classification via CNN, LSTM, C3D and triplets [IWBF'20]
[Neurocomputing 2019] Fast and Robust Dynamic Hand Gesture Recognition via Key Frames Extraction and Feature Fusion
SoccerAct10 is a dataset containing 10 different soccer actions. The dataset was built from YouTube videos.
Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification
This notebook walks through the steps to reproduce the results of the publication "Is Space-Time Attention All You Need for Video Understanding?"