Trending repositories for topic video-classification
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
Video classification tools using 3D ResNet
Tutorial for video classification/action recognition using 3D CNN and CNN+RNN on UCF101
Papers, code and datasets about deep learning and multi-modal learning for video analysis
Official source code for the paper: "Reading Between the Frames Multi-Modal Non-Verbal Depression Detection in Videos"
CricShot10 is a video action recognition dataset consisting of 10 cricket batting shots. The dataset was built from YouTube videos.
Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification
Implementation of TimeSformer from Facebook AI, a pure attention-based solution for video classification
Developed the ViViT model for medical video classification, enhancing 3D organ image analysis using transformer-based architectures.
An unofficial implementation of TubeViT in "Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning"
Easiest way of fine-tuning HuggingFace video classification models
[ICLR 2022] TAda! Temporally-Adaptive Convolutions for Video Understanding. This codebase provides solutions for video classification, video representation learning and temporal detection.
Video classification on UCF101 using a CNN and an RNN, built on the PyTorch framework.
Simplest and fastest image and text annotation tool.
[CVPR 2021] TDN: Temporal Difference Networks for Efficient Action Recognition
Keras 3 Implementation of Video Swin Transformers for 3D Video Modeling
Deepfake video classification via CNN, LSTM, C3D and triplets [IWBF'20]
A notebook explaining the steps needed to obtain the results of the publication "Is Space-Time Attention All You Need for Video Understanding?"
[Neurocomputing 2019] Fast and Robust Dynamic Hand Gesture Recognition via Key Frames Extraction and Feature Fusion
SoccerAct10 is a dataset containing 10 different soccer actions. The dataset was built from YouTube videos.
3D ResNet Video Classification accelerated by TensorRT