Search Results - RepositoryStats

mmaction2 open-mmlab

1.3k

4.5k

apache-2.0

40

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

ava i3d tsm tsn x3d posec3d pytorch slowfast benchmark non-local openmmlab uniformerv2 deep-learning action-recognition video-understanding video-classification temporal-action-localization spatial-temporal-action-detection

Created 2020-07-11

2,061 commits to main branch, last one about a year ago

awesome-action-recognition jinwchoi

724

3.9k

unknown

207

A curated list of action recognition and related area resources

awesome awesome-list pose-estimation action-detection video-processing video-recognition action-recognition object-recognition video-understanding activity-recognition action-classification activity-understanding

Created 2016-09-22

296 commits to master branch, last one about a year ago

Ask-Anything OpenGVLab

260

3.2k

mit

35

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

chat video gradio chatgpt stablelm big-model langchain large-model captioning-videos foundation-models video-understanding large-language-models video-question-answering

Created 2023-04-19

207 commits to main branch, last one about a month ago

temporal-shift-module mit-han-lab

418

2.1k

mit

41

[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding

tsm low-latency acceleration efficient-model temporal-modeling nvidia-jetson-nano video-understanding

Created 2019-03-27

48 commits to master branch, last one 8 months ago

mmaction open-mmlab

350

1.9k

apache-2.0

39

An open-source toolbox for action understanding based on PyTorch

pytorch action-detection action-recognition video-understanding temporal-action-detection temporal-action-localization spatial-temporal-action-detection

Created 2019-06-13

179 commits to master branch, last one 4 years ago

InternVideo OpenGVLab

103

1.7k

apache-2.0

27

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Created 2022-11-23

245 commits to main branch, last one 11 days ago

PaddleVideo PaddlePaddle

381

1.6k

apache-2.0

37

Awesome video understanding toolkits based on PaddlePaddle. It supports video data annotation tools, lightweight RGB and skeleton based action recognition model, practical applications for video taggi...

ava bmn tsm tsn pp-tsm st-gcn t2vlad actbert slowfast videotag youtube-8m activitynet kinetics400 action-detection video-recognition action-recognition action-localization video-understanding temporal-action-detection

Created 2020-11-12

2,918 commits to develop branch, last one 27 days ago

temporal-segment-networks yjxiong

475

1.5k

bsd-2-clause

41

Code & Models for Temporal Segment Networks (TSN) in ECCV 2016

action-recognition video-understanding temporal-segment-networks

Created 2016-07-14

129 commits to master branch, last one 5 years ago

VideoMAE MCG-NJU

142

1.4k

other

15

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

mae pytorch transformer neurips-2022 video-analysis video-transformer action-recognition masked-autoencoder vision-transformer video-understanding self-supervised-learning video-representation-learning

Created 2022-03-23

64 commits to main branch, last one about a year ago

tsn-pytorch yjxiong

310

1.1k

bsd-2-clause

26

Temporal Segment Networks (TSN) in PyTorch

pytorch deep-learning action-recognition video-understanding temporal-segment-networks

Created 2017-08-10

31 commits to master branch, last one 5 years ago

awesome-grounding TheShadow29

99

1.1k

mit

29

awesome grounding: A curated list of research papers in visual grounding

arxiv paper papers grounding awesome-list paper-roadmap embodied-agent computer-vision image-grounding video-grounding phrase-grounding visual-grounding captioning-images captioning-videos language-grounding video-understanding multimodal-deep-learning natural-language-processing

Created 2018-09-03

97 commits to master branch, last one about a year ago

Chat-UniVi PKU-YuanGroup

45

918

apache-2.0

9

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

image-understanding video-understanding large-language-models vision-language-model

Created 2023-11-13

83 commits to main branch, last one 4 months ago

MiniGPT4-video Vision-CAIR

66

596

bsd-3-clause

12

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

video-retrieval video-understanding long-video-understanding video-question-answering

Created 2024-03-23

190 commits to main branch, last one 3 months ago

VideoMAEv2 OpenGVLab

68

592

mit

7

[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

cvpr2023 action-detection foundation-model action-recognition video-understanding self-supervised-learning temporal-action-detection

Created 2023-04-04

21 commits to master branch, last one 5 months ago

MeViS henghuiding

22

520

mit

8

[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions

mose-dataset mevis-dataset multimodal-learning video-understanding referring-expression-segmentation referring-expression-comprehension referring-video-object-segmentation

Created 2023-08-01

11 commits to main branch, last one about a year ago

TDN MCG-NJU

55

376

apache-2.0

9

[CVPR 2021] TDN: Temporal Difference Networks for Efficient Action Recognition

pytorch cvpr2021 temporal-modeling action-recognition video-understanding video-classification

Created 2020-12-17

27 commits to main branch, last one 2 years ago

SpecVQGAN v-iashin

40

358

mit

8

Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)

gan vas bmvc audio video vqvae melgan pytorch vggsound multi-modal transformer video-features audio-generation evaluation-metrics video-understanding

Created 2021-10-17

24 commits to main branch, last one 8 months ago

movienet-tools movienet

34

287

unknown

10

Tools for movie and video research

movie deep-learning cross-modality shot-detection computer-vision person-analysis vision-language action-recognition video-understanding

Created 2019-06-05

91 commits to master branch, last one 2 years ago

sn-gamestate SoccerNet

57

278

gpl-3.0

19

[CVPRW'24] SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap (CVPR24 - CVSports workshop)

soccer sports tracking detection soccernet bird-eye-view sports-analytics re-identification video-understanding multi-object-tracking

Created 2024-02-05

263 commits to main branch, last one 29 days ago

MA-LMM boheumd

27

276

mit

4

(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

llm video-understanding

Created 2024-03-26

18 commits to main branch, last one 7 months ago

phar rlleshi

28

259

apache-2.0

9

deep learning sex position classifier

sex pornhub pytorch porn-filter deep-learning sex-classifier action-recognition video-understanding video-classification human-action-recognition

Created 2022-02-05

113 commits to master branch, last one about a year ago

Multiverse JunweiLiang

64

255

apache-2.0

8

Dataset, code and model for the CVPR'20 paper "The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction". And for the ECCV'20 SimAug paper.

3d-simulation computer-vision video-understanding trajectory-prediction trajectory-prediction-benchmark

Created 2019-12-22

87 commits to master branch, last one 2 years ago

TeViT hustvl

17

239

mit

8

Temporally Efficient Vision Transformer for Video Instance Segmentation, CVPR 2022, Oral

video-understanding instance-segmentation video-instance-segmentation

Created 2022-03-21

19 commits to main branch, last one 2 years ago

Cap4Video whwu95

20

234

mit

7

【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

video-understanding cross-modal-learning video-text-retrieval video-language-understanding

Created 2023-01-07

32 commits to main branch, last one 3 months ago

OpenTAD sming256

16

232

apache-2.0

5

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.

video-understanding temporal-action-detection temporal-action-localization

Created 2024-03-28

60 commits to main branch, last one 10 days ago

TAdaConv alibaba-mmai-research

33

230

apache-2.0

8

[ICLR 2022] TAda! Temporally-Adaptive Convolutions for Video Understanding. This codebase provides solutions for video classification, video representation learning and temporal detection.

pytorch tadaconv action-recognition action-localization video-understanding video-classification self-supervised-learning

Created 2021-06-23

27 commits to main branch, last one about a year ago