17 results found

29
567
unknown
8
[CVPR 2025] Video Narration as Vocabulary & Video as Long Document
Created 2023-04-20
25 commits to main branch, last one about a month ago
56
351
mit
9
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
Created 2020-10-30
20 commits to main branch, last one 2 years ago
32
347
mit
6
[ICCV 2023] UniVTG: Towards Unified Video-Language Temporal Grounding
Created 2023-05-27
134 commits to main branch, last one 11 months ago
25
339
apache-2.0
7
[CVPR2022] Official Implementation of ReferFormer
Created 2022-01-02
63 commits to main branch, last one 2 months ago
18
281
unknown
6
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
Created 2022-03-14
29 commits to main branch, last one 2 years ago
20
238
unknown
3
[NeurIPS 2022] Egocentric Video-Language Pretraining
Created 2022-05-31
48 commits to main branch, last one 11 months ago
The repository collects various multi-modal transformer architectures, including image transformer, video transformer, image-language transformer, video-language transformer and self-supervised l...
Created 2021-04-07
418 commits to main branch, last one 2 years ago
17
186
bsd-3-clause
6
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Created 2021-12-11
14 commits to main branch, last one 2 years ago
A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
Created 2023-12-16
49 commits to master branch, last one 2 months ago
PyTorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Created 2022-05-20
17 commits to main branch, last one 2 years ago
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Created 2021-02-10
57 commits to main branch, last one 3 years ago
A Survey on video and language understanding.
Created 2023-04-14
23 commits to main branch, last one about a year ago
The PyTorch implementation for "Video-Text Pre-training with Learned Regions"
Created 2021-11-21
18 commits to main branch, last one 2 years ago
A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.
Created 2023-01-03
26 commits to main branch, last one about a year ago
2
33
unknown
4
[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Created 2024-06-16
13 commits to main branch, last one 8 days ago
PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)
Created 2022-03-14
123 commits to main branch, last one 2 years ago
[EMNLP 2024] A Video Chat Agent with Temporal Prior
Created 2024-02-25
18 commits to main branch, last one about a month ago