14 results found

21
497
mit
6
Transform Video as a Document with ChatGPT, CLIP, BLIP2, GRIT, Whisper, and LangChain.
Created 2023-04-20
25 commits to main branch, last one 10 months ago
54
330
mit
10
An official implementation of "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
Created 2020-10-30
20 commits to main branch, last one about a year ago
26
311
apache-2.0
7
[CVPR2022] Official Implementation of ReferFormer
Created 2022-01-02
56 commits to main branch, last one about a month ago
21
290
mit
5
[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding
Created 2023-05-27
134 commits to main branch, last one 23 days ago
16
274
unknown
6
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
Created 2022-03-14
29 commits to main branch, last one about a year ago
The repository collects various multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers and self-supervised l...
Created 2021-04-07
418 commits to main branch, last one about a year ago
19
210
unknown
3
[NeurIPS2022] Egocentric Video-Language Pretraining
Created 2022-05-31
48 commits to main branch, last one 23 days ago
17
184
bsd-3-clause
7
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Created 2021-12-11
14 commits to main branch, last one about a year ago
PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"
Created 2022-05-20
17 commits to main branch, last one about a year ago
Shot2Story, a new multi-shot video understanding benchmark with comprehensive video summaries and detailed shot-level captions.
Created 2023-12-16
30 commits to master branch, last one 8 days ago
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Created 2021-02-10
57 commits to main branch, last one 2 years ago
A survey on video and language understanding.
Created 2023-04-14
23 commits to main branch, last one about a year ago
The PyTorch implementation of "Video-Text Pre-training with Learned Regions"
Created 2021-11-21
18 commits to main branch, last one about a year ago
PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)
Created 2022-03-14
123 commits to main branch, last one about a year ago