16 results found
Transform Video as a Document with ChatGPT, CLIP, BLIP2, GRIT, Whisper, LangChain.
Created
2023-04-20
25 commits to main branch, last one about a year ago
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
Created
2020-10-30
20 commits to main branch, last one 2 years ago
[CVPR2022] Official Implementation of ReferFormer
Created
2022-01-02
56 commits to main branch, last one 8 months ago
[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding
Created
2023-05-27
134 commits to main branch, last one 7 months ago
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
Created
2022-03-14
29 commits to main branch, last one about a year ago
[NeurIPS2022] Egocentric Video-Language Pretraining
Created
2022-05-31
48 commits to main branch, last one 7 months ago
The repository collects various multi-modal transformer architectures, including image transformer, video transformer, image-language transformer, video-language transformer and self-supervised l...
Created
2021-04-07
418 commits to main branch, last one 2 years ago
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Created
2021-12-11
14 commits to main branch, last one 2 years ago
PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"
Created
2022-05-20
17 commits to main branch, last one 2 years ago
Shot2Story, a new multi-shot video understanding benchmark with comprehensive video summaries and detailed shot-level captions.
Created
2023-12-16
48 commits to master branch, last one 2 months ago
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Created
2021-02-10
57 commits to main branch, last one 3 years ago
A survey on video and language understanding.
Created
2023-04-14
23 commits to main branch, last one about a year ago
The PyTorch implementation for "Video-Text Pre-training with Learned Regions"
Created
2021-11-21
18 commits to main branch, last one 2 years ago
PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)
Created
2022-03-14
123 commits to main branch, last one about a year ago
A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.
Created
2023-01-03
26 commits to main branch, last one 10 months ago
[EMNLP 2024] A Video Chat Agent with Temporal Prior
Created
2024-02-25
15 commits to main branch, last one 2 days ago