26 results found

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
Created 2023-04-19
204 commits to main branch, last one 2 months ago
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
Created 2021-02-10
14 commits to main branch, last one 2 years ago
60
559
bsd-3-clause
12
Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
Created 2024-03-23
189 commits to main branch, last one 3 months ago
11
284
apache-2.0
5
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Created 2023-06-06
18 commits to main branch, last one 10 months ago
18
220
apache-2.0
5
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Created 2023-05-22
4 commits to main branch, last one about a year ago
18
185
bsd-3-clause
7
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Created 2021-12-11
14 commits to main branch, last one 2 years ago
22
178
bsd-3-clause
3
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
Created 2023-05-10
36 commits to main branch, last one 10 months ago
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Created 2024-08-26
3 commits to main branch, last one 2 months ago
23
156
apache-2.0
5
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Created 2022-09-25
16 commits to main branch, last one about a year ago
A PyTorch implementation of VIOLET
Created 2021-11-24
47 commits to main branch, last one 11 months ago
12
131
mit
2
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
Created 2021-02-28
89 commits to main branch, last one 3 months ago
[ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering
Created 2019-04-22
13 commits to master branch, last one 2 years ago
9
121
mit
3
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Created 2022-09-23
33 commits to main branch, last one 7 months ago
15
117
apache-2.0
5
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Created 2021-03-22
47 commits to main branch, last one about a year ago
5
109
apache-2.0
4
[CVPR 2023 Highlight] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Created 2023-02-28
33 commits to main branch, last one 7 months ago
A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
Created 2023-12-16
48 commits to master branch, last one about a month ago
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
Created 2023-10-19
20 commits to main branch, last one 3 months ago
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
Created 2023-08-28
54 commits to main branch, last one 4 months ago
[CVPR 2022] A large-scale public benchmark dataset for video question-answering, especially about evidence and commonsense reasoning. The code used in our paper "From Representation to Reasoning: Towa...
Created 2022-05-31
6 commits to master branch, last one 4 months ago
0
49
apache-2.0
2
FreeVA: Offline MLLM as Training-Free Video Assistant
Created 2024-05-13
16 commits to main branch, last one 5 months ago
0
46
mit
2
[TIP 2022] Official code of paper “Video Question Answering with Prior Knowledge and Object-sensitive Learning”
Created 2022-01-24
43 commits to main branch, last one 9 months ago
12
45
apache-2.0
4
Video Graph Transformer for Video Question Answering (ECCV'22)
Created 2022-07-20
26 commits to main branch, last one about a year ago
A PyTorch implementation of EmpiricalMVM
Created 2023-03-09
9 commits to main branch, last one 11 months ago
[ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer
Created 2023-07-22
7 commits to main branch, last one about a year ago
6
32
mit
7
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)
Created 2023-03-23
9 commits to master branch, last one 7 months ago