Search Results - RepositoryStats

253

3.1k

mit

37

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

chat video gradio chatgpt stablelm big-model langchain large-model captioning-videos foundation-models video-understanding large-language-models video-question-answering

Created 2023-04-19

205 commits to main branch, last one about a month ago

InternVideo OpenGVLab

91

1.5k

apache-2.0

27

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Created 2022-11-23

229 commits to main branch, last one 19 days ago

ClipBERT jayleicn

86

714

mit

10

[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.

vqa pytorch cvpr2021 video-retrieval vision-and-language video-question-answering

Created 2021-02-10

14 commits to main branch, last one 2 years ago

MiniGPT4-video Vision-CAIR

61

571

bsd-3-clause

12

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

video-retrieval video-understanding long-video-understanding video-question-answering

Created 2024-03-23

190 commits to main branch, last one 20 days ago

Youku-mPLUG X-PLUG

11

288

apache-2.0

6

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks

mllm video youku chinese dataset benchmark multimodal video-retrieval multimodal-pretraining video-question-answering multimodal-large-language-models

Created 2023-06-06

18 commits to main branch, last one 11 months ago

mPLUG-2 X-PLUG

19

221

apache-2.0

5

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)

vqa mllm mplug video multimodal image-retrieval video-retrieval foundation-models multimodal-pretraining video-question-answering

Created 2023-05-22

4 commits to main branch, last one about a year ago

ml-slowfast-llava apple

12

190

other

11

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

video-question-answering multimodal-large-language-models

Created 2024-08-26

3 commits to main branch, last one 3 months ago

ALPRO salesforce

18

187

bsd-3-clause

7

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

video-language prompt-learning vision-and-language video-text-retrieval representation-learning video-question-answering

Created 2021-12-11

14 commits to main branch, last one 2 years ago

SeViLA Yui010206

22

182

bsd-3-clause

3

[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering

mllm video-localization video-question-answering

Created 2023-05-10

36 commits to main branch, last one 11 months ago

FrozenBiLM antoyang

23

156

apache-2.0

5

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

vqa videoqa pre-training multimodal-learning video-understanding vision-and-language large-language-models video-question-answering visual-question-answering weakly-supervised-learning

Created 2022-09-25

18 commits to main branch, last one 21 days ago

pytorch_violet tsujuifu

6

137

unknown

9

A PyTorch implementation of VIOLET

pytorch pre-training video-retrieval vision-and-language video-question-answering

Created 2021-11-24

47 commits to main branch, last one about a year ago

NExT-QA doc-doc

13

136

mit

2

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

videoqa vision-language video-understanding multi-object-interaction video-question-answering causal-temporal-action-reasoning

Created 2021-02-28

89 commits to main branch, last one 5 months ago

TVQAplus jayleicn

24

126

mit

10

[ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering

tvqa dataset pytorch video-question-answering

Created 2019-04-22

13 commits to master branch, last one 2 years ago

EMCL jpthu17

9

125

mit

3

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

neurips video-retrieval video-captioning cross-modal-retrieval video-question-answering

Created 2022-09-23

33 commits to main branch, last one 8 months ago

just-ask antoyang

15

118

apache-2.0

5

[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

vqa videoqa pre-training multimodal-learning question-generation video-understanding vision-and-language video-question-answering visual-question-answering weakly-supervised-learning

Created 2021-03-22

47 commits to main branch, last one about a year ago

HBI jpthu17

5

110

apache-2.0

4

[CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

cvpr video-retrieval cross-modal-retrieval video-question-answering

Created 2023-02-28

35 commits to main branch, last one 2 days ago

Shot2Story bytedance

6

104

unknown

6

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

dataset research benchmark video-story video-language vision-language video-captioning video-summarization large-language-models video-story-generation video-question-answering video-language-pretraining

Created 2023-12-16

48 commits to master branch, last one 3 months ago

Flipped-VQA mlvlab

9

74

mit

5

Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

emnlp2023 multi-modal large-language-models video-question-answering visual-question-answering

Created 2023-10-19

20 commits to main branch, last one 5 months ago

NExT-GQA doc-doc

1

62

mit

1

Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)

videoqa trustworthy-vqa video-grounding video-question-answering visual-evidence-grounding video-language-understanding

Created 2023-08-28

54 commits to main branch, last one 6 months ago

FreeVA whwu95

0

54

apache-2.0

2

FreeVA: Offline MLLM as Training-Free Video Assistant

llava chatbot chatgpt training-free video-understanding vision-language-model video-question-answering zero-shot-video-captioning multimodal-large-language-models

Created 2024-05-13

16 commits to main branch, last one 6 months ago

Causal-VidQA bcmi

4

52

mit

10

[CVPR 2022] A large-scale public benchmark dataset for video question-answering, especially about evidence and commonsense reasoning. The code used in our paper "From Representation to Reasoning: Towa...

evidence-reason visual-understanding commonsense-reasoning video-question-answering video-question-answering-dataset

Created 2022-05-31

6 commits to master branch, last one 5 months ago

VGT sail-sg

12

46

apache-2.0

4

Video Graph Transformer for Video Question Answering (ECCV'22)

videoqa graph-transformer temporal-dynamics video-question-answering video-language-understanding

Created 2022-07-20

26 commits to main branch, last one about a year ago

PKOL zchoi

0

46

mit

2

[TIP 2022] Official code of paper “Video Question Answering with Prior Knowledge and Object-sensitive Learning”

pytorch video-retrieval vision-language pytorch-implementation video-question-answering

Created 2022-01-24

43 commits to main branch, last one 11 months ago

pytorch_empirical-mvm tsujuifu

2

39

unknown

2

A PyTorch implementation of EmpiricalMVM

pytorch cvpr2023 pre-training video-retrieval video-captioning vision-and-language video-question-answering

Created 2023-03-09

9 commits to main branch, last one about a year ago

Tem-adapter XLiu443

2

35

unknown

2

[ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer

clip-model video-understanding video-question-answering

Created 2023-07-22

7 commits to main branch, last one about a year ago

MELTR mlvlab

7

32

mit

7

MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)

cvpr2023 multi-modal meta-learning video-retrieval video-captioning video-question-answering

Created 2023-03-23

9 commits to master branch, last one 8 months ago