Search Results - RepositoryStats

VLog showlab

29

567

unknown

8

[CVPR 2025] Video Narration as Vocabulary & Video as Long Document

chatgpt whisper langchain vocabulary video-language large-language-model

Created 2023-04-20

25 commits to main branch, last one about a month ago

UniVL microsoft

56

351

mit

9

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"

coin joint video msrvtt caption pretrain alignment youcookii video-text pretraining caption-task localization segmentation multimodality retrieval-task video-language video-text-retrieval multimodal-sentiment-analysis

Created 2020-10-30

20 commits to main branch, last one 2 years ago

UniVTG showlab

32

347

mit

6

[ICCV 2023] UniVTG: Towards Unified Video-Language Temporal Grounding

pretraining video-language video-grounding moment-retrieval highlight-detection video-summarization

Created 2023-05-27

134 commits to main branch, last one 11 months ago

ReferFormer wjn922

25

339

apache-2.0

7

[CVPR 2022] Official Implementation of ReferFormer

video-language referring-video-object-segmentation

Created 2022-01-02

63 commits to main branch, last one 2 months ago

all-in-one showlab

18

281

unknown

6

[CVPR 2023] All in One: Exploring Unified Video-Language Pre-training

pytorch codebase pre-training video-language

Created 2022-03-14

29 commits to main branch, last one 2 years ago

EgoVLP showlab

20

238

unknown

3

[NeurIPS 2022] Egocentric Video-Language Pretraining

pytorch pretraining video-language egocentric-vision

Created 2022-05-31

48 commits to main branch, last one 11 months ago

Multi-Modal-Transformer junchen14

31

225

unknown

7

The repository collects various multi-modal transformer architectures, including image transformer, video transformer, image-language transformer, video-language transformer and self-supervised l...

language mlp-mixer multi-modal video-language image-transformer video-transformer vision-transformer multi-modal-cvpr2021 efficiency-transformer transformer-readling-list

Created 2021-04-07

418 commits to main branch, last one 2 years ago

ALPRO salesforce

17

186

bsd-3-clause

6

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

video-language prompt-learning vision-and-language video-text-retrieval representation-learning video-question-answering

Created 2021-12-11

14 commits to main branch, last one 2 years ago

Shot2Story bytedance

6

129

unknown

5

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

dataset research benchmark video-story video-language vision-language video-captioning video-summarization large-language-models video-story-generation video-question-answering video-language-pretraining

Created 2023-12-16

49 commits to master branch, last one 2 months ago

VidIL MikeWangWZHL

1

115

mit

4

PyTorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

blip clip msvd vlep gpt-3 vatex msrvtt youcook2 video-language vision-language

Created 2022-05-20

17 commits to main branch, last one 2 years ago

VidSitu TheShadow29

8

59

mit

2

[CVPR 2021] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

nlp srl video vision grounding captioning semantic-roles video-language event-relations captioning-videos vision-and-language

Created 2021-02-10

57 commits to main branch, last one 3 years ago

Awesome-Video-Language-Understanding liveseongho

2

48

mit

1

A survey on video and language understanding.

paper dataset deep-learning awesome-papers video-language machine-learning multimodal-deep-learning video-language-pretraining video-language-understanding

Created 2023-04-14

23 commits to main branch, last one about a year ago

Region_Learner showlab

2

42

unknown

5

The PyTorch implementation for "Video-Text Pre-training with Learned Regions"

video-language

Created 2021-11-21

18 commits to main branch, last one 2 years ago

awesome-video-text-datasets willyfh

3

36

mit

2

A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.

dataset video-text video-to-text video-language video-retrieval vision-language video-captioning video-description

Created 2023-01-03

26 commits to main branch, last one about a year ago

VideoGUI showlab

2

33

unknown

4

[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos

gui llm-agent video-language

Created 2024-06-16

13 commits to main branch, last one 8 days ago

Perceiver_VL zinengtang

4

33

mit

2

PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)

retrieval efficiency scalability video-language vision-and-language

Created 2022-03-14

123 commits to main branch, last one 2 years ago

VideoTGB bigai-nlco

2

29

mit

2

[EMNLP 2024] A Video Chat Agent with Temporal Prior

llm mllm video-language spatial-temporal visual-instruction-tuning multimodal-large-language-models

Created 2024-02-25

18 commits to main branch, last one about a month ago