Search Results - RepositoryStats

105

970

other

28

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...

tden pretraining image-captioning video-captioning vision-and-language cross-modal-retrieval visual-question-answering

Created 2021-06-25

84 commits to master branch, last one 2 years ago

Video2Description scopeInfinity

69

343

apache-2.0

8

Video to Text: Natural language description generator for a given video. [Video Captioning]

cnn-keras video-to-text audio-processing image-captioning video-captioning video-processing deep-neural-networks lstm-neural-networks

Created 2017-10-25

260 commits to VideoCaption branch, last one 2 years ago

NAACL_2025_TWM xid32

30

308

unknown

21

We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of Multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into exi...

working-memory video-captioning question-answering video-text-retrieval audio-visual-learning multimodal-foundation-model multimodal-large-language-models

Created 2025-01-23

27 commits to main branch, last one about a month ago

whisper-auto-transcribe tomchang25

16

225

mit

5

Auto transcribe tool based on whisper

asr gradio pytorch deep-learning language-model speech-to-text text-to-speech gradio-interface video-captioning speech-processing speech-recognition voice-activity-detection

Created 2022-09-24

192 commits to main branch, last one about a year ago

VidChapters antoyang

21

189

mit

4

[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale

vid2seq pre-training video-captioning multimodal-learning video-understanding vision-and-language dense-video-captioning video-chapter-generation weakly-supervised-learning temporal-language-grounding

Created 2023-06-06

59 commits to main branch, last one about a year ago

EMCL jpthu17

9

130

mit

2

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

neurips video-retrieval video-captioning cross-modal-retrieval video-question-answering

Created 2022-09-23

33 commits to main branch, last one 11 months ago

Shot2Story bytedance

7

124

unknown

5

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

dataset research benchmark video-story video-language vision-language video-captioning video-summarization large-language-models video-story-generation video-question-answering video-language-pretraining

Created 2023-12-16

49 commits to master branch, last one about a month ago

video_captioning_datasets jssprz

12

121

unknown

2

Summary about Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*

msvd vatex review msr-vtt trecvid charades tgif-dataset video-dataset video-to-text state-of-the-art video-captioning video-description vision-and-language activitynet-captions

Created 2021-03-12

24 commits to main branch, last one 2 years ago

Awesome-Captioning terry-r123

10

110

unknown

4

A curated list of multimodal captioning research (including image captioning, video captioning, and text captioning)

text-captioning image-captioning video-captioning

Created 2021-06-13

50 commits to main branch, last one 2 years ago

TVCaption jayleicn

11

90

mit

5

[ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset

dataset pytorch video-captioning

Created 2020-01-27

6 commits to master branch, last one 2 years ago

Video-Captioning-Transformer Kamino666

17

86

apache-2.0

1

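A deep learning model for video captioning, implemented in PyTorch with a Transformer architecture. The video captioning task takes a video as input and outputs one sentence describing the content of the whole video (assuming the video is short and can be described in a single sentence). The main goal of this repo is to help visually impaired people enjoy online videos and perceive their surroundings, and to promote the development of "accessible video".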

pytorch transformer video-captioning

Created 2021-09-12

115 commits to master branch, last one 3 years ago

video-captioning-models-in-Pytorch nasib-ullah

17

70

unknown

2

A PyTorch implementation of state of the art video captioning models from 2015-2019 on MSVD and MSRVTT datasets.

marn msvd s2vt video msrvtt recnet pytorch deep-learning video-captioning sequence-to-sequence pytorch-implementation video-captioning-models

Created 2021-01-28

133 commits to main branch, last one about a year ago

MTL-AQA ParitoshParmar

15

66

unknown

3

What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]

c3d lstm mtl-aqa pytorch captioning dilated-c3d video-captioning video-processing action-recognition multitask-learning dilated-convolution video-understanding representation-learning action-quality-assessment fine-grained-classification fine-grained-action-recognition

Created 2019-06-11

26 commits to master branch, last one 4 months ago

VLTinT UARK-AICV

6

66

unknown

3

[AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning

pytorch aaai2023 vision-language video-captioning transformer-architecture video-paragraph-captioning

Created 2022-06-23

18 commits to main branch, last one about a year ago

crossmodal-contrastive-learning amazon-science

11

62

apache-2.0

4

CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021

video transformers multi-modality computer-vision video-captioning contrastive-learning video-text-retrieval natural-language-processing

Created 2021-10-12

4 commits to main branch, last one 3 years ago

Video2Commonsense jacobswan1

12

56

unknown

3

Video captioning baseline models on Video2Commonsense Dataset.

video-captioning commonsense-story video2commonsense commonsense-question-answering

Created 2020-02-05

5 commits to master branch, last one 3 years ago

srt-webvtt imshaikot

6

51

mit

3

Convert SRT formatted subtitle to WebVTT on the fly over HTML5/browser environment

html5 video web-vtt converter html5-video srt-subtitles video-captioning

Created 2018-01-19

41 commits to master branch, last one about a year ago

COSA TXH-mercury

3

43

mit

2

[ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

video-qa video-retrieval video-captioning video-language-pretrainng vision-language-pretraining

Created 2023-05-24

9 commits to master branch, last one 3 months ago

pytorch_empirical-mvm tsujuifu

2

40

unknown

2

A PyTorch implementation of EmpiricalMVM

pytorch cvpr2023 pre-training video-retrieval video-captioning vision-and-language video-question-answering

Created 2023-03-09

9 commits to main branch, last one about a year ago

CoCap acherstyx

4

39

mit

2

[ICCV 2023] Accurate and Fast Compressed Video Captioning

iccv2023 compressed-video video-captioning

Created 2023-07-21

5 commits to main branch, last one about a year ago

awesome-video-text-datasets willyfh

3

36

mit

2

A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.

dataset video-text video-to-text video-language video-retrieval vision-language video-captioning video-description

Created 2023-01-03

26 commits to main branch, last one about a year ago

MELTR mlvlab

7

33

mit

7

MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)

cvpr2023 multi-modal meta-learning video-retrieval video-captioning video-question-answering

Created 2023-03-23

9 commits to master branch, last one 11 months ago

LLMVA-GEBC zjr2000

2

30

bsd-3-clause

1

Winning solution to the Generic Event Boundary Captioning task in the LOVEU Challenge (CVPR 2023 workshop)

video-captioning pytorch-implementation long-video-understanding

Created 2023-05-24

111 commits to main branch, last one about a year ago