21 results found

111
1.0k
other
35
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...
Created 2021-06-25
84 commits to master branch, last one about a year ago
Video to Text: Natural language description generator for a given video. [Video Captioning]
Created 2017-10-25
260 commits to VideoCaption branch, last one 2 years ago
Auto transcribe tool based on whisper
Created 2022-09-24
192 commits to main branch, last one about a year ago
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
Created 2023-06-06
59 commits to main branch, last one 11 months ago
9
120
mit
3
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Created 2022-09-23
33 commits to main branch, last one 7 months ago
Summary about Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*
Created 2021-03-12
24 commits to main branch, last one 2 years ago
A curated list of Multimodal Captioning related research (including image captioning, video captioning, and text captioning)
Created 2021-06-13
50 commits to main branch, last one 2 years ago
A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
Created 2023-12-16
48 commits to master branch, last one about a month ago
[ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset
Created 2020-01-27
6 commits to master branch, last one 2 years ago
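A deep learning model for video captioning, implemented in PyTorch with a Transformer architecture. The video captioning task takes a video as input and outputs one sentence describing the content of the whole video (assuming the video is short and can be described in a single sentence). The main goal of this repo is to help visually impaired people enjoy online videos and perceive their surroundings, and to promote the development of "accessible video".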
Created 2021-09-12
115 commits to master branch, last one 2 years ago
A PyTorch implementation of state-of-the-art video captioning models from 2015 to 2019 on the MSVD and MSRVTT datasets.
Created 2021-01-28
133 commits to main branch, last one about a year ago
6
64
unknown
4
[AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
Created 2022-06-23
18 commits to main branch, last one 8 months ago
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
Created 2019-06-11
25 commits to master branch, last one 11 days ago
Video captioning baseline models on Video2Commonsense Dataset.
Created 2020-02-05
5 commits to master branch, last one 3 years ago
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021
Created 2021-10-12
4 commits to main branch, last one 2 years ago
A PyTorch implementation of EmpiricalMVM
Created 2023-03-09
9 commits to main branch, last one 10 months ago
Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
Created 2023-05-24
8 commits to master branch, last one about a year ago
[ICCV 2023] Accurate and Fast Compressed Video Captioning
Created 2023-07-21
5 commits to main branch, last one about a year ago
6
32
mit
7
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)
Created 2023-03-23
9 commits to master branch, last one 6 months ago
A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.
Created 2023-01-03
26 commits to main branch, last one 8 months ago
2
29
bsd-3-clause
2
Winning solution to the Generic Event Boundary Captioning task in the LOVEU Challenge (CVPR 2023 workshop)
Created 2023-05-24
111 commits to main branch, last one 10 months ago