2 results found

53
341
mit
10
An official implementation of "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
Created 2020-10-30
20 commits to main branch, last one 2 years ago
A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.
Created 2023-01-03
26 commits to main branch, last one 10 months ago