11 results found

117
814
mit
12
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Created 2021-04-13
29 commits to master branch, last one 2 years ago
The Paper List of Large Multi-Modality Model, Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insight.
Created 2020-12-22
128 commits to main branch, last one 22 days ago
54
332
mit
10
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
Created 2020-10-30
20 commits to main branch, last one about a year ago
【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Created 2023-01-07
27 commits to main branch, last one about a month ago
18
184
bsd-3-clause
7
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Created 2021-12-11
14 commits to main branch, last one about a year ago
27
154
unknown
10
Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]
Created 2020-04-21
34 commits to master branch, last one about a year ago
15
119
mit
2
An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
Created 2022-09-19
4 commits to main branch, last one about a year ago
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021
Created 2021-10-12
4 commits to main branch, last one 2 years ago
Research Code for Multimodal-Cognition Team in Ant Group
Created 2023-08-21
134 commits to main branch, last one 16 days ago
[arXiv] Cross-Modal Adapter for Text-Video Retrieval
Created 2022-11-16
15 commits to main branch, last one about a year ago
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
Created 2023-10-29
9 commits to main branch, last one 5 months ago