12 results found
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Created 2021-04-13
29 commits to master branch, last one 2 years ago
The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insigh...
tutorial
awesome-list
image-text-matching
large-vision-models
vision-and-language
image-text-retrieval
large-language-model
video-text-retrieval
cross-modal-retrieval
large-language-models
multimodal-pretraining
video-text-recognition
memory-efficient-tuning
text-to-image-synthesis
text-to-image-generation
text-to-video-generation
visual-semantic-embedding
large-vision-language-models
parameter-efficient-fine-tuning
multimodal-large-language-models
Created 2020-12-22
130 commits to main branch, last one 6 days ago
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
Created 2020-10-30
20 commits to main branch, last one 2 years ago
【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Created 2023-01-07
32 commits to main branch, last one 22 days ago
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Created 2021-12-11
14 commits to main branch, last one 2 years ago
Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]
Created 2020-04-21
34 commits to master branch, last one 2 years ago
An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
Created 2022-09-19
4 commits to main branch, last one 2 years ago
Research code for the Multimodal-Cognition team at Ant Group
Created 2023-08-21
142 commits to main branch, last one 5 months ago
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021
Created 2021-10-12
4 commits to main branch, last one 3 years ago
[arXiv] Cross-Modal Adapter for Text-Video Retrieval
Created 2022-11-16
15 commits to main branch, last one 2 years ago
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
Created 2023-10-29
9 commits to main branch, last one 11 months ago
[AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.
Created 2024-02-14
28 commits to main branch, last one 2 months ago