28 results found
- Filter by Primary Language:
- Python (24)
- Kotlin (1)
- MATLAB (1)
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
Created
2018-11-12
1,960 commits to main branch, last one about a year ago
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...
Created
2021-06-25
84 commits to master branch, last one about a year ago
The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insigh...
tutorial
awesome-list
image-text-matching
large-vision-models
vision-and-language
image-text-retrieval
large-language-model
video-text-retrieval
cross-modal-retrieval
large-language-models
multimodal-pretraining
video-text-recognition
memory-efficient-tuning
text-to-image-synthesis
text-to-image-generation
text-to-video-generation
visual-semantic-embedding
large-vision-language-models
parameter-efficient-fine-tuning
multimodal-large-language-models
Created
2020-12-22
130 commits to main branch, last one 15 days ago
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
awsome
survey
surveys
paper-list
awsome-list
entity-linking
knowledge-graph
entity-alignment
image-generation
multi-modal-fusion
image-classification
multi-modal-learning
cross-modal-retrieval
large-language-models
information-extraction
visual-question-answering
knowledge-graph-embeddings
multi-modal-knowledge-graph
Created
2024-01-29
83 commits to main branch, last one 20 days ago
TOMM2020 Dual-Path Convolutional Image-Text Embedding 🐾 https://arxiv.org/abs/1711.05535
Created
2017-11-17
146 commits to master branch, last one about a year ago
Offline semantic Text-to-Image and Image-to-Image search on Android powered by quantized state-of-the-art vision-language pretrained CLIP model and ONNX Runtime inference engine
Created
2023-02-24
43 commits to main branch, last one about a year ago
[AAAI2021] The code of “Similarity Reasoning and Filtration for Image-Text Matching”
Created
2020-12-16
45 commits to main branch, last one 8 months ago
Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021 (Oral)
Created
2021-01-10
72 commits to master branch, last one about a year ago
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval (CVPR 2019)
Created
2019-06-11
84 commits to master branch, last one 10 months ago
Official PyTorch implementation of "Probabilistic Cross-Modal Embedding" (CVPR 2021)
Created
2021-06-15
8 commits to main branch, last one 10 months ago
[ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
Created
2023-03-16
20 commits to main branch, last one 8 months ago
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Created
2022-09-23
33 commits to main branch, last one 8 months ago
PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction
Created
2022-05-24
35 commits to main branch, last one about a year ago
[CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Created
2023-02-28
35 commits to main branch, last one 2 days ago
Official implementation of "Contrastive Audio-Language Learning for Music" (ISMIR 2022)
Created
2022-08-16
7 commits to main branch, last one 25 days ago
[CVPR 2020, Oral] "Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval", IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.
Created
2019-12-16
33 commits to master branch, last one 3 years ago
Extended COCO Validation (ECCV) Caption dataset (ECCV 2022)
Created
2022-03-30
4 commits to main branch, last one 10 months ago
Official PyTorch implementation of "Improved Probabilistic Image-Text Representations" (ICLR 2024)
Created
2023-05-29
8 commits to main branch, last one 7 months ago
Unsupervised Contrastive Cross-modal Hashing (IEEE TPAMI 2023, PyTorch Code)
Created
2022-05-20
55 commits to main branch, last one about a year ago
Learning Cross-Modal Retrieval with Noisy Labels (CVPR 2021, PyTorch Code)
Created
2021-04-06
103 commits to main branch, last one 2 years ago
[IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
Created
2023-04-29
15 commits to main branch, last one 8 months ago
[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval
Created
2023-09-06
23 commits to master branch, last one 6 months ago
[BMVC 2021] Text-Based Person Search with Limited Data
Created
2021-02-15
14 commits to main branch, last one 2 years ago
Code, dataset and models for our CVPR 2022 publication "Text2Pos"
Created
2021-02-09
202 commits to master branch, last one 2 years ago
PyTorch implementation of the AAAI-21 paper "Dual Adversarial Label-aware Graph Neural Networks for Cross-modal Retrieval" and the TPAMI-22 paper "Integrating Multi-Label Contrastive Learning with Dua...
Created
2021-09-22
18 commits to main branch, last one 2 years ago
[AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.
Created
2024-02-14
28 commits to main branch, last one 2 months ago
[TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching”
Created
2023-03-23
15 commits to main branch, last one 8 months ago
Code implementation of paper "SEMScene: Semantic-Consistency Enhanced Multi-Level Scene Graph Matching for Image-Text Retrieval".
Created
2023-11-26
51 commits to main branch, last one about a month ago