33 results found

An open source implementation of CLIP.
Created 2021-07-28
534 commits to main branch, last one 3 days ago
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Created 2022-07-08
374 commits to master branch, last one 7 months ago
114
1.5k
apache-2.0
33
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Created 2023-05-23
142 commits to main branch, last one 2 months ago
🥂 Gracefully face hCaptcha challenges with an embedded MoE (ONNX) solution.
Created 2022-02-15
885 commits to main branch, last one 6 months ago
75
1.3k
other
15
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Created 2023-03-02
36 commits to main branch, last one 5 months ago
A concise but complete implementation of CLIP with various experimental improvements from recent papers
Created 2021-12-01
76 commits to main branch, last one 8 months ago
A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Created 2019-03-03
36 commits to master branch, last one about a year ago
24
358
apache-2.0
7
[CVPR 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Created 2023-12-11
101 commits to main branch, last one 12 days ago
CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included...
Created 2023-08-09
1,138 commits to main branch, last one 8 hours ago
26
310
apache-2.0
4
Build high-performance AI models with modular building blocks
Created 2023-07-09
791 commits to master branch, last one 2 days ago
14
211
unknown
5
[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement
Created 2023-03-20
38 commits to main branch, last one 3 months ago
1
138
apache-2.0
3
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
Created 2024-02-27
34 commits to main branch, last one 13 days ago
9
137
apache-2.0
11
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
Created 2023-06-06
36 commits to main branch, last one 2 months ago
7
133
unknown
4
Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar
Created 2023-03-17
149 commits to main branch, last one 2 months ago
12
113
mit
8
[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
Created 2023-02-07
11 commits to main branch, last one 3 months ago
6
99
other
7
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
Created 2023-07-19
13 commits to main branch, last one 3 months ago
11
89
other
5
Official PyTorch repository for CG-DETR: "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
Created 2023-11-10
16 commits to main branch, last one 2 months ago
6
79
unknown
5
[NeurIPS 2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model
Created 2023-05-14
20 commits to main branch, last one 7 months ago
A Python tool to perform deep learning experiments on multimodal remote sensing data.
Created 2021-02-28
26 commits to main branch, last one 2 years ago
An official implementation of Advancing Radiograph Representation Learning with Masked Record Modeling (ICLR'23)
Created 2023-01-18
14 commits to main branch, last one about a year ago
6
66
apache-2.0
8
Japanese CLIP by rinna Co., Ltd.
Created 2022-04-25
31 commits to master branch, last one about a year ago
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
Created 2021-07-23
61 commits to main branch, last one 2 years ago
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
Created 2023-06-04
54 commits to main branch, last one 4 months ago
[CVPR 2024] Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Created 2024-03-14
121 commits to master branch, last one 13 days ago
A curated list of vision-and-language pre-training (VLP). :-)
Created 2021-10-30
19 commits to main branch, last one about a year ago
3
54
bsd-2-clause
2
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
Created 2023-02-16
14 commits to main branch, last one about a year ago
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
Created 2023-04-05
13 commits to main branch, last one 4 months ago
2
39
apache-2.0
3
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
Created 2024-05-27
17 commits to main branch, last one 27 days ago
4
37
unknown
2
MMEA: Entity Alignment for Multi-Modal Knowledge Graphs, KSEM 2020
Created 2021-03-22
8 commits to master branch, last one 2 years ago