36 results found

984
10.3k
other
79
An open source implementation of CLIP.
Created 2021-07-28
566 commits to main branch, last one 21 days ago
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Created 2022-07-08
382 commits to master branch, last one 3 months ago
127
1.6k
apache-2.0
33
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Created 2023-05-23
142 commits to main branch, last one 7 months ago
75
1.3k
other
16
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Created 2023-03-02
36 commits to main branch, last one 10 months ago
A concise but complete implementation of CLIP with various experimental improvements from recent papers
Created 2021-12-01
76 commits to main branch, last one about a year ago
A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Created 2019-03-03
36 commits to master branch, last one about a year ago
37
490
apache-2.0
7
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Created 2023-12-11
105 commits to main branch, last one 2 months ago
42
423
apache-2.0
5
Build high-performance AI models with modular building blocks
Created 2023-07-09
882 commits to master branch, last one 19 days ago
CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included...
Created 2023-08-09
1,146 commits to main branch, last one 4 months ago
3
237
apache-2.0
4
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
Created 2024-02-27
40 commits to main branch, last one about a month ago
15
222
unknown
5
[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement
Created 2023-03-20
38 commits to main branch, last one 8 months ago
10
151
apache-2.0
11
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
Created 2023-06-06
36 commits to main branch, last one 7 months ago
9
145
unknown
5
The official repository of Achelous and Achelous++
Created 2023-03-17
150 commits to main branch, last one 4 months ago
13
129
mit
8
[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
Created 2023-02-07
13 commits to main branch, last one 4 months ago
11
117
other
5
Official PyTorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
Created 2023-11-10
18 commits to main branch, last one 3 months ago
7
108
other
8
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
Created 2023-07-19
13 commits to main branch, last one 8 months ago
[CVPR 2024] Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Created 2024-03-14
124 commits to master branch, last one 28 days ago
7
83
unknown
5
[NeurIPS 2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model
Created 2023-05-14
20 commits to main branch, last one 11 months ago
A Python tool to perform deep learning experiments on multimodal remote sensing data.
Created 2021-02-28
26 commits to main branch, last one 2 years ago
An official implementation of "Advancing Radiograph Representation Learning with Masked Record Modeling" (ICLR'23)
Created 2023-01-18
14 commits to main branch, last one about a year ago
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
Created 2023-06-04
54 commits to main branch, last one 9 months ago
8
68
apache-2.0
8
Japanese CLIP by rinna Co., Ltd.
Created 2022-04-25
31 commits to master branch, last one 2 years ago
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
Created 2021-07-23
61 commits to main branch, last one 3 years ago
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
Created 2023-04-05
13 commits to main branch, last one 9 months ago
4
59
apache-2.0
3
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
Created 2024-05-27
17 commits to main branch, last one 5 months ago
A curated list of vision-and-language pre-training (VLP). :-)
Created 2021-10-30
19 commits to main branch, last one 2 years ago
3
54
bsd-2-clause
2
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
Created 2023-02-16
14 commits to main branch, last one about a year ago
This repository contains code to download data for the preprint "MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning"
Created 2024-03-25
46 commits to main branch, last one 14 days ago
[ICLR 2024 Spotlight] This is the official code for the paper "SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training"
Created 2024-01-07
33 commits to main branch, last one 29 days ago