36 results found

1.0k
10.6k
other
81
An open source implementation of CLIP.
Created 2021-07-28
572 commits to main branch, last one 16 days ago
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Created 2022-07-08
382 commits to master branch, last one 4 months ago
128
1.6k
apache-2.0
33
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Created 2023-05-23
142 commits to main branch, last one 8 months ago
74
1.3k
other
16
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Created 2023-03-02
36 commits to main branch, last one 11 months ago
A concise but complete implementation of CLIP with various experimental improvements from recent papers
Created 2021-12-01
76 commits to main branch, last one about a year ago
A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Created 2019-03-03
36 commits to master branch, last one about a year ago
37
518
apache-2.0
7
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Created 2023-12-11
105 commits to main branch, last one 3 months ago
44
447
apache-2.0
6
Build high-performance AI models with modular building blocks
Created 2023-07-09
886 commits to master branch, last one 5 days ago
CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included...
Created 2023-08-09
1,146 commits to main branch, last one 5 months ago
3
251
apache-2.0
4
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
Created 2024-02-27
41 commits to main branch, last one 24 days ago
15
222
unknown
5
[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement
Created 2023-03-20
38 commits to main branch, last one 9 months ago
11
153
apache-2.0
11
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
Created 2023-06-06
36 commits to main branch, last one 8 months ago
9
146
unknown
5
The official repository of Achelous and Achelous++
Created 2023-03-17
150 commits to main branch, last one 5 months ago
13
131
mit
8
[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
Created 2023-02-07
13 commits to main branch, last one 5 months ago
12
121
other
5
Official PyTorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
Created 2023-11-10
18 commits to main branch, last one 4 months ago
7
108
other
7
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
Created 2023-07-19
13 commits to main branch, last one 9 months ago
【CVPR2024】Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Created 2024-03-14
124 commits to master branch, last one about a month ago
A python tool to perform deep learning experiments on multimodal remote sensing data.
Created 2021-02-28
26 commits to main branch, last one 2 years ago
7
83
unknown
5
[NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model
Created 2023-05-14
20 commits to main branch, last one about a year ago
An official implementation of Advancing Radiograph Representation Learning with Masked Record Modeling (ICLR'23)
Created 2023-01-18
14 commits to main branch, last one about a year ago
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
Created 2023-06-04
54 commits to main branch, last one 10 months ago
8
71
apache-2.0
8
Japanese CLIP by rinna Co., Ltd.
Created 2022-04-25
31 commits to master branch, last one 2 years ago
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
Created 2021-07-23
61 commits to main branch, last one 3 years ago
4
61
apache-2.0
3
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
Created 2024-05-27
17 commits to main branch, last one 6 months ago
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
Created 2023-04-05
13 commits to main branch, last one 10 months ago
A curated list of vision-and-language pre-training (VLP). :-)
Created 2021-10-30
19 commits to main branch, last one 2 years ago
This repository contains code to download data for the preprint "MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning"
Created 2024-03-25
46 commits to main branch, last one about a month ago
3
54
bsd-2-clause
2
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
Created 2023-02-16
14 commits to main branch, last one about a year ago
[ICLR 2024 Spotlight] This is the official code for the paper "SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training"
Created 2024-01-07
33 commits to main branch, last one 2 months ago