33 results found
An open source implementation of CLIP.
Created
2021-07-28
534 commits to main branch, last one 3 days ago
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Created
2022-07-08
374 commits to master branch, last one 7 months ago
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Created
2023-05-23
142 commits to main branch, last one 2 months ago
🥂 Gracefully face hCaptcha challenges with an embedded MoE (ONNX) solution.
Created
2022-02-15
885 commits to main branch, last one 6 months ago
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Created
2023-03-02
36 commits to main branch, last one 5 months ago
A concise but complete implementation of CLIP with various experimental improvements from recent papers
Created
2021-12-01
76 commits to main branch, last one 8 months ago
A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Created
2019-03-03
36 commits to master branch, last one about a year ago
[CVPR 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Created
2023-12-11
101 commits to main branch, last one 12 days ago
CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included...
Topics: cvpr, cvpr2023, cvpr2024, datasets, biometrics, segmentation, deep-learning, scene-analysis, shape-analysis, computer-vision, image-synthesis, video-synthesis, face-recognition, action-recognition, autonomous-driving, gesture-recognition, pattern-recognition, multi-modal-learning, medical-image-processing, self-supervised-learning
Created
2023-08-09
1,138 commits to main branch, last one 8 hours ago
Build high-performance AI models with modular building blocks
Created
2023-07-09
791 commits to master branch, last one 2 days ago
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
Topics: awsome, survey, surveys, paper-list, awsome-list, entity-linking, knowledge-graph, entity-alignment, image-generation, multi-modal-fusion, image-classification, multi-modal-learning, cross-modal-retrieval, large-language-models, information-extraction, visual-question-answering, knowledge-graph-embeddings, multi-modal-knowledge-graph
Created
2024-01-29
77 commits to main branch, last one about a month ago
[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement
Created
2023-03-20
38 commits to main branch, last one 3 months ago
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
Created
2024-02-27
34 commits to main branch, last one 13 days ago
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
Created
2023-06-06
36 commits to main branch, last one 2 months ago
Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar
Created
2023-03-17
149 commits to main branch, last one 2 months ago
[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
Created
2023-02-07
11 commits to main branch, last one 3 months ago
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
Created
2023-07-19
13 commits to main branch, last one 3 months ago
Official PyTorch repository for CG-DETR: "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
Created
2023-11-10
16 commits to main branch, last one 2 months ago
[NeurIPS 2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model
Created
2023-05-14
20 commits to main branch, last one 7 months ago
A Python tool to perform deep learning experiments on multimodal remote sensing data.
Created
2021-02-28
26 commits to main branch, last one 2 years ago
An official implementation of "Advancing Radiograph Representation Learning with Masked Record Modeling" (ICLR 2023)
Created
2023-01-18
14 commits to main branch, last one about a year ago
Japanese CLIP by rinna Co., Ltd.
Created
2022-04-25
31 commits to master branch, last one about a year ago
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
Created
2021-07-23
61 commits to main branch, last one 2 years ago
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
Created
2023-06-04
54 commits to main branch, last one 4 months ago
[CVPR 2024] Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Created
2024-03-14
121 commits to master branch, last one 13 days ago
A curated list of vision-and-language pre-training (VLP). :-)
Created
2021-10-30
19 commits to main branch, last one about a year ago
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
Created
2023-02-16
14 commits to main branch, last one about a year ago
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
Created
2023-04-05
13 commits to main branch, last one 4 months ago
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
Created
2024-05-27
17 commits to main branch, last one 27 days ago
MMEA: Entity Alignment for Multi-Modal Knowledge Graphs, KSEM 2020
Created
2021-03-22
8 commits to master branch, last one 2 years ago