36 results found
An open source implementation of CLIP.
Created
2021-07-28
572 commits to main branch, last one 16 days ago
A Chinese version of CLIP that supports Chinese cross-modal retrieval and representation generation.
Created
2022-07-08
382 commits to master branch, last one 4 months ago
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Created
2023-05-23
142 commits to main branch, last one 8 months ago
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Created
2023-03-02
36 commits to main branch, last one 11 months ago
A concise but complete implementation of CLIP with various experimental improvements from recent papers
Created
2021-12-01
76 commits to main branch, last one about a year ago
A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Created
2019-03-03
36 commits to master branch, last one about a year ago
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Created
2023-12-11
105 commits to main branch, last one 3 months ago
Build high-performance AI models with modular building blocks
Created
2023-07-09
886 commits to master branch, last one 5 days ago
CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included...
Topics: cvpr, cvpr2023, cvpr2024, datasets, biometrics, segmentation, deep-learning, scene-analysis, shape-analysis, computer-vision, image-synthesis, video-synthesis, face-recognition, action-recognition, autonomous-driving, gesture-recognition, pattern-recognition, multi-modal-learning, medical-image-processing, self-supervised-learning
Created
2023-08-09
1,146 commits to main branch, last one 5 months ago
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
Topics: awsome, survey, surveys, paper-list, awsome-list, entity-linking, knowledge-graph, entity-alignment, image-generation, multi-modal-fusion, image-classification, multi-modal-learning, cross-modal-retrieval, large-language-models, information-extraction, visual-question-answering, knowledge-graph-embeddings, multi-modal-knowledge-graph
Created
2024-01-29
83 commits to main branch, last one 11 days ago
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
Created
2024-02-27
41 commits to main branch, last one 24 days ago
[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement
Created
2023-03-20
38 commits to main branch, last one 9 months ago
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
Created
2023-06-06
36 commits to main branch, last one 8 months ago
The official repository of Achelous and Achelous++
Created
2023-03-17
150 commits to main branch, last one 5 months ago
[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
Created
2023-02-07
13 commits to main branch, last one 5 months ago
Official PyTorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
Created
2023-11-10
18 commits to main branch, last one 4 months ago
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
Created
2023-07-19
13 commits to main branch, last one 9 months ago
[CVPR 2024] Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Created
2024-03-14
124 commits to master branch, last one about a month ago
A Python tool to perform deep learning experiments on multimodal remote sensing data.
Created
2021-02-28
26 commits to main branch, last one 2 years ago
[NeurIPS 2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model
Created
2023-05-14
20 commits to main branch, last one about a year ago
An official implementation of Advancing Radiograph Representation Learning with Masked Record Modeling (ICLR'23)
Created
2023-01-18
14 commits to main branch, last one about a year ago
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
Created
2023-06-04
54 commits to main branch, last one 10 months ago
Japanese CLIP by rinna Co., Ltd.
Created
2022-04-25
31 commits to master branch, last one 2 years ago
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
Created
2021-07-23
61 commits to main branch, last one 3 years ago
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
Created
2024-05-27
17 commits to main branch, last one 6 months ago
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
Created
2023-04-05
13 commits to main branch, last one 10 months ago
A curated list of vision-and-language pre-training (VLP). :-)
Created
2021-10-30
19 commits to main branch, last one 2 years ago
This repository contains code to download data for the preprint "MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning"
Created
2024-03-25
46 commits to main branch, last one about a month ago
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
Created
2023-02-16
14 commits to main branch, last one about a year ago
[ICLR 2024 Spotlight] This is the official code for the paper "SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training"
Created
2024-01-07
33 commits to main branch, last one 2 months ago