97 results found

Reading list for research topics in multimodal machine learning
Created 2019-05-27
435 commits to master branch, last one 6 months ago
An open-source framework for training large multimodal models.
Created 2022-10-20
502 commits to main branch, last one about a year ago
204
1.8k
mit
16
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
Created 2021-09-01
66 commits to main branch, last one 2 years ago
A curated list of Multimodal Related Research.
Created 2019-07-31
206 commits to master branch, last one about a year ago
56
956
apache-2.0
13
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
Created 2023-11-24
62 commits to main branch, last one about a month ago
ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support ...
Created 2023-09-04
1,068 commits to main branch, last one 3 months ago
148
897
apache-2.0
26
A Comparative Framework for Multimodal Recommender Systems
Created 2018-07-17
1,370 commits to master branch, last one 3 days ago
125
897
mit
13
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Created 2021-04-13
29 commits to master branch, last one 2 years ago
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Created 2021-08-28
95 commits to main branch, last one 2 years ago
Papers, code and datasets about deep learning and multi-modal learning for video analysis
Created 2017-06-14
91 commits to master branch, last one 3 years ago
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
Created 2023-03-11
14 commits to main branch, last one about a year ago
A collection of resources on applications of multi-modal learning in medical imaging.
Created 2022-07-13
151 commits to main branch, last one about a month ago
Multimodal model for text and tabular data with HuggingFace transformers as building block for text data
Created 2020-08-20
169 commits to master branch, last one about a month ago
98
565
apache-2.0
42
Multi-Modal learning toolkit based on PaddlePaddle and PyTorch, supporting multiple applications such as multi-modal classification, cross-modal retrieval and image captioning.
Created 2022-01-05
198 commits to Pytorch branch, last one about a year ago
A curated list of awesome vision and language resources (still under construction... stay tuned!)
Created 2019-10-25
48 commits to master branch, last one about a month ago
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
Created 2021-03-05
1,258 commits to main branch, last one 10 months ago
[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
Created 2023-08-01
11 commits to main branch, last one about a year ago
36
479
other
14
Multi-modality pre-training
Created 2022-03-15
104 commits to main branch, last one 7 months ago
64
445
mit
12
Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem. ⭐ Star to support our work!
Created 2020-06-30
3,054 commits to main branch, last one 2 months ago
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processi...
Created 2023-08-01
975 commits to main branch, last one 16 hours ago
29
370
apache-2.0
4
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Created 2023-11-23
131 commits to main branch, last one 11 days ago
18
364
mit
22
An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal AI that uses just a decoder to generate both text and images
Created 2023-07-14
73 commits to main branch, last one about a year ago
Research Trends in LLM-guided Multimodal Learning.
Created 2023-05-29
16 commits to main branch, last one about a year ago
[CVPR'24 Highlight] GPT4Point: A Unified Framework for Point-Language Understanding and Generation.
Created 2023-08-12
76 commits to main branch, last one 7 months ago
26
313
mit
3
[IEEE Transactions on Medical Imaging/TMI] This repo is the official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"
Created 2022-03-10
64 commits to main branch, last one about a year ago
[ECCV'22] Official repository of the paper titled "Class-agnostic Object Detection with Multi-modal Transformer".
Created 2021-11-16
33 commits to main branch, last one about a year ago
13
303
apache-2.0
14
[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"
Created 2023-04-26
79 commits to master branch, last one 6 months ago
List of academic resources on Multimodal ML for Music
Created 2022-12-29
11 commits to main branch, last one about a year ago
[T-PAMI] A curated list of self-supervised multimodal learning resources.
Created 2023-03-31
14 commits to main branch, last one 4 months ago
10
197
mit
7
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
Created 2023-09-28
18 commits to main branch, last one 10 months ago