90 results found
Reading list for research topics in multimodal machine learning
Created
2019-05-27
435 commits to master branch, last one 5 months ago
An open-source framework for training large multimodal models.
Created
2022-10-20
502 commits to main branch, last one about a year ago
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
Created
2021-09-01
66 commits to main branch, last one 2 years ago
A curated list of Multimodal Related Research.
Created
2019-07-31
206 commits to master branch, last one about a year ago
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
Created
2023-11-24
62 commits to main branch, last one 13 days ago
ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support ...
Created
2023-09-04
1,068 commits to main branch, last one 2 months ago
A Comparative Framework for Multimodal Recommender Systems
Created
2018-07-17
1,368 commits to master branch, last one about a month ago
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Created
2021-04-13
29 commits to master branch, last one 2 years ago
Papers, code and datasets about deep learning and multi-modal learning for video analysis
Created
2017-06-14
91 commits to master branch, last one 3 years ago
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Created
2021-08-28
95 commits to main branch, last one 2 years ago
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
Created
2023-03-11
14 commits to main branch, last one about a year ago
Multimodal model for text and tabular data with HuggingFace transformers as building block for text data
Created
2020-08-20
169 commits to master branch, last one 7 days ago
Multi-modal learning toolkit based on PaddlePaddle and PyTorch, supporting multiple applications such as multi-modal classification, cross-modal retrieval, and image captioning.
Created
2022-01-05
198 commits to Pytorch branch, last one about a year ago
A collection of resources on applications of multi-modal learning in medical imaging.
Created
2022-07-13
149 commits to main branch, last one 5 days ago
A curated list of awesome vision and language resources (still under construction... stay tuned!)
Created
2019-10-25
48 commits to master branch, last one 3 days ago
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
Created
2021-03-05
1,258 commits to main branch, last one 9 months ago
[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
Created
2023-08-01
11 commits to main branch, last one 12 months ago
Multi-modality pre-training
Created
2022-03-15
104 commits to main branch, last one 6 months ago
Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem. ⭐ Star to support our work!
Created
2020-06-30
3,054 commits to main branch, last one about a month ago
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processi...
Topics: asr, vad, icassp, denoising, icassp2023, icassp2024, face-recognition, image-generation, keyword-spotting, music-generation, domain-adaptation, generative-models, language-modeling, signal-processing, signal-restoration, speech-recognition, multimodal-learning, semantic-segmentation, self-supervised-learning, spoken-language-understanding
Created
2023-08-01
903 commits to main branch, last one a day ago
An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", a multimodal model that uses only a decoder to generate both text and images
Created
2023-07-14
73 commits to main branch, last one 10 months ago
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Created
2023-11-23
110 commits to main branch, last one 23 days ago
Research Trends in LLM-guided Multimodal Learning.
Created
2023-05-29
16 commits to main branch, last one about a year ago
[CVPR'24 Highlight] GPT4Point: A Unified Framework for Point-Language Understanding and Generation.
Created
2023-08-12
76 commits to main branch, last one 6 months ago
[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".
Created
2021-11-16
33 commits to main branch, last one about a year ago
[IEEE Transactions on Medical Imaging/TMI] This repo is the official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"
Created
2022-03-10
64 commits to main branch, last one about a year ago
[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"
Created
2023-04-26
79 commits to master branch, last one 5 months ago
List of academic resources on Multimodal ML for Music
Created
2022-12-29
11 commits to main branch, last one about a year ago
[T-PAMI] A curated list of self-supervised multimodal learning resources.
Created
2023-03-31
14 commits to main branch, last one 2 months ago
[CVPR 2022] Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
Created
2022-03-04
6 commits to main branch, last one 2 years ago