4 results found

17
267
mit
17
[NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Created 2023-05-29
13 commits to master branch, last one 11 months ago
13
205
unknown
12
We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into exi...
Created 2025-01-23
27 commits to main branch, last one 29 days ago
MADELEINE: multi-stain slide representation learning (ECCV'24)
Created 2024-07-16
45 commits to main branch, last one 5 days ago
Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"
Created 2024-06-11
29 commits to main branch, last one 3 months ago