Search Results - RepositoryStats

NAACL_2025_TWM xid32

30

308

unknown

21

We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into exi...

working-memory video-captioning question-answering video-text-retrieval audio-visual-learning multimodal-foundation-model multimodal-large-language-models

Created 2025-01-23

27 commits to main branch, last one 2 months ago

VAST TXH-mercury

17

273

mit

16

[NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

dataset audio-language vision-language cross-modality-pretraining vision-audio-subtitle-text multimodal-foundation-model

Created 2023-05-29

13 commits to master branch, last one about a year ago

MADELEINE mahmoodlab

5

49

other

3

MADELEINE: multi-stain slide representation learning (ECCV'24)

ssl cancer pathology molecular-status-prediction multimodal-foundation-model slide-representation-learning

Created 2024-07-16

45 commits to main branch, last one about a month ago

MJ-Bench MJ-Bench

5

43

mit

1

Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"

reward-models llm-as-a-judge llm-benchmarking multimodal-judge multimodal-foundation-model

Created 2024-06-11

32 commits to main branch, last one about a month ago