Search Results - RepositoryStats

71

1.0k

apache-2.0

14

A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

multimodal audio-language vision-language contrastive-loss foundation-models vision-transformer vision-and-language representation-learning

Created 2023-05-18

136 commits to main branch, last one 6 months ago

Awesome-Audio-LLM AudioLLMs

30

501

unknown

28

Audio Large Language Models

audio-language audio-processing audio-understanding

Created 2024-06-15

98 commits to main branch, last one about a month ago

VAST TXH-mercury

17

277

mit

16

[NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

dataset audio-language vision-language cross-modality-pretraining vision-audio-subtitle-text multimodal-foundation-model

Created 2023-05-29

13 commits to master branch, last one about a year ago

GAMA Sreyan88

11

123

apache-2.0

10

Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

audio dataset reasoning audio-language question-answering large-language-model multimodal-large-language-models

Created 2024-06-15

28 commits to main branch, last one 4 months ago