Search Results - RepositoryStats

702

9.0k

mit

81

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...

vits audit emilia vall-e maskgct vocoder audioldm fastspeech2 text-to-audio naturalspeech2 text-to-speech audio-synthesis audio-generation music-generation speech-synthesis voice-conversion singing-voice-conversion

Created 2023-11-15

123 commits to main branch, last one 12 days ago

MMAudio hkchengrex

169

1.4k

mit

16

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

audio deep-learning text-to-audio video-to-audio audio-synthesis computer-vision

Created 2024-12-07

116 commits to main branch, last one 19 hours ago

tango declare-lab

98

1.2k

other

27

A family of diffusion models for text-to-audio generation.

diffusion text-to-audio language-models audio-generation diffusion-models large-language-models

Created 2023-04-10

133 commits to master branch, last one 3 months ago

audio-webui gitmylo

105

1.2k

mit

22

A web UI for various audio-related neural networks

ai aio rvc tts bark music rvc-gui audioldm bark-gui all-in-one audiocraft text-to-audio voice-cloning text-to-speech generative-audio generative-music artificial-intelligence

Created 2023-05-05

427 commits to master branch, last one 8 months ago

StreamSpeech ictnlp

82

1.1k

mit

13

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

Created 2024-06-04

25 commits to main branch, last one 8 months ago

TangoFlux declare-lab

65

710

other

8

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching

tta flow-matching generative-ai text-to-audio text-to-audio-ai

Created 2024-12-28

87 commits to main branch, last one about a month ago

Make-An-Audio Text-to-Audio

88

645

mit

52

PyTorch implementation of Make-An-Audio (ICML'23), a text-to-audio generative model

latent-space text-to-audio video-to-audio diffusion-models latent-diffusion

Created 2023-06-17

19 commits to main branch, last one 11 months ago

OpenMusic ivcylc

56

554

mit

12

OpenMusic: SOTA Text-to-music (TTM) Generation

ai dit mdt vall-e ai-music audioldm hifi-gan music-ai text-to-audio text-to-music diffusion-models music-generation text-to-audio-ai ai-music-generator ai-music-generation diffusion-transformer music-ai-architectures text-to-music-transformer

Created 2024-05-24

119 commits to main branch, last one 2 months ago

nuwa-pytorch lucidrains

56

547

mit

22

Implementation of NÜWA, a state-of-the-art attention network for text-to-video synthesis, in PyTorch

transformers deep-learning text-to-audio text-to-video attention-mechanism artificial-intelligence

Created 2021-11-28

195 commits to main branch, last one 2 years ago

Awesome-LLMs-meet-Multimodal-Generation YingqingHe

26

463

unknown

18

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

llm aigc lvlm mllm text-to-3d multimodality text-to-audio text-to-image text-to-music text-to-sound text-to-video text-to-speech multimodal-models large-language-models multimodal-generation large-vision-language-models multimodal-large-language-models

Created 2023-11-17

357 commits to main branch, last one 21 days ago

mustango AMAAI-Lab

28

360

mit

16

Mustango: Toward Controllable Text-to-Music Generation

text-to-audio text-to-music diffusion-models large-language-models

Created 2023-11-14

100 commits to main branch, last one about a month ago

EzAudio haidog-yaqub

11

265

mit

18

High-quality Text-to-Audio Generation with Efficient Diffusion Transformer

generative-ai text-to-audio diffusion-models

Created 2024-09-11

42 commits to main branch, last one about a month ago

Auffusion happylittlecat2333

14

182

other

9

Official code and models for the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"

diffusion text-to-audio audio-generation diffusion-models large-language-models

Created 2023-11-15

18 commits to main branch, last one about a year ago

word2wave ilaria-manco

15

119

mit

3

Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

ai-music text-to-audio audio-generation music-generation

Created 2021-04-20

80 commits to main branch, last one 3 years ago

sub-to-audio bnsantoso

13

114

mpl-2.0

5

Subtitle to audio: generates audio from any subtitle file using Coqui-ai TTS and synchronizes the audio timing with the subtitle timestamps.

tts python text-to-audio text-to-speech audio-processing subtitle-to-audio subtitle-to-voice subtitle-to-speech subtitle-conversion

Created 2023-07-17

51 commits to main branch, last one about a year ago

soundctm sony

8

91

mit

4

PyTorch implementation of SoundCTM

pytorch text-to-audio audio-generation diffusion-models

Created 2024-06-04

29 commits to main branch, last one 25 days ago

WaveGrad2 keonlee9420

18

69

mit

6

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

tts audio robust pytorch duration synthesis end-to-end neural-tts text-to-audio score-matching text-to-speech speech-synthesis non-autoregressive phoneme-to-waveform

Created 2021-06-18

6 commits to main branch, last one 3 years ago

soundstorm RhythrosaLabs

8

32

mit

3

Soundstorm is a cutting-edge AI-powered audio manipulation application designed to provide a rich yet simplified experience for sound designers, algorithmic composers, and experimental audio enthusias...

gpt midi gpt-4 sound sounds chatbot chatgpt ai-audio chat-gpt audio-tools random-music sound-design audio-toolbox text-to-audio audio-processing sound-processing algorithmic-music ai-audio-generation algorithmic-composition

Created 2023-09-14

41 commits to main branch, last one 11 months ago

ctag PapayaResearch

2

25

mit

2

Creative Text-to-Audio Generation via Synthesizer Programming @ ICML'24

jax synthesizer generative-ai text-to-audio machine-learning

Created 2023-08-17

4 commits to main branch, last one 7 months ago