Trending repositories for topic text-to-speech
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Translate the video from one language to another and add dubbing. 将视频从一种语言翻译为另一种语言,同时支持语音识别转录、语音合成、字幕翻译。
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V,...
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
A simple, high-quality voice conversion tool focused on ease of use and performance.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downlo...
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
This repository contains a hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM etc
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Free, high quality text-to-speech for your Obsidian notes, leveraging Microsoft Edge's Read Aloud API.
AI Plugin is a powerful extension for the Payload CMS, integrating advanced AI capabilities to enhance content creation and management.
Text-to-speech API endpoint compatible with OpenAI's TTS API endpoint, using Microsoft Edge TTS to generate speech for free locally
Automated voice dubbing for YouTube videos using Docker, OpenVoice, and FastAPI. Translates and dubs videos with original voice timbre.
Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice i...
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
A simple, high-quality voice conversion tool focused on ease of use and performance.
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V,...
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downlo...
Talking Head (3D): A JavaScript class for real-time lip-sync using Ready Player Me full-body 3D avatars.
The official Python API for ElevenLabs Text to Speech.
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
Controllable and fast Text-to-Speech for over 7000 languages!
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Translate the video from one language to another and add dubbing. 将视频从一种语言翻译为另一种语言,同时支持语音识别转录、语音合成、字幕翻译。
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V,...
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downlo...
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
A simple, high-quality voice conversion tool focused on ease of use and performance.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice i...
🚀AI拟声: 5秒内克隆您的声音并生成任意语音内容 Clone a voice in 5 seconds to generate arbitrary speech in real-time
Free, high quality text-to-speech for your Obsidian notes, leveraging Microsoft Edge's Read Aloud API.
Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice i...
AI Plugin is a powerful extension for the Payload CMS, integrating advanced AI capabilities to enhance content creation and management.
AutoHighlightTTS is a simple, powerful solution for Android Text to Speech, featuring auto sentence highlighting with custom styles, auto-scrolling text, and adjustable language, pitch, and speech rat...
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Text-to-speech API endpoint compatible with OpenAI's TTS API endpoint, using Microsoft Edge TTS to generate speech for free locally
A lightweight pure C++ Text-to-Speech (TTS) pipeline with OpenVINO, supporting multiple languages.
Revamp your morning routine and supercharge productivity with Dispatch. The ultimate Apple Shortcut powered by ChatGPT and ElevenLabs.
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis
Make Azure natural TTS voices accessible to any SAPI 5-compatible application.
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Automated voice dubbing for YouTube videos using Docker, OpenVoice, and FastAPI. Translates and dubs videos with original voice timbre.
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downlo...
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
AutoHighlightTTS is a simple, powerful solution for Android Text to Speech, featuring auto sentence highlighting with custom styles, auto-scrolling text, and adjustable language, pitch, and speech rat...
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Translate the video from one language to another and add dubbing. 将视频从一种语言翻译为另一种语言,同时支持语音识别转录、语音合成、字幕翻译。
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downlo...
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V,...
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...
🚀AI拟声: 5秒内克隆您的声音并生成任意语音内容 Clone a voice in 5 seconds to generate arbitrary speech in real-time
eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.
A simple, high-quality voice conversion tool focused on ease of use and performance.
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
This repository contains a hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM etc
Free, high quality text-to-speech for your Obsidian notes, leveraging Microsoft Edge's Read Aloud API.
AutoHighlightTTS is a simple, powerful solution for Android Text to Speech, featuring auto sentence highlighting with custom styles, auto-scrolling text, and adjustable language, pitch, and speech rat...
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice i...
Text-to-speech API endpoint compatible with OpenAI's TTS API endpoint, using Microsoft Edge TTS to generate speech for free locally
A lightweight pure C++ Text-to-Speech (TTS) pipeline with OpenVINO, supporting multiple languages.
Generate an engaging podcast based on your document using Azure OpenAI and Azure Speech.
AI Plugin is a powerful extension for the Payload CMS, integrating advanced AI capabilities to enhance content creation and management.
Make Azure natural TTS voices accessible to any SAPI 5-compatible application.
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downlo...
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
AivisSpeech Engine: AI Voice Imitation System - Text to Speech Engine
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Wip From NTTS Piper Ultra Fast And Efficient Text To Speech Library For Cross Platform Work on any edge device with cpu only
AivisSpeech: AI Voice Imitation System - Text to Speech Software
Using a single image and just 10 seconds of sample audio, our project enables you to create a video where it appears as if you're speaking the desired text.
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downlo...
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and Coqui TTS Toolkit
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
Turn PDFs and EPUBs into audiobooks, subtitles or videos into dubbed videos (including translation), and more. For free. Pandrator uses local models, notably XTTS, including voice-cloning (instant, RV...
AivisSpeech: AI Voice Imitation System - Text to Speech Software
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Translate the video from one language to another and add dubbing. 将视频从一种语言翻译为另一种语言,同时支持语音识别转录、语音合成、字幕翻译。
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Foundational model for human-like, expressive TTS
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V,...
🚀AI拟声: 5秒内克隆您的声音并生成任意语音内容 Clone a voice in 5 seconds to generate arbitrary speech in real-time
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downlo...
🚀 一键部署(含离线整合包)!基于 ChatTTS ,支持流式输出、音色抽卡、长音频生成和分角色朗读。简单易用,无需复杂安装。
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
🚀 一键部署(含离线整合包)!基于 ChatTTS ,支持流式输出、音色抽卡、长音频生成和分角色朗读。简单易用,无需复杂安装。
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Foundational model for human-like, expressive TTS
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
Talking Head (3D): A JavaScript class for real-time lip-sync using Ready Player Me full-body 3D avatars.
Praises is a text-to-speech tool that can help you read text easily.
🐍 🤖 Pip installable package for StyleTTS 2 human-level text-to-speech and voice cloning
If you've ever had the wish to talk to your AI Waifu using quality characters and voices for character voicing, then I suggest Soul of Waifu. Don't miss the opportunity to touch your dream!
Implementation of OpenAI's Text-To-Speech in Unity. Synthesize any text and play it via any AudioSource.
Chat with GPT LLMs over voice, UI & terminal, all with access to the internet. Powered by OpenAI.