Statistics for topic text-to-speech
RepositoryStats tracks 663,340 GitHub repositories; of these, 486 are tagged with the text-to-speech topic. The most common primary language for repositories using this topic is Python (267). Other languages include Jupyter Notebook (33), TypeScript (32), JavaScript (30), C# (17), C++ (17), and Java (11).
Stargazers over time for topic text-to-speech
Most starred repositories for topic text-to-speech
Trending repositories for topic text-to-speech
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Local, OpenAI-compatible text-to-speech (TTS) API using Chatterbox, enabling users to generate voice cloned speech anywhere the OpenAI API is used (e.g. Open WebUI, AnythingLLM, etc.)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Local, OpenAI-compatible text-to-speech (TTS) API using Chatterbox, enabling users to generate voice cloned speech anywhere the OpenAI API is used (e.g. Open WebUI, AnythingLLM, etc.)
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale te...
Neural Audio Codecs implemented in C# - DAC, SNAC, Encodec, Dia
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Local, OpenAI-compatible text-to-speech (TTS) API using Chatterbox, enabling users to generate voice cloned speech anywhere the OpenAI API is used (e.g. Open WebUI, AnythingLLM, etc.)
🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale te...
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
Neural Audio Codecs implemented in C# - DAC, SNAC, Encodec, Dia
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale te...
Unlimited text-to-speech in the Browser using Kokoro-JS, 100% local, 100% open source
Local, OpenAI-compatible text-to-speech (TTS) API using Chatterbox, enabling users to generate voice cloned speech anywhere the OpenAI API is used (e.g. Open WebUI, AnythingLLM, etc.)
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
Convert text to speech using Microsoft Azure Neural Text-to-Speech (TTS) and a simple Gradio web interface.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
A TTS model capable of generating ultra-realistic dialogue in one pass.
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale te...
Unlimited text-to-speech in the Browser using Kokoro-JS, 100% local, 100% open source
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
Neural Audio Codecs implemented in C# - DAC, SNAC, Encodec, Dia
A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation
A TTS model capable of generating ultra-realistic dialogue in one pass.
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isola...
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
A TTS model capable of generating ultra-realistic dialogue in one pass.
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and Coqui TTS Toolkit
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale te...