Trending repositories for topic speech-synthesis
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downlo...
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation a...
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...
eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
An Open Source text-to-speech system built by inverting Whisper.
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
so-vits-svc fork with realtime support, improved interface and more features.
Speech To Speech: an effort for an open-sourced and modular GPT4-o
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice i...
[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downlo...
Neural Text To Speech Di Dart Cepat Ringan Tanpa Koneksi Internet dan bisa berjalan di cpu
Make Azure natural TTS voices accessible to any SAPI 5-compatible application.
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Whisper Dart is a cross platform library for dart and flutter that allows converting audio to text / speech to text / inference from Open AI models
Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
Custom nodes that extend the capabilities of Comfyui
A talking LLM that runs on your own computer without needing the internet.
eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.
SummerTTS 是一个基于C++的独立编译的中文和英文语音合成项目,可以本地运行不需要网络,而且没有额外的依赖,一键编译完成即可用于中文和英文的语音合成。SummerTTS is a standalone Chinese and English speech synthesis(TTS) project that has almost no dependency and could be e...
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downlo...
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation a...
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
An Open Source text-to-speech system built by inverting Whisper.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice i...
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)
Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice i...
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downlo...
A talking LLM that runs on your own computer without needing the internet.
Make Azure natural TTS voices accessible to any SAPI 5-compatible application.
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
This repository is related to our Dataset and Detection code from the paper: AI-Synthesized Voice Detection Using Neural Vocoder Artifacts accepted in CVPR Workshop on Media Forensic 2023.
Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
SummerTTS 是一个基于C++的独立编译的中文和英文语音合成项目,可以本地运行不需要网络,而且没有额外的依赖,一键编译完成即可用于中文和英文的语音合成。SummerTTS is a standalone Chinese and English speech synthesis(TTS) project that has almost no dependency and could be e...
Custom nodes that extend the capabilities of Comfyui
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downlo...
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation a...
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Speech To Speech: an effort for an open-sourced and modular GPT4-o
An Open Source text-to-speech system built by inverting Whisper.
High-quality and streaming Speech-to-Speech interactive agent in a single file. 只用一个文件实现的流式全双工语音交互原型智能体!
Custom nodes that extend the capabilities of Comfyui
Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downlo...
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
High-quality and streaming Speech-to-Speech interactive agent in a single file. 只用一个文件实现的流式全双工语音交互原型智能体!
Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice i...
Make Azure natural TTS voices accessible to any SAPI 5-compatible application.
A talking LLM that runs on your own computer without needing the internet.
Implementation of BEST-RQ - a model for self-supervised learning of speech signals using a random projection quantizer, in Pytorch.
Custom nodes that extend the capabilities of Comfyui
This repository is related to our Dataset and Detection code from the paper: AI-Synthesized Voice Detection Using Neural Vocoder Artifacts accepted in CVPR Workshop on Media Forensic 2023.
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
Speech To Speech: an effort for an open-sourced and modular GPT4-o
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downlo...
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
A talking LLM that runs on your own computer without needing the internet.
High-quality and streaming Speech-to-Speech interactive agent in a single file. 只用一个文件实现的流式全双工语音交互原型智能体!
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability
[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Foundational model for human-like, expressive TTS
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
An Open Source text-to-speech system built by inverting Whisper.
Speech To Speech: an effort for an open-sourced and modular GPT4-o
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downlo...
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation a...
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
so-vits-svc fork with realtime support, improved interface and more features.
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
Foundational model for human-like, expressive TTS
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
🐍 🤖 Pip installable package for StyleTTS 2 human-level text-to-speech and voice cloning
An Open Source text-to-speech system built by inverting Whisper.
[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
Neural Text To Speech Di Dart Cepat Ringan Tanpa Koneksi Internet dan bisa berjalan di cpu
AI-WEBUI: A universal web interface for AI creation, 一款好用的图像、音频、视频AI处理工具
Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice i...
PyTorch implementation of NEUTART, a system that creates photorealistic talking avatars from an input text transcription.
Custom nodes that extend the capabilities of Comfyui
Controllable and fast Text-to-Speech for over 7000 languages!
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...
Command-line interface and Python library to transcribe pinyin to IPA. The tones are attached to the vowel of the syllable.