Trending repositories for topic text-to-speech
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Translate the video from one language to another and add dubbing. 将视频从一种语言翻译为另一种语言,并添加配音
🚀AI拟声: 5秒内克隆您的声音并生成任意语音内容 Clone a voice in 5 seconds to generate arbitrary speech in real-time
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Using the TextSoundSaver application, you can convert text into realistic synthesized speech. The app achieves smooth and natural text-to-speech conversion. In addition to providing excellent voice sy...
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
Speech-to-text, text-to-speech, and speaker recongition using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers,...
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io
eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.
:robot: :speech_balloon: Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
VITS-based Voice Conversion focused on simplicity, quality and performance.
Using the TextSoundSaver application, you can convert text into realistic synthesized speech. The app achieves smooth and natural text-to-speech conversion. In addition to providing excellent voice sy...
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
🗣️🎤 elevenlabs-api is an open source Java wrapper around the ElevenLabs Voice Synthesis and Cloning Web API.
Talking Head (3D): A JavaScript class for real-time lip-sync using Ready Player Me full-body 3D avatars.
Speech-to-text, text-to-speech, and speaker recongition using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers,...
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
From AI tools to TikTok video creation using FFMPEG, Microsoft Edge read aloud and OpenAI Whisper model
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
Translate the video from one language to another and add dubbing. 将视频从一种语言翻译为另一种语言,并添加配音
A web UI Project In order to learn the large language model. This project includes features such as chat, quantization, fine-tuning, prompt engineering templates, and multimodality.
🎤 Lobe TTS - A high-quality & reliable TTS/STT library for Server and Browser
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Translate the video from one language to another and add dubbing. 将视频从一种语言翻译为另一种语言,并添加配音
🚀AI拟声: 5秒内克隆您的声音并生成任意语音内容 Clone a voice in 5 seconds to generate arbitrary speech in real-time
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Speech-to-text, text-to-speech, and speaker recongition using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers,...
This repository contains a hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM etc
eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Using the TextSoundSaver application, you can convert text into realistic synthesized speech. The app achieves smooth and natural text-to-speech conversion. In addition to providing excellent voice sy...
Wip From NTTS Piper Ultra Fast And Efficient Text To Speech Library For Cross Platform Work on any edge device with cpu only
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
It is a multi-lingual (97 languages) text content automatic recognition and segmentation tool. 强大的TTS多语言(97种语言)混合文本内容自动分词工具。
Automatically Generate video based on given content!
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
Speech-to-text, text-to-speech, and speaker recongition using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers,...
Pandrator aspires to be a user-friendly app with a graphical interface and a one-click installer that creates high-quality speech from text in multiple languages (audiobooks, speech synchronised with ...
🗣️🎤 elevenlabs-api is an open source Java wrapper around the ElevenLabs Voice Synthesis and Cloning Web API.
Neural Text To Speech Di Dart Cepat Ringan Tanpa Koneksi Internet dan bisa berjalan di cpu
Talking Head (3D): A JavaScript class for real-time lip-sync using Ready Player Me full-body 3D avatars.
From AI tools to TikTok video creation using FFMPEG, Microsoft Edge read aloud and OpenAI Whisper model
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.
Translate the video from one language to another and add dubbing. 将视频从一种语言翻译为另一种语言,并添加配音
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
🚀AI拟声: 5秒内克隆您的声音并生成任意语音内容 Clone a voice in 5 seconds to generate arbitrary speech in real-time
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Speech-to-text, text-to-speech, and speaker recongition using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers,...
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
This repository contains a hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM etc
VITS-based Voice Conversion focused on simplicity, quality and performance.
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io
Wip From NTTS Piper Ultra Fast And Efficient Text To Speech Library For Cross Platform Work on any edge device with cpu only
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Using the TextSoundSaver application, you can convert text into realistic synthesized speech. The app achieves smooth and natural text-to-speech conversion. In addition to providing excellent voice sy...
Easy-to-use speech toolset. Written in TypeScript. Includes tools for synthesis, recognition, alignment, speech translation, language detection, source separation and more.
From AI tools to TikTok video creation using FFMPEG, Microsoft Edge read aloud and OpenAI Whisper model
It is a multi-lingual (97 languages) text content automatic recognition and segmentation tool. 强大的TTS多语言(97种语言)混合文本内容自动分词工具。
Pandrator aspires to be a user-friendly app with a graphical interface and a one-click installer that creates high-quality speech from text in multiple languages (audiobooks, speech synchronised with ...
eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
Talking Head (3D): A JavaScript class for real-time lip-sync using Ready Player Me full-body 3D avatars.
Automatically Generate video based on given content!
Simple Python script to interact with the TikTok TTS Voices.
A web UI Project In order to learn the large language model. This project includes features such as chat, quantization, fine-tuning, prompt engineering templates, and multimodality.
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Translate the video from one language to another and add dubbing. 将视频从一种语言翻译为另一种语言,并添加配音
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
VITS-based Voice Conversion focused on simplicity, quality and performance.
Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
🎤 Lobe TTS - A high-quality & reliable TTS/STT library for Server and Browser
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Translate the video from one language to another and add dubbing. 将视频从一种语言翻译为另一种语言,并添加配音
🚀AI拟声: 5秒内克隆您的声音并生成任意语音内容 Clone a voice in 5 seconds to generate arbitrary speech in real-time
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
Foundational model for human-like, expressive TTS
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
This repository contains a hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM etc
eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.
The official Python API for ElevenLabs Text to Speech.
:robot: :speech_balloon: Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch
Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS)
Foundational model for human-like, expressive TTS
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
A ggml (C++) re-implementation of tortoise-tts. Under construction and seeking contributors.
Implementation of BEST-RQ - a model for self-supervised learning of speech signals using a random projection quantizer, in Pytorch.
Implementation of Spear-TTS - multi-speaker text-to-speech attention network, in Pytorch