Statistics for topic speech-to-text
RepositoryStats tracks 633,140 GitHub repositories, of which 389 are tagged with the speech-to-text topic. The most common primary language for repositories using this topic is Python (189). Other languages include JavaScript (36), TypeScript (30), Jupyter Notebook (24), C# (17), and C++ (14).
Stargazers over time for topic speech-to-text
Most starred repositories for topic speech-to-text
Trending repositories for topic speech-to-text
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, R...
Discover the world of artificial intelligence and interact with your favorite characters without needing to learn tons of information. Bring your Waifu to life with Soul of Waifu!
State-of-the-art offline voice typing everywhere, including text terminals (Linux or a WSL session on Windows), with a simple bash script. Usable with X. Does not require X.
Real-time Voice Activity Detection (VAD) with example use cases such as a simple voice bot and live (real-time) transcription
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
A text-to-speech (TTS) and Speech-to-Speech (STS) library built on Apple's MLX framework, providing efficient speech synthesis on Apple Silicon.
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
Blueprint by Mozilla.ai for finetuning a Speech-To-Text model in your own language
Speech To Speech: an effort for an open-sourced and modular GPT4-o
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isola...
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Translate a video from one language to another and add dubbing. Also supports speech recognition and transcription, speech synthesis, and subtitle translation.
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Private and on-device speech recognition keyboard and service for Android.
VideoAlchemy is a toolkit expanding video processing capabilities, emphasizing FFmpeg and broader video technology applications.