Trending repositories for topic speech-to-text
Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover, Transcription, Text-to-Speech (Edge-TTS, F5-TTS), and Translation.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Translate a video from one language to another and add dubbing. Also supports speech-recognition transcription, speech synthesis, and subtitle translation.
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 ser...
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Voice Recognition to Text Tool / A local, offline tool that converts audio and video to subtitles, with output as JSON, SRT subtitles, or plain text.
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
The open-source iOS app that's making quality voice transcription more accessible on mobile devices.
Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!
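⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. No need to purchase the Whisper API: inference runs on locally hosted Whisper models, with multi-GPU concurrency and a design aimed at distributed deployment. It also includes built-in crawlers for social media platforms such as TikTok and Douyin, enabling seamless media processing across multiple social platforms and providing a powerful, scalable solution for automated processing of media content data.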
whisper-cpp-serve Real-time speech recognition and c+ of OpenAI's Whisper model in C/C++
Private and on-device speech recognition keyboard and service for Android.
Gp.nvim (GPT prompt) Neovim AI plugin: ChatGPT sessions & Instructable text/code operations & Speech to text [OpenAI, Ollama, Anthropic, ..]
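ChatGPT for Android - a personalized AI that works once you set an API key locally; chat history is stored locally. To try the voice version, download the commercial edition or integrate the Azure Speech SDK yourself (paid, with a free quota currently offered).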
Blazing fast whisper turbo for ASR (speech-to-text) tasks
Live transcription in Next.js by Deepgram
Whisper.net. Speech to text made simple using Whisper Models
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering but not lim...
Speech To Speech: an effort for an open-sourced and modular GPT4-o
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
An API to transcribe audio with OpenAI's Whisper Large v3!
Realtime Interview Copilot is a web application that assists users in crafting responses during interviews. It leverages real-time audio transcription and AI-powered response generation to provide rel...
If you've ever had the wish to talk to your AI Waifu using quality characters and voices for character voicing, then I suggest Soul of Waifu. Don't miss the opportunity to touch your dream!
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Input text from speech in any Linux window, the lean, fast and accurate way, using whisper.cpp offline. Speak with local LLMs.
Tool for automatic transcription and speaker diarization based on whisper and pyannote.
Generate Subtitles & Diarize Speakers in DaVinci Resolve using AI.
An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text. Supports English, Chinese, Japanese, etc., and even mixed languages.
Wyoming protocol server for Microsoft Azure speech-to-text
CleanStream is an OBS plugin that uses AI to clean live audio streams from unwanted words and utterances
Real-time Voice Activity Detection (VAD) with example use cases such as a simple voice bot and live (real-time) transcription
OBS plugin for local speech recognition and captioning using AI
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Near-Realtime audio transcription using self-hosted Whisper and WebSocket in Python/JS
Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and Coqui TTS Toolkit
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
WhisperClip simplifies your life by automatically transcribing audio recordings and saving the text directly to your clipboard. With just a click of a button, you can effortlessly convert spoken words...
ChatPlus is a progressive web app developed with React, NodeJS, Firebase and other services
Open source subtitling platform 💻 for transcribing and translating videos/audios in Indic languages.
AskTube - An AI-powered YouTube video summarizer and QA assistant powered by Retrieval Augmented Generation (RAG) 🤖. Run it entirely on your local machine with Ollama, or cloud-based models like Clau...
An editor for spoken-word audio with automatic transcription
🎤 Lobe TTS - A high-quality & reliable TTS/STT library for Server and Browser
🎤📄 An innovative tool that transforms audio or video files into text transcripts and generates concise meeting minutes. Stay organized and efficient in your meetings, and get ready for Phase 2 where...
Gradio-powered application that converts audio recordings of meetings into transcripts and provides concise summaries using whisper.
A desktop application that uses AI to translate voice between languages in real time, while preserving the speaker's tone and emotion.
Desktop application for Linux and Windows that utilizes distil-whisper models from HuggingFace, to enable real-time offline speech-to-text dictation.
Custom nodes that extend the capabilities of ComfyUI
A web UI project for learning about large language models. This project includes features such as chat, quantization, fine-tuning, prompt engineering templates, and multimodality.
A desktop application that transcribes audio from files, microphone input or YouTube videos with the option to translate the content and create subtitles.