Trending repositories for topic speech-recognition
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover, Transcription, Text-to-Speech (Edge-TTS, F5-TTS), and Translation.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Open-source, accurate and easy-to-use video speech recognition & clipping tool, LLM based AI clipping intergrated.
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation a...
Voice Recognition to Text Tool / 一个离线运行的本地音视频转字幕工具,输出json、srt字幕、纯文字格式
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
High-quality and streaming Speech-to-Speech interactive agent in a single file. 只用一个文件实现的流式全双工语音交互原型智能体!
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover, Transcription, Text-to-Speech (Edge-TTS, F5-TTS), and Translation.
High-quality and streaming Speech-to-Speech interactive agent in a single file. 只用一个文件实现的流式全双工语音交互原型智能体!
Arduinobot is an open-source 3D printed robot arm powered by ROS 2. Its simple design and low cost make it an excellent learning tool, featured in the "Robotics and ROS 2 - Learn by Doing! Manipulator...
Blazing fast whisper turbo for ASR (speech-to-text) tasks
⚡ 一款用于自动语音识别 (ASR)、翻译的高性能异步 API。不需要购买Whisper API,使用本地运行的Whisper模型进行推理,并支持多GPU并发,针对分布式部署进行设计。还内置了包括TikTok、抖音等社交媒体平台的爬虫,可实现来自多个社交平台的无缝媒体处理,为媒体内容数据自动化处理提供了强大且可扩展的解决方案。
Dataset of ICASSP 2021 MULTILINGUAL PHONETIC DATASET FOR LOW RESOURCE SPEECH RECOGNITION
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering but not lim...
NodeJS Bindings for Whisper - the CPU version of OpenAI's Whisper, as initially crafted in C++ by ggerganov.
Custom nodes that extend the capabilities of Comfyui
🎤📄 An innovative tool that transforms audio or video files into text transcripts and generates concise meeting minutes. Stay organized and efficient in your meetings, and get ready for Phase 2 where...
Voice Recognition to Text Tool / 一个离线运行的本地音视频转字幕工具,输出json、srt字幕、纯文字格式
WhisperClip simplifies your life by automatically transcribing audio recordings and saving the text directly to your clipboard. With just a click of a button, you can effortlessly convert spoken words...
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Official JavaScript SDK for Deepgram's automated speech recognition APIs.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover, Transcription, Text-to-Speech (Edge-TTS, F5-TTS), and Translation.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Open-source, accurate and easy-to-use video speech recognition & clipping tool, LLM based AI clipping intergrated.
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
Voice Recognition to Text Tool / 一个离线运行的本地音视频转字幕工具,输出json、srt字幕、纯文字格式
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation a...
High-quality and streaming Speech-to-Speech interactive agent in a single file. 只用一个文件实现的流式全双工语音交互原型智能体!
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover, Transcription, Text-to-Speech (Edge-TTS, F5-TTS), and Translation.
V.I.S.O.R., my in-development AI-powered voice assistant with integrated memory!
⚡ 一款用于自动语音识别 (ASR)、翻译的高性能异步 API。不需要购买Whisper API,使用本地运行的Whisper模型进行推理,并支持多GPU并发,针对分布式部署进行设计。还内置了包括TikTok、抖音等社交媒体平台的爬虫,可实现来自多个社交平台的无缝媒体处理,为媒体内容数据自动化处理提供了强大且可扩展的解决方案。
High-quality and streaming Speech-to-Speech interactive agent in a single file. 只用一个文件实现的流式全双工语音交互原型智能体!
Input text from speech in any Linux window, the lean, fast and accurate way, using whisper.cpp offline. Speak with local LLMs.
WhisperClip simplifies your life by automatically transcribing audio recordings and saving the text directly to your clipboard. With just a click of a button, you can effortlessly convert spoken words...
Blazing fast whisper turbo for ASR (speech-to-text) tasks
Arduinobot is an open-source 3D printed robot arm powered by ROS 2. Its simple design and low cost make it an excellent learning tool, featured in the "Robotics and ROS 2 - Learn by Doing! Manipulator...
NodeJS Bindings for Whisper - the CPU version of OpenAI's Whisper, as initially crafted in C++ by ggerganov.
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering but not lim...
Deep learning based speech and pronunciation assessment API for 8 languages.
State-of-the-art voice typing in Linux terminal (or WFL sesson on Windows.) with a simple bash script. Works with X. Does not require X.
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
BanglaSpeech2Text: An open-source offline speech-to-text package for Bangla language. Fine-tuned on the latest whisper speech to text model for optimal performance.
⚡ 一款用于自动语音识别 (ASR)、翻译的高性能异步 API。不需要购买Whisper API,使用本地运行的Whisper模型进行推理,并支持多GPU并发,针对分布式部署进行设计。还内置了包括TikTok、抖音等社交媒体平台的爬虫,可实现来自多个社交平台的无缝媒体处理,为媒体内容数据自动化处理提供了强大且可扩展的解决方案。
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover, Transcription, Text-to-Speech (Edge-TTS, F5-TTS), and Translation.
Open-source, accurate and easy-to-use video speech recognition & clipping tool, LLM based AI clipping intergrated.
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
High-quality and streaming Speech-to-Speech interactive agent in a single file. 只用一个文件实现的流式全双工语音交互原型智能体!
Voice Recognition to Text Tool / 一个离线运行的本地音视频转字幕工具,输出json、srt字幕、纯文字格式
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation a...
High-quality and streaming Speech-to-Speech interactive agent in a single file. 只用一个文件实现的流式全双工语音交互原型智能体!
Blazing fast whisper turbo for ASR (speech-to-text) tasks
Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover, Transcription, Text-to-Speech (Edge-TTS, F5-TTS), and Translation.
Input text from speech in any Linux window, the lean, fast and accurate way, using whisper.cpp offline. Speak with local LLMs.
V.I.S.O.R., my in-development AI-powered voice assistant with integrated memory!
About 3D Printed robot arm powered by ROS 2 and Arduino and controlled via MoveIt! 2 and Amazon Alexa. It is developed and programmed in the online course named "Robotics and ROS 2 - Learn by Doing! M...
An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and input the recognized text; Supports English, Chinese, Japanese, etc. and even mixed languages.
Command line speech recognition and transcription for macOS
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering but not lim...
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
Private and on-device speech recognition keyboard and service for Android.
A talking LLM that runs on your own computer without needing the internet.
Real-time Voice Activity Detection (VAD) with some example use case like simple voice bot and live transcription (realtime transcription)
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2
whisper-cpp-serve Real-time speech recognition and c+ of OpenAI's Whisper model in C/C++
Voice Recognition to Text Tool / 一个离线运行的本地音视频转字幕工具,输出json、srt字幕、纯文字格式
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Near-Realtime audio transcription using self-hosted Whisper and WebSocket in Python/JS
Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and Coqui TTS Toolkit
Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover, Transcription, Text-to-Speech (Edge-TTS, F5-TTS), and Translation.
Private and on-device speech recognition keyboard and service for Android.
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engine
A talking LLM that runs on your own computer without needing the internet.
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
High-quality and streaming Speech-to-Speech interactive agent in a single file. 只用一个文件实现的流式全双工语音交互原型智能体!
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering but not lim...
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Open-source, accurate and easy-to-use video speech recognition & clipping tool, LLM based AI clipping intergrated.
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Voice Recognition to Text Tool / 一个离线运行的本地音视频转字幕工具,输出json、srt字幕、纯文字格式
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation a...
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
这是一个全自动(音频)视频翻译项目。利用Whisper识别声音,AI大模型翻译字幕,最后合并字幕视频,生成翻译后的视频。
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!
Voice Recognition to Text Tool / 一个离线运行的本地音视频转字幕工具,输出json、srt字幕、纯文字格式
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Private and on-device speech recognition keyboard and service for Android.
ChatGPT at home! Basically a better Google Nest Hub or Amazon Alexa home assistant. Built on the Raspberry Pi using the OpenAI API.
AI-WEBUI: A universal web interface for AI creation, 一款好用的图像、音频、视频AI处理工具
Open-source, accurate and easy-to-use video speech recognition & clipping tool, LLM based AI clipping intergrated.
Near-Realtime audio transcription using self-hosted Whisper and WebSocket in Python/JS
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engine
Build high-performance AI models with modular building blocks
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
ASR 2Pass onnxruntime and websocket server, based on FunASR(https://github.com/alibaba-damo-academy/FunASR).
Custom nodes that extend the capabilities of Comfyui
A desktop application that transcribes audio from files, microphone input or YouTube videos with the option to translate the content and create subtitles.
About 3D Printed robot arm powered by ROS 2 and Arduino and controlled via MoveIt! 2 and Amazon Alexa. It is developed and programmed in the online course named "Robotics and ROS 2 - Learn by Doing! M...
Dockerfile for WhisperX: Automatic Speech Recognition with Word-Level Timestamps and Speaker Diarization (Dockerfile, CI image build and test)