Statistics for topic speech-to-text
RepositoryStats tracks 579,129 GitHub repositories, of which 348 are tagged with the speech-to-text topic. The most common primary language for repositories using this topic is Python (165). Other languages include JavaScript (33), TypeScript (25), Jupyter Notebook (21), C# (14), and C++ (14).
Stargazers over time for topic speech-to-text
Most starred repositories for topic speech-to-text
Trending repositories for topic speech-to-text
Translate the video from one language to another and add dubbing. API calls are supported.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Input text from speech in any Linux window, the lean, fast and accurate way, using whisper.cpp offline. Speak with local LLMs.
An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text. Supports English, Chinese, Japanese, etc., and even mixed languages.
whisper-cpp-serve Real-time speech recognition and c+ of OpenAI's Whisper model in C/C++
Demonstrates Voice Recognition, Text to Speech, Language Translation, OAuth2, Image Generation, Face Detection and Voice Chatbot. Source code and Documentation for my 2023 ADUG Symposium Talk.
OpenAI Whisper API-style local server, running on FastAPI
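A fully local, open-source speech-to-text API. The project is based on the OpenAI Whisper model and the Faster Whisper model, which offers faster inference. It implements an asynchronous model pool, wrapped efficiently using FastAPI's asynchronous features, and supports a thread-safe asynchronous task queue, asynchronous file I/O, asynchronous database I/O, an asynchronous web crawler module, and further custom features.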
Translate the video from one language to another and add dubbing. API calls are supported.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
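A fully local, open-source speech-to-text API. The project is based on the OpenAI Whisper model and the Faster Whisper model, which offers faster inference. It implements an asynchronous model pool, wrapped efficiently using FastAPI's asynchronous features, and supports a thread-safe asynchronous task queue, asynchronous file I/O, asynchronous database I/O, an asynchronous web crawler module, and further custom features.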
Input text from speech in any Linux window, the lean, fast and accurate way, using whisper.cpp offline. Speak with local LLMs.
An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text. Supports English, Chinese, Japanese, etc., and even mixed languages.
whisper-cpp-serve Real-time speech recognition and c+ of OpenAI's Whisper model in C/C++
Demonstrates Voice Recognition, Text to Speech, Language Translation, OAuth2, Image Generation, Face Detection and Voice Chatbot. Source code and Documentation for my 2023 ADUG Symposium Talk.
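A fully local, open-source speech-to-text API. The project is based on the OpenAI Whisper model and the Faster Whisper model, which offers faster inference. It implements an asynchronous model pool, wrapped efficiently using FastAPI's asynchronous features, and supports a thread-safe asynchronous task queue, asynchronous file I/O, asynchronous database I/O, an asynchronous web crawler module, and further custom features.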
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Translate the video from one language to another and add dubbing. API calls are supported.
Blazing fast whisper turbo for ASR (speech-to-text) tasks
Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover, Transcription, Text-to-Speech, and Translation.
Input text from speech in any Linux window, the lean, fast and accurate way, using whisper.cpp offline. Speak with local LLMs.
Speech To Speech: an effort for an open-sourced and modular GPT4-o
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Translate the video from one language to another and add dubbing. API calls are supported.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Private and on-device speech recognition keyboard and service for Android.
Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.