Trending repositories for topic speech-to-text
Translate a video from one language to another and add dubbing.
Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
An API to transcribe audio with OpenAI's Whisper Large v3!
Voice Recognition to Text Tool / An offline, locally run speech-to-text service that outputs JSON, timestamped SRT subtitles, and plain text.
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers,...
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
A Deep-Learning-Based Chinese Speech Recognition System
💬📝 A small dictation app using OpenAI's Whisper speech recognition model.
The ChatGPT Voice Assistant uses a Raspberry Pi (or desktop) to enable spoken conversation with OpenAI large language models. This implementation listens to speech, processes the conversation through ...
A desktop application that transcribes audio from files, microphone input or YouTube videos with the option to translate the content and create subtitles.
A web UI project for learning about large language models. This project includes features such as chat, quantization, fine-tuning, prompt engineering templates, and multimodality.
Transcribe and translate voice into LRC files using Whisper and LLMs (GPT, Claude, et al.).
OBS plugin for local speech recognition and captioning using AI
GPT-3 client for Windows and Unix with memories management that supports both text and speech in any language. Includes a free text2image
🎤 Lobe TTS - A high-quality & reliable TTS/STT library for Server and Browser
Private and on-device speech recognition keyboard and service for Android.
Repository containing the open source code of works published at the FBK MT unit.
Chrome/Edge BROWSER EXTENSION that can RECOGNIZE any live audio/video streaming then TRANSLATE it for FREE (using unofficial online Google Translate API) then display it as LIVE CAPTION / LIVE SUBTITL...
Near-Realtime audio transcription using self-hosted Whisper and WebSocket in Python/JS
VietGPT VoiceBot: Chatbot automatically recognizes Vietnamese voice and uses the ChatGPT API for natural language interaction.
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Easy-to-use speech toolset. Written in TypeScript. Includes tools for synthesis, recognition, alignment, speech translation, language detection, source separation and more.
WhisperClip simplifies your life by automatically transcribing audio recordings and saving the text directly to your clipboard. With just a click of a button, you can effortlessly convert spoken words...
Generate subtitles using OpenAI Whisper in DaVinci Resolve editing software.
Real-time Voice Activity Detection (VAD) with example use cases such as a simple voice bot and live (real-time) transcription.
The subtitles and translations are generated in real-time and displayed as pop-ups.
Gp.nvim (GPT prompt) Neovim AI plugin: ChatGPT sessions & Instructable text/code operations & Speech to text [OpenAI]
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, and others.
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
Speech recognition module for Python, supporting several engines and APIs, online and offline.
Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
This tool uses AI to evaluate your pronunciation.
Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS