Trending repositories for topic speech-recognition
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Turnkey self-hosted offline transcription and diarization service with LLM summary
Voice Recognition to Text Tool / A local speech-to-text service that runs offline and outputs JSON, SRT subtitles with timestamps, and plain text
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
ChatGPT at home! Basically a better Google Nest Hub or Amazon Alexa home assistant. Built on the Raspberry Pi using the OpenAI API.
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation a...
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!
💬📝 A small dictation app using OpenAI's Whisper speech recognition model.
The ChatGPT Voice Assistant uses a Raspberry Pi (or desktop) to enable spoken conversation with OpenAI large language models. This implementation listens to speech, processes the conversation through ...
A desktop application that transcribes audio from files, microphone input or YouTube videos with the option to translate the content and create subtitles.
A talking LLM that runs on your own computer without needing the internet.
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
Official Python SDK for Deepgram's automated speech recognition APIs.
OBS plugin for local speech recognition and captioning using AI
GPT-3 client for Windows and Unix with memory management that supports both text and speech in any language. Includes a free text2image
Swift native on-device speech recognition with Whisper for Apple Silicon
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Private and on-device speech recognition keyboard and service for Android.
Automatically generate video based on given content!
Chrome/Edge BROWSER EXTENSION that can RECOGNIZE any live audio/video streaming then TRANSLATE it for FREE (using unofficial online Google Translate API) then display it as LIVE CAPTION / LIVE SUBTITL...
Near-Realtime audio transcription using self-hosted Whisper and WebSocket in Python/JS
VietGPT VoiceBot: Chatbot automatically recognizes Vietnamese voice and uses the ChatGPT API for natural language interaction.
Easy-to-use speech toolset. Written in TypeScript. Includes tools for synthesis, recognition, alignment, speech translation, language detection, source separation and more.
WhisperClip simplifies your life by automatically transcribing audio recordings and saving the text directly to your clipboard. With just a click of a button, you can effortlessly convert spoken words...
Real-time Voice Activity Detection (VAD) with example use cases such as a simple voice bot and live (real-time) transcription
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
INTERSPEECH 2023 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023 conference. Explore the latest advances in speech and language processing. Code inc...
Amica is an open source interface for interactive communication with 3D characters with voice synthesis and speech recognition.
🎤 Lobe TTS - A high-quality & reliable TTS/STT library for Server and Browser
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processi...
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
AI-WEBUI: A universal web interface for AI creation, an easy-to-use AI tool for processing images, audio, and video
This tool uses AI to evaluate your pronunciation.
Official implementation for Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployme...
Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from Hugging Face.
Implementation of the paper "Self-supervised Learning with Random-projection Quantizer for Speech Recognition" in PyTorch.