Trending repositories for topic whisper
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), automatic speech recognition (ASR), and text-to-speech ...
Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover, Transcription, Text-to-Speech (Edge-TTS, F5-TTS), and Translation.
Aura is like Siri, but in your browser. An AI voice assistant optimized for low latency responses.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any...
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
The open-source iOS app that's making quality voice transcription more accessible on mobile devices.
Open source real-time translation app for Android that runs locally
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
whisper-cpp-serve Real-time speech recognition and c+ of OpenAI's Whisper model in C/C++
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
Gp.nvim (GPT prompt) Neovim AI plugin: ChatGPT sessions & Instructable text/code operations & Speech to text [OpenAI, Ollama, Anthropic, ..]
Blazing fast whisper turbo for ASR (speech-to-text) tasks
A real-time speech transcription and translation application using OpenAI Whisper and a free translation API. Interface built with Tkinter. Code written entirely in Python.
This project is a digital human that can talk and listen to you. It uses OpenAI's GPT to generate responses, OpenAI's Whisper to transcribe the audio, Eleven Labs to generate voice and Rhubarb Lip Syn...
Batch-generate subtitles for video or audio files, and batch-translate the subtitles into other languages. This is a cross-platform client tool for macOS and Windows that supports multiple translation services, including Baidu, Volcengine, DeepLX, OpenAI, DeepSeek, and Ollama.
Generate accurate transcripts using Apple's MLX framework
Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!
Uses a local faster-whisper model to extract audio and generate SRT and ASS subtitle files. Supports online translation services such as GPT to generate translated subtitle files.
Smart Whisper is a native Node.js addon designed for efficient and streamlined interaction with whisper.cpp, with automatic model offloading and a model manager.
An API to transcribe audio with OpenAI's Whisper Large v3!
Web application that converts audio and video to text using AI, supporting various formats and self-hosting.
Automatically subtitle any video spoken in any language to a language of your choice using AI.
Transcribe any audio or video file. Edit and view your transcripts in a standalone HTML editor.
Input text from speech in any Linux window, the lean, fast and accurate way, using whisper.cpp offline. Speak with local LLMs.
(Windows/Linux/macOS) Local WebUI with neural network models (Text, Image, Video, 3D, Audio) in Python (Gradio interface). Translated into 3 languages
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
Generate Subtitles & Diarize Speakers in DaVinci Resolve using AI.
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and Coqui TTS Toolkit
ScribeWizard: Generate organized notes from audio using Groq, Whisper, and Llama3
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation a...
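A fully automatic (audio) video translation project. It uses Whisper to recognize speech and a large AI model to translate the subtitles, then merges the subtitles with the video to produce the translated video.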
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Instant, controllable, local pre-trained AI models in Rust
Turnkey self-hosted offline transcription and diarization service with LLM summary
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
🎤📄 An innovative tool that transforms audio or video files into text transcripts and generates concise meeting minutes. Stay organized and efficient in your meetings, and get ready for Phase 2 where...
Gradio-powered application that converts audio recordings of meetings into transcripts and provides concise summaries using whisper.
Modern Desktop Application offering a suite of tools for audio/video text recognition and a variety of other useful utilities.