Trending repositories for topic whisper
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any...
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downlo...
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR...
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation a...
Open source real-time translation app for Android that runs locally
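Batch-generate subtitles for video or audio files, and batch-translate those subtitles into other languages. This is a cross-platform client tool for macOS and Windows that supports multiple translation services, including Baidu, Volcengine, deeplx, openai, deepseek, and ollama.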
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
An open-source project for Windows developers to learn how to add AI with local models and APIs to Windows apps.
Transcribe any audio or video file. Edit and view your transcripts in a standalone HTML editor.
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
AI-powered tool for real-time interview question transcription and response generation.
Turnkey self-hosted offline transcription and diarization service with LLM summary
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
Tero Subtitler is an open source, cross-platform, and free subtitle editing software.
Aura is like Siri, but in your browser. An AI voice assistant optimized for low latency responses.
Blazing fast whisper turbo for ASR (speech-to-text) tasks
Smart Whisper is a native Node.js addon designed for efficient and streamlined interaction with whisper.cpp, with automatic model offloading and a model manager.
💬📝 A small dictation app using OpenAI's Whisper speech recognition model.
An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text; supports English, Chinese, Japanese, etc., and even mixed languages.
whisper-cpp-serve: real-time speech recognition and c+ of OpenAI's Whisper model in C/C++
A real-time, instant dictation desktop application built on Electron that uses Whisper and GROQ under the hood
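videotopdf: convert videos into richly illustrated PDFs that combine text and images. A must-have for office workers (meeting minutes) and students (online course notes). Try it at: https://zjrwtxtechstudio-video-to-pdf.hf.space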
The definitive, open-source Swift framework for interfacing with generative AI.
Generate Subtitles & Diarize Speakers in DaVinci Resolve using AI.
Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and Coqui TTS Toolkit
ScribeWizard: Generate organized notes from audio using Groq, Whisper, and Llama3
Generate accurate transcripts using Apple's MLX framework
AI Device Template Featuring Whisper, TTS, Groq, Llama3, OpenAI and more
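A fully automatic (audio) video translation project. It uses Whisper to recognize speech, an AI large language model to translate the subtitles, and finally merges the subtitles with the video to produce the translated video.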
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports LLMs, embeddings, and speech-to-text.
An API to transcribe audio with OpenAI's Whisper Large v3!
Gradio-powered application that converts audio recordings of meetings into transcripts and provides concise summaries using whisper.
AI-short-creator is an AI-powered tool that turns long videos into short clips. It works best for videos with multiple speakers and topics, such as interviews and documentaries. AI-short-creator finds...