Trending repositories for topic asr
TEN Agent is a conversational AI powered by the TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compatib...
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
⚡ 一款用于自动语音识别 (ASR)、翻译的高性能异步 API。不需要购买Whisper API,使用本地运行的Whisper模型进行推理,并支持多GPU并发,针对分布式部署进行设计。还内置了包括TikTok、抖音等社交媒体平台的爬虫,可实现来自多个社交平台的无缝媒体处理,为媒体内容数据自动化处理提供了强大且可扩展的解决方案。
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR...
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation a...
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
This is a python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headless b...
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V,...
实时语音交互数字人,支持端到端语音方案(GLM-4-Voice - THG)和级联方案(ASR-LLM-TTS-THG)。可自定义形象与音色,无须训练,支持音色克隆,首包延迟低至3s。Real-time voice interactive digital human, supporting end-to-end voice solutions (GLM-4-Voice - THG) and cas...
html5 js 录音 mp3 wav ogg webm amr g711a g711u 格式,支持pc和Android、iOS部分浏览器、Hybrid App(提供Android iOS App源码)、微信,提供ASR语音识别转文字 H5版语音通话聊天示例 DTMF编码解码
🤖 wukong-robot 是一个简单、灵活、优雅的中文语音对话机器人/智能音箱项目,支持ChatGPT多轮对话能力,还可能是首个支持脑机交互的开源智能音箱项目。
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Streamer-Sales 销冠 —— 卖货主播 LLM 大模型🛒🎁,一个能够根据给定的商品特点从激发用户购买意愿角度出发进行商品解说的卖货主播大模型。🚀⭐内含详细的数据生成流程❗ 📦另外还集成了 LMDeploy 加速推理🚀、RAG检索增强生成 📚、TTS文字转语音🔊、数字人生成 🦸、 Agent 使用网络查询实时信息🌐、ASR 语音转文字🎙️、Vue 生态搭建前端🍍、F...
⚡ 一款用于自动语音识别 (ASR)、翻译的高性能异步 API。不需要购买Whisper API,使用本地运行的Whisper模型进行推理,并支持多GPU并发,针对分布式部署进行设计。还内置了包括TikTok、抖音等社交媒体平台的爬虫,可实现来自多个社交平台的无缝媒体处理,为媒体内容数据自动化处理提供了强大且可扩展的解决方案。
TEN Agent is a conversational AI powered by the TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compatib...
💬 Transcribe, translate, diarize, annotate and subtitle video (and audio) with Whisper on Win, Linux and Mac ... Fast!!
实时语音交互数字人,支持端到端语音方案(GLM-4-Voice - THG)和级联方案(ASR-LLM-TTS-THG)。可自定义形象与音色,无须训练,支持音色克隆,首包延迟低至3s。Real-time voice interactive digital human, supporting end-to-end voice solutions (GLM-4-Voice - THG) and cas...
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR...
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processi...
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V,...
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
TEN Agent is a conversational AI powered by the TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compatib...
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR...
⚡ 一款用于自动语音识别 (ASR)、翻译的高性能异步 API。不需要购买Whisper API,使用本地运行的Whisper模型进行推理,并支持多GPU并发,针对分布式部署进行设计。还内置了包括TikTok、抖音等社交媒体平台的爬虫,可实现来自多个社交平台的无缝媒体处理,为媒体内容数据自动化处理提供了强大且可扩展的解决方案。
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V,...
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation a...
This is a python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headless b...
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
实时语音交互数字人,支持端到端语音方案(GLM-4-Voice - THG)和级联方案(ASR-LLM-TTS-THG)。可自定义形象与音色,无须训练,支持音色克隆,首包延迟低至3s。Real-time voice interactive digital human, supporting end-to-end voice solutions (GLM-4-Voice - THG) and cas...
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
🤖 wukong-robot 是一个简单、灵活、优雅的中文语音对话机器人/智能音箱项目,支持ChatGPT多轮对话能力,还可能是首个支持脑机交互的开源智能音箱项目。
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
⚡ 一款用于自动语音识别 (ASR)、翻译的高性能异步 API。不需要购买Whisper API,使用本地运行的Whisper模型进行推理,并支持多GPU并发,针对分布式部署进行设计。还内置了包括TikTok、抖音等社交媒体平台的爬虫,可实现来自多个社交平台的无缝媒体处理,为媒体内容数据自动化处理提供了强大且可扩展的解决方案。
TEN Agent is a conversational AI powered by the TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compatib...
In this fast-paced world, we all need a little something to spice up life. Whether you need a glass of sweet talk to lift your spirits or a dose of sharp retorts to let off steam, FunGPT has got you c...
💬 Transcribe, translate, diarize, annotate and subtitle video (and audio) with Whisper on Win, Linux and Mac ... Fast!!
实时语音交互数字人,支持端到端语音方案(GLM-4-Voice - THG)和级联方案(ASR-LLM-TTS-THG)。可自定义形象与音色,无须训练,支持音色克隆,首包延迟低至3s。Real-time voice interactive digital human, supporting end-to-end voice solutions (GLM-4-Voice - THG) and cas...
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR...
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
Blazing fast whisper turbo for ASR (speech-to-text) tasks
whisper-cpp-serve Real-time speech recognition and c+ of OpenAI's Whisper model in C/C++
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V,...
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
TEN Agent is a conversational AI powered by the TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compatib...
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR...
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V,...
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
⚡ 一款用于自动语音识别 (ASR)、翻译的高性能异步 API。不需要购买Whisper API,使用本地运行的Whisper模型进行推理,并支持多GPU并发,针对分布式部署进行设计。还内置了包括TikTok、抖音等社交媒体平台的爬虫,可实现来自多个社交平台的无缝媒体处理,为媒体内容数据自动化处理提供了强大且可扩展的解决方案。
实时语音交互数字人,支持端到端语音方案(GLM-4-Voice - THG)和级联方案(ASR-LLM-TTS-THG)。可自定义形象与音色,无须训练,支持音色克隆,首包延迟低至3s。Real-time voice interactive digital human, supporting end-to-end voice solutions (GLM-4-Voice - THG) and cas...
This is a python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headless b...
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation a...
Streamer-Sales 销冠 —— 卖货主播 LLM 大模型🛒🎁,一个能够根据给定的商品特点从激发用户购买意愿角度出发进行商品解说的卖货主播大模型。🚀⭐内含详细的数据生成流程❗ 📦另外还集成了 LMDeploy 加速推理🚀、RAG检索增强生成 📚、TTS文字转语音🔊、数字人生成 🦸、 Agent 使用网络查询实时信息🌐、ASR 语音转文字🎙️、Vue 生态搭建前端🍍、F...
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
🤖 wukong-robot 是一个简单、灵活、优雅的中文语音对话机器人/智能音箱项目,支持ChatGPT多轮对话能力,还可能是首个支持脑机交互的开源智能音箱项目。
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
In this fast-paced world, we all need a little something to spice up life. Whether you need a glass of sweet talk to lift your spirits or a dose of sharp retorts to let off steam, FunGPT has got you c...
⚡ 一款用于自动语音识别 (ASR)、翻译的高性能异步 API。不需要购买Whisper API,使用本地运行的Whisper模型进行推理,并支持多GPU并发,针对分布式部署进行设计。还内置了包括TikTok、抖音等社交媒体平台的爬虫,可实现来自多个社交平台的无缝媒体处理,为媒体内容数据自动化处理提供了强大且可扩展的解决方案。
TEN Agent is a conversational AI powered by the TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compatib...
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
💬 Transcribe, translate, diarize, annotate and subtitle video (and audio) with Whisper on Win, Linux and Mac ... Fast!!
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR...
实时语音交互数字人,支持端到端语音方案(GLM-4-Voice - THG)和级联方案(ASR-LLM-TTS-THG)。可自定义形象与音色,无须训练,支持音色克隆,首包延迟低至3s。Real-time voice interactive digital human, supporting end-to-end voice solutions (GLM-4-Voice - THG) and cas...
Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover and Transcription.
Blazing fast whisper turbo for ASR (speech-to-text) tasks
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
whisper-cpp-serve Real-time speech recognition and c+ of OpenAI's Whisper model in C/C++
This UI serves as a Synthetic ASR Dataset Generator powered by/for OpenAI Whisper, enabling users to capture audio, transcribing it, on the fly and manage the generated dataset 🤗. Fine tune Whisper ...
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V,...
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR...
TEN Agent is a conversational AI powered by the TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compatib...
Streamer-Sales 销冠 —— 卖货主播 LLM 大模型🛒🎁,一个能够根据给定的商品特点从激发用户购买意愿角度出发进行商品解说的卖货主播大模型。🚀⭐内含详细的数据生成流程❗ 📦另外还集成了 LMDeploy 加速推理🚀、RAG检索增强生成 📚、TTS文字转语音🔊、数字人生成 🦸、 Agent 使用网络查询实时信息🌐、ASR 语音转文字🎙️、Vue 生态搭建前端🍍、F...
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
实时语音交互数字人,支持端到端语音方案(GLM-4-Voice - THG)和级联方案(ASR-LLM-TTS-THG)。可自定义形象与音色,无须训练,支持音色克隆,首包延迟低至3s。Real-time voice interactive digital human, supporting end-to-end voice solutions (GLM-4-Voice - THG) and cas...
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
Daily tracking of awesome audio papers, including music generation, zero-shot tts, asr, audio generation
⚡ 一款用于自动语音识别 (ASR)、翻译的高性能异步 API。不需要购买Whisper API,使用本地运行的Whisper模型进行推理,并支持多GPU并发,针对分布式部署进行设计。还内置了包括TikTok、抖音等社交媒体平台的爬虫,可实现来自多个社交平台的无缝媒体处理,为媒体内容数据自动化处理提供了强大且可扩展的解决方案。
Open source subtitling platform 💻 for transcribing and translating videos/audios in Indic languages.
Gnome shell extension for accurate speech to text input in Linux using whisper.cpp. Input text from speech anywhere.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR...
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
TEN Agent is a conversational AI powered by the TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compatib...
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V,...
Streamer-Sales 销冠 —— 卖货主播 LLM 大模型🛒🎁,一个能够根据给定的商品特点从激发用户购买意愿角度出发进行商品解说的卖货主播大模型。🚀⭐内含详细的数据生成流程❗ 📦另外还集成了 LMDeploy 加速推理🚀、RAG检索增强生成 📚、TTS文字转语音🔊、数字人生成 🦸、 Agent 使用网络查询实时信息🌐、ASR 语音转文字🎙️、Vue 生态搭建前端🍍、F...
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation a...
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
This is a python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headless b...
🤖 wukong-robot 是一个简单、灵活、优雅的中文语音对话机器人/智能音箱项目,支持ChatGPT多轮对话能力,还可能是首个支持脑机交互的开源智能音箱项目。
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
TEN Agent is a conversational AI powered by the TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compatib...
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
Praises is a text-to-speech tool that can help you read text easily.
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V,...
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engine
whisper-cpp-serve Real-time speech recognition and c+ of OpenAI's Whisper model in C/C++
Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover and Transcription.
Chitralekha - A video transcreation platform for Indic languages, supporting transcription, translation and voice-over
Dockerfile for WhisperX: Automatic Speech Recognition with Word-Level Timestamps and Speaker Diarization (Dockerfile, CI image build and test)
Blazing fast whisper turbo for ASR (speech-to-text) tasks
Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
:white_check_mark: A list of speech recognition learning resources including courses, books, tutorials, papers and toolkits.
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processi...