Statistics for topic asr
RepositoryStats tracks 631,397 Github repositories, of these 236 are tagged with the asr topic. The most common primary language for repositories using this topic is Python (138). Other languages include: C++ (16), Jupyter Notebook (15)
Stargazers over time for topic asr
Most starred repositories for topic asr (view more)
Trending repositories for topic asr (view more)
TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking...
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking...
This is an on-CPU real-time conversational system for two-way speech communication with AI models, utilizing a continuous streaming architecture for fluid conversations with immediate responses and na...
百聆 是一个类似GPT-4o的语音对话机器人,通过ASR+LLM+TTS实现,集成DeepSeek R1等优秀大模型,时延低至800ms,Mac等低配置也可运行,支持打断
A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends.
TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking...
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, R...
OpenAI-Compatible Proxy Middleware for the Wyoming Protocol
This is an on-CPU real-time conversational system for two-way speech communication with AI models, utilizing a continuous streaming architecture for fluid conversations with immediate responses and na...
呼叫中心,智能外呼,大模型呼入机器人,大模型呼出机器人,客服系统,工单系统,开源呼叫中心系统,话务系统,智能外呼系统,智能电话外呼,呼叫中心系统,大模型客服,电话外呼,客服中心,在线客服,大模型呼叫中心,呼入机器人,大模型机器人,智能电话外呼,开源呼叫中心系统,电话外呼,在线客服,大模型callcenter,contactcenter,Call,IPCC,Customer Service,Voic...
A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends.
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking...
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, R...
百聆 是一个类似GPT-4o的语音对话机器人,通过ASR+LLM+TTS实现,集成DeepSeek R1等优秀大模型,时延低至800ms,Mac等低配置也可运行,支持打断
OpenAI-Compatible Proxy Middleware for the Wyoming Protocol
This is an on-CPU real-time conversational system for two-way speech communication with AI models, utilizing a continuous streaming architecture for fluid conversations with immediate responses and na...
A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends.
TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking...
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR...
Streamer-Sales 销冠 —— 卖货主播 LLM 大模型🛒🎁,一个能够根据给定的商品特点从激发用户购买意愿角度出发进行商品解说的卖货主播大模型。🚀⭐内含详细的数据生成流程❗ 📦另外还集成了 LMDeploy 加速推理🚀、RAG检索增强生成 📚、TTS文字转语音🔊、数字人生成 🦸、 Agent 使用网络查询实时信息🌐、ASR 语音转文字🎙️、Vue 生态搭建前端🍍、F...
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking...
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, R...
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR...
TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking...
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recogn...
The media player for language learning, with dual subtitles, AI-generated subtitles, realtime-OCR, translation, word lookup, and more!