Statistics for topic speech
RepositoryStats tracks 632,869 GitHub repositories, of which 324 are tagged with the speech topic. The most common primary language for repositories using this topic is Python (191). Other languages include Jupyter Notebook (30), JavaScript (15), and TypeScript (11).
Stargazers over time for topic speech
Most starred repositories for topic speech
Trending repositories for topic speech
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
ModelScope: bring the notion of Model-as-a-Service to life.
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
Free, high-quality text-to-speech API endpoint to replace OpenAI, Azure, or ElevenLabs
🚀 AI拟声 (AI voice cloning): Clone a voice in 5 seconds to generate arbitrary speech in real-time
Free, high quality text-to-speech for your Obsidian notes, leveraging Microsoft Edge's Read Aloud API.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
A curated list of Turkish AI models, datasets, papers
OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-Time Self-Aware Emotional Speech Synthesis
Speech To Speech: an effort for an open-sourced and modular GPT4-o
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM