Statistics for topic speech
RepositoryStats tracks 518,325 GitHub repositories, of which 266 are tagged with the speech topic. The most common primary language for repositories using this topic is Python (149). Other languages include Jupyter Notebook (28) and JavaScript (15).
Stargazers over time for topic speech
Most starred repositories for topic speech
Trending repositories for topic speech
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Grounded-SAM: Marrying Grounding-DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
Voice Recognition to Text Tool / An offline, locally run speech-to-text service that outputs JSON, timestamped SRT subtitles, and plain text
The Official PyTorch Implementation of FN-SSL & IPDnet for Sound Source Localization
The official repo of NBC & SpatialNet for multichannel speech separation, denoising, and dereverberation
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
VITS-based Voice Conversion focused on simplicity, quality and performance.
Simple Python script to interact with the TikTok TTS Voices.
Easy-to-use speech toolset. Written in TypeScript. Includes tools for synthesis, recognition, alignment, speech translation, language detection, source separation and more.
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Grounded-SAM: Marrying Grounding-DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
🚀 AI Voice Cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time
Joint speech-language model - respond directly to audio!
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
Easy-to-use speech toolset. Written in TypeScript. Includes tools for synthesis, recognition, alignment, speech translation, language detection, source separation and more.
Simple Python script to interact with the TikTok TTS Voices.
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Voice Recognition to Text Tool / An offline, locally run speech-to-text service that outputs JSON, timestamped SRT subtitles, and plain text
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Grounded-SAM: Marrying Grounding-DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Voice Recognition to Text Tool / An offline, locally run speech-to-text service that outputs JSON, timestamped SRT subtitles, and plain text
Fully customizable AI chatbot component for your website
Foundational model for human-like, expressive TTS
Easy-to-use speech toolset. Written in TypeScript. Includes tools for synthesis, recognition, alignment, speech translation, language detection, source separation and more.