Trending repositories for topic speech
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Beginner-friendly AI spoken-conversation practice application
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Voice Recognition to Text Tool: a local, offline audio/video-to-subtitle tool that outputs JSON, SRT subtitles, and plain text
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
ModelScope: bring the notion of Model-as-a-Service to life.
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment, and Generate Anything
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
🚀 AI拟声 (AI voice cloning): clone a voice in 5 seconds to generate arbitrary speech in real time
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
A simple, high-quality voice conversion tool focused on ease of use and performance.
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice i...
Python library and CLI tool to interface with Google Translate's text-to-speech API
Free, high quality text-to-speech for your Obsidian notes, leveraging Microsoft Edge's Read Aloud API.
SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)
Text-to-speech API endpoint compatible with OpenAI's TTS API endpoint, using Microsoft Edge TTS to generate speech for free locally
Python API & command-line tool to easily transcribe speech-based video files into clean text
Controllable and fast Text-to-Speech for over 7000 languages!
Speech To Speech: an effort for an open-sourced and modular GPT-4o
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
A toolkit to calculate speech audio quality. Not affiliated with the original authors
Persian/Farsi text-to-speech (TTS) training using Coqui TTS
Timething is a library for aligning text transcripts with their audio recordings.
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text. Supports English, Chinese, Japanese, and other languages, including mixed-language speech.
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
This repository contains the training, inference, and evaluation code for SpeechLLM models, plus details about the model releases on Hugging Face.
Desktop application for Linux and Windows that uses distil-whisper models from Hugging Face to enable real-time offline speech-to-text dictation.
Dockerfile for WhisperX: Automatic Speech Recognition with Word-Level Timestamps and Speaker Diarization (Dockerfile, CI image build and test)
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
Using the TextSoundSaver application, you can convert text into realistic synthesized speech. The app achieves smooth and natural text-to-speech conversion. In addition to providing excellent voice sy...
Foundational model for human-like, expressive TTS
🚀 Curated collection of amazing Python scripts, from basic to advanced, including automation task scripts.
AudioBench: A Universal Benchmark for Audio Large Language Models
Real-time Voice Activity Detection (VAD) with example use cases such as a simple voice bot and live (real-time) transcription