Trending repositories for topic speech
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
🚀 AI Voice Cloning: clone a voice in 5 seconds to generate arbitrary speech in real time
Voice Recognition to Text Tool: an offline, locally run audio/video-to-subtitle tool that outputs JSON, SRT subtitles, and plain text
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Speech To Speech: an effort for an open-sourced and modular GPT-4o
Text-to-speech API endpoint compatible with OpenAI's TTS API endpoint, using Microsoft Edge TTS to generate speech for free locally
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
A simple, high-quality voice conversion tool focused on ease of use and performance
🚀 Curated collection of amazing Python scripts from basic to advanced, with automation task scripts.
This repository contains the training, inference, and evaluation code for SpeechLLM models and details about the model releases on Hugging Face.
Timething is a library for aligning text transcripts with their audio recordings.
A collection of datasets for the purpose of emotion recognition/detection in speech.
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Fully customizable AI chatbot component for your website
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
ModelScope: bring the notion of Model-as-a-Service to life.
Dataset of ICASSP 2021 "Multilingual Phonetic Dataset for Low Resource Speech Recognition"
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
The official repo of NBC & SpatialNet for multichannel speech separation, denoising, and dereverberation
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation
The Official PyTorch Implementation of FN-SSL & IPDnet for Sound Source Localization
Easy-to-use speech toolset. Written in TypeScript. Includes tools for synthesis, recognition, alignment, speech translation, language detection, source separation and more.
A list of publicly available room impulse response datasets and scripts to download them.
Real-time Voice Activity Detection (VAD) with example use cases such as a simple voice bot and live (real-time) transcription
An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text; supports English, Chinese, Japanese, etc., and even mixed languages.
SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)
A toolkit to calculate speech audio quality. Not affiliated with the original authors
Code for the INTERSPEECH 2023 paper "Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models"
Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
Using the TextSoundSaver application, you can convert text into realistic synthesized speech. The app achieves smooth and natural text-to-speech conversion. In addition to providing excellent voice sy...
Foundational model for human-like, expressive TTS
Controllable and fast Text-to-Speech for over 7000 languages!
Dockerfile for WhisperX: Automatic Speech Recognition with Word-Level Timestamps and Speaker Diarization (Dockerfile, CI image build and test)
Unofficial SoundStream implementation in PyTorch with training code and a 16 kHz pretrained checkpoint