Trending repositories for topic speech
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
🚀 AI拟声 (AI voice cloning): Clone a voice in 5 seconds to generate arbitrary speech in real time
Voice Recognition to Text Tool: a local speech-to-text service that runs offline and outputs JSON, SRT subtitles with timestamps, and plain text
Using the TextSoundSaver application, you can convert text into realistic synthesized speech. The app achieves smooth and natural text-to-speech conversion. In addition to providing excellent voice sy...
Grounded-SAM: Marrying Grounding-DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
ModelScope: bring the notion of Model-as-a-Service to life.
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
VITS-based Voice Conversion focused on simplicity, quality and performance.
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
Central repository for all lectures on deep learning at UPC ETSETB TelecomBCN.
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
Foundational model for human-like, expressive TTS
This is a list of sound, audio and music development tools which contains machine learning, audio generation, audio signal processing, sound synthesis, spatial audio, music information retrieval, musi...
A simple Azure Speech Service module that uses the Microsoft Edge Read Aloud API
SpeechIO Leaderboard: a large, robust, comprehensive, benchmarking platform for Automatic Speech Recognition.
Fully customizable AI chatbot component for your website
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Unofficial PyTorch implementation of SoundStream with training code and a 16kHz pretrained checkpoint
Whisper Dart is a cross-platform library for Dart and Flutter that converts audio to text (speech to text) using inference from OpenAI models
A ggml (C++) re-implementation of tortoise-tts. Under construction and seeking contributors.
Timething is a library for aligning text transcripts with their audio recordings.
Easy-to-use speech toolset. Written in TypeScript. Includes tools for synthesis, recognition, alignment, speech translation, language detection, source separation and more.
Real-time Voice Activity Detection (VAD) with example use cases such as a simple voice bot and live (real-time) transcription
Simple Python script to interact with the TikTok TTS Voices.
Dockerfile for WhisperX: Automatic Speech Recognition with Word-Level Timestamps and Speaker Diarization (Dockerfile, CI image build and test)
The Official PyTorch Implementation of FN-SSL & IPDnet for Sound Source Localization
A TensorFlow-based spoken language identification tool
Desktop application for Linux and Windows that uses distil-whisper models from HuggingFace to enable real-time offline speech-to-text dictation.
"SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks": speech processing with the prompting paradigm
Persian/Farsi text-to-speech (TTS) training using Coqui TTS
World's First Multilingual Inexpensive Therapeutic Sophisticated Ultra-responsive Holographic Agent. In simple terms, an AI you can talk to and it'll talk back with a body using VTube Studio.
A Python implementation of the Speech Intelligibility Index
Source code for ACL 2023 paper "End-to-End Simultaneous Speech Translation with Differentiable Segmentation"