Trending repositories for topic speech

Last 3 days (new repositories)

no newly created repositories trending in the last 3 days

Last 3 days (absolute gain)

coqui-ai/TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

36,162 (+73)

mpl-2.0

Orenoid/BabelDuck

Beginner-friendly AI spoken-conversation practice application

366 (+72)

m-bain/whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

12,904 (+39)

bsd-2-clause

fixie-ai/ultravox

A fast multimodal LLM for real-time voice

1,615 (+24)

mit

jianchang512/stt

Voice Recognition to Text Tool: an offline, locally run audio/video-to-subtitle tool that outputs JSON, SRT subtitles, and plain text

2,676 (+15)

gpl-3.0

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

4,561 (+14)

mit

modelscope/modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

7,127 (+12)

apache-2.0

mozilla/TTS

🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

9,472 (+12)

mpl-2.0

IDEA-Research/Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

15,399 (+12)

apache-2.0

huggingface/datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

19,363 (+12)

apache-2.0

Rikorose/DeepFilterNet

Noise suppression using deep filtering

2,608 (+11)

babysor/MockingBird

🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time

35,508 (+11)

svc-develop-team/so-vits-svc

SoftVC VITS Singing Voice Conversion

26,125 (+10)

agpl-3.0

MahmoudAshraf97/whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

3,876 (+10)

bsd-2-clause

IAHispano/Applio

A simple, high-quality voice conversion tool focused on ease of use and performance.

1,879 (+9)

mit

nyrahealth/CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

470 (+7)

ahmetoner/whisper-asr-webservice

OpenAI Whisper ASR Webservice API

2,178 (+7)

mit

snakers4/silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

5,034 (+7)

echogarden-project/echogarden

Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice i...

248 (+6)

gpl-3.0

pndurette/gTTS

Python library and CLI tool to interface with Google Translate's text-to-speech API

2,339 (+6)

mit

Last 3 days (relative gain)

Orenoid/BabelDuck

Beginner-friendly AI spoken-conversation practice application

366 (+24%)

travisvn/obsidian-edge-tts

Free, high quality text-to-speech for your Obsidian notes, leveraging Microsoft Edge's Read Aloud API.

35 (+6%)

gpl-3.0

echogarden-project/echogarden

248 (+2%)

gpl-3.0

jishengpeng/WavChat

A Survey of Spoken Dialogue Models (60 pages)

198 (+2%)

alessandroragano/scoreq

SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)

58 (+2%)

travisvn/openai-edge-tts

Text-to-speech API endpoint compatible with OpenAI's TTS API endpoint, using Microsoft Edge TTS to generate speech for free locally

184 (+2%)

gpl-3.0

nyrahealth/CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

470 (+2%)

fixie-ai/ultravox

A fast multimodal LLM for real-time voice

1,615 (+2%)

mit

AudioLLMs/AudioBench

AudioBench: A Universal Benchmark for Audio Large Language Models

103 (+1.0%)

SWHL/AI-Competition-Collections

A collection of AI competition experience posts and training/testing tips (gathered from write-ups of various AI competitions)

335 (+0.6%)

apache-2.0

jianchang512/stt

Voice Recognition to Text Tool: an offline, locally run audio/video-to-subtitle tool that outputs JSON, SRT subtitles, and plain text

2,676 (+0.6%)

gpl-3.0

pszemraj/vid2cleantxt

Python API & command-line tool to easily transcribe speech-based video files into clean text

192 (+0.5%)

apache-2.0

IAHispano/Applio

A simple, high-quality voice conversion tool focused on ease of use and performance.

1,879 (+0.5%)

mit

Rikorose/DeepFilterNet

Noise suppression using deep filtering

2,608 (+0.4%)

DigitalPhonetics/IMS-Toucan

Controllable and fast Text-to-Speech for over 7000 languages!

1,497 (+0.3%)

apache-2.0

ahmetoner/whisper-asr-webservice

OpenAI Whisper ASR Webservice API

2,178 (+0.3%)

mit

CUNY-CL/wikipron

Massively multilingual pronunciation mining

324 (+0.3%)

apache-2.0

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

4,561 (+0.3%)

mit

tincans-ai/gazelle

Joint speech-language model - respond directly to audio!

359 (+0.3%)

apache-2.0

pndurette/gTTS

Python library and CLI tool to interface with Google Translate's text-to-speech API

2,339 (+0.3%)

mit
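
The listing does not say how the relative gain is derived. A minimal sketch, assuming it is the absolute gain divided by the star count at the start of the window (current stars minus the gain), reproduces the figures shown for several entries above. The function name relative_gain_percent is hypothetical and used only for illustration.

```python
def relative_gain_percent(current_stars: int, absolute_gain: int) -> float:
    """Percentage growth relative to the star count at the start of the window.

    Assumption: baseline = current stars minus the stars gained in the window.
    """
    baseline = current_stars - absolute_gain
    if baseline <= 0:
        raise ValueError("baseline star count must be positive")
    return 100.0 * absolute_gain / baseline


# Orenoid/BabelDuck, last 3 days: 366 stars, +72
print(round(relative_gain_percent(366, 72)))       # 24
# jianchang512/stt, last 3 days: 2,676 stars, +15
print(round(relative_gain_percent(2676, 15), 1))   # 0.6
```

Under this assumption a small repository can top the relative ranking with a modest absolute gain, which is why the two rankings differ.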

Last week (new repositories)

no newly created repositories trending in the last week

Last week (absolute gain)

Orenoid/BabelDuck

Beginner-friendly AI spoken-conversation practice application

366 (+320)

coqui-ai/TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

36,162 (+166)

mpl-2.0

m-bain/whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

12,904 (+99)

bsd-2-clause

fixie-ai/ultravox

A fast multimodal LLM for real-time voice

1,615 (+71)

mit

IDEA-Research/Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

15,399 (+45)

apache-2.0

svc-develop-team/so-vits-svc

SoftVC VITS Singing Voice Conversion

26,125 (+42)

agpl-3.0

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

4,561 (+40)

mit

babysor/MockingBird

🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time

35,508 (+40)

echogarden-project/echogarden

248 (+38)

gpl-3.0

MahmoudAshraf97/whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

3,876 (+36)

bsd-2-clause

modelscope/modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

7,127 (+33)

apache-2.0

jianchang512/stt

Voice Recognition to Text Tool: an offline, locally run audio/video-to-subtitle tool that outputs JSON, SRT subtitles, and plain text

2,676 (+32)

gpl-3.0

IAHispano/Applio

A simple, high-quality voice conversion tool focused on ease of use and performance.

1,879 (+23)

mit

huggingface/datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

19,363 (+23)

apache-2.0

Rikorose/DeepFilterNet

Noise suppression using deep filtering

2,608 (+15)

huggingface/speech-to-speech

Speech To Speech: an effort for an open-sourced and modular GPT4-o

3,617 (+15)

apache-2.0

VITA-MLLM/Freeze-Omni

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

210 (+14)

netease-youdao/EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

7,523 (+14)

apache-2.0

ahmetoner/whisper-asr-webservice

OpenAI Whisper ASR Webservice API

2,178 (+13)

mit

travisvn/openai-edge-tts

Text-to-speech API endpoint compatible with OpenAI's TTS API endpoint, using Microsoft Edge TTS to generate speech for free locally

184 (+12)

gpl-3.0

Last week (relative gain)

Orenoid/BabelDuck

Beginner-friendly AI spoken-conversation practice application

366 (+696%)

echogarden-project/echogarden

248 (+18%)

gpl-3.0

travisvn/obsidian-edge-tts

Free, high quality text-to-speech for your Obsidian notes, leveraging Microsoft Edge's Read Aloud API.

35 (+9%)

gpl-3.0

alessandroragano/scoreq

SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)

58 (+7%)

VITA-MLLM/Freeze-Omni

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

210 (+7%)

travisvn/openai-edge-tts

Text-to-speech API endpoint compatible with OpenAI's TTS API endpoint, using Microsoft Edge TTS to generate speech for free locally

184 (+7%)

gpl-3.0

jishengpeng/WavChat

A Survey of Spoken Dialogue Models (60 pages)

198 (+6%)

fixie-ai/ultravox

A fast multimodal LLM for real-time voice

1,615 (+5%)

mit

AudioLLMs/AudioBench

AudioBench: A Universal Benchmark for Audio Large Language Models

103 (+3%)

hhguo/SoCodec

Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications

73 (+3%)

mit

nyrahealth/CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

470 (+2%)

fakerybakery/utmos

A toolkit to calculate speech audio quality. Not affiliated with the original authors

44 (+2%)

mit

tarepan/SpeechMOS

Easy-to-Use Speech MOS predictors

240 (+2%)

mit

karim23657/Persian-tts-coqui

Persian/Farsi text-to-speech (TTS) training using Coqui TTS

120 (+2%)

mit

FireRedTeam/FireRedTTS

An Open-Sourced LLM-empowered Foundation TTS System

493 (+1%)

mpl-2.0

IAHispano/Applio

A simple, high-quality voice conversion tool focused on ease of use and performance.

1,879 (+1%)

mit

jianchang512/stt

Voice Recognition to Text Tool: an offline, locally run audio/video-to-subtitle tool that outputs JSON, SRT subtitles, and plain text

2,676 (+1%)

gpl-3.0

SWHL/AI-Competition-Collections

A collection of AI competition experience posts and training/testing tips (gathered from write-ups of various AI competitions)

335 (+1%)

apache-2.0

balisujohn/tortoise.cpp

A ggml (C++) re-implementation of tortoise-tts

168 (+1%)

mit

feldberlin/timething

Timething is a library for aligning text transcripts with their audio recordings.

105 (+1.0%)

mit

Last month (new repositories)

no newly created repositories trending in the last month

Last month (absolute gain)

coqui-ai/TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

36,162 (+696)

mpl-2.0

m-bain/whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

12,904 (+416)

bsd-2-clause

fixie-ai/ultravox

A fast multimodal LLM for real-time voice

1,615 (+373)

mit

Orenoid/BabelDuck

Beginner-friendly AI spoken-conversation practice application

366 (+365)

IDEA-Research/Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

15,399 (+233)

apache-2.0

svc-develop-team/so-vits-svc

SoftVC VITS Singing Voice Conversion

26,125 (+232)

agpl-3.0

nyrahealth/CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

470 (+208)

babysor/MockingBird

🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time

35,508 (+195)

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

4,561 (+186)

mit

jianchang512/stt

Voice Recognition to Text Tool: an offline, locally run audio/video-to-subtitle tool that outputs JSON, SRT subtitles, and plain text

2,676 (+178)

gpl-3.0

MahmoudAshraf97/whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

3,876 (+166)

bsd-2-clause

jishengpeng/WavChat

A Survey of Spoken Dialogue Models (60 pages)

198 (+115)

VITA-MLLM/Freeze-Omni

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

210 (+115)

modelscope/modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

7,127 (+113)

apache-2.0

huggingface/datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

19,363 (+98)

apache-2.0

netease-youdao/EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

7,523 (+87)

apache-2.0

huggingface/speech-to-speech

Speech To Speech: an effort for an open-sourced and modular GPT4-o

3,617 (+84)

apache-2.0

IAHispano/Applio

A simple, high-quality voice conversion tool focused on ease of use and performance.

1,879 (+81)

mit

kaldi-asr/kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

14,366 (+79)

linto-ai/whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

2,121 (+76)

agpl-3.0

Last month (relative gain)

travisvn/obsidian-edge-tts

Free, high quality text-to-speech for your Obsidian notes, leveraging Microsoft Edge's Read Aloud API.

35 (+192%)

gpl-3.0

jishengpeng/WavChat

A Survey of Spoken Dialogue Models (60 pages)

198 (+139%)

VITA-MLLM/Freeze-Omni

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

210 (+121%)

nyrahealth/CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

470 (+79%)

alessandroragano/scoreq

SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)

58 (+53%)

travisvn/openai-edge-tts

Text-to-speech API endpoint compatible with OpenAI's TTS API endpoint, using Microsoft Edge TTS to generate speech for free locally

184 (+45%)

gpl-3.0

fixie-ai/ultravox

A fast multimodal LLM for real-time voice

1,615 (+30%)

mit

echogarden-project/echogarden

248 (+27%)

gpl-3.0

j3soon/whisper-to-input

An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text; supports English, Chinese, Japanese, etc., and even mixed languages.

50 (+14%)

FireRedTeam/FireRedTTS

An Open-Sourced LLM-empowered Foundation TTS System

493 (+13%)

mpl-2.0

fakerybakery/utmos

A toolkit to calculate speech audio quality. Not affiliated with the original authors

44 (+13%)

mit

hhguo/SoCodec

Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications

73 (+12%)

mit

AudioLLMs/AudioBench

AudioBench: A Universal Benchmark for Audio Large Language Models

103 (+12%)

YasserdahouML/visper

ViSpeR: Multilingual Audio-Visual Speech Recognition

30 (+11%)

skit-ai/SpeechLLM

This repository contains the training, inference, evaluation code for SpeechLLM models and details about the model releases on huggingface.

67 (+10%)

apache-2.0

Mohamad-Hussein/speech-assistant

Desktop application for Linux and Windows that utilizes distil-whisper models from HuggingFace, to enable real-time offline speech-to-text dictation.

53 (+8%)

mit

jianchang512/stt

Voice Recognition to Text Tool: an offline, locally run audio/video-to-subtitle tool that outputs JSON, SRT subtitles, and plain text

2,676 (+7%)

gpl-3.0

mark-rez/TikTok-Voice-TTS

Simple Python script to interact with the TikTok TTS Voices.

49 (+7%)

voidful/Codec-SUPERB

Audio Codec Speech processing Universal PERformance Benchmark

234 (+6%)

jim60105/docker-whisperX

Dockerfile for WhisperX: Automatic Speech Recognition with Word-Level Timestamps and Speaker Diarization (Dockerfile, CI image build and test)

190 (+6%)

mit

Last 12 months (new repositories)

metavoiceio/metavoice-src

Foundational model for human-like, expressive TTS

3,936

apache-2.0

huggingface/speech-to-speech

Speech To Speech: an effort for an open-sourced and modular GPT4-o

3,617

apache-2.0

jianchang512/stt

Voice Recognition to Text Tool: an offline, locally run audio/video-to-subtitle tool that outputs JSON, SRT subtitles, and plain text

2,676

gpl-3.0

Camb-ai/MARS5-TTS

MARS5 speech model (TTS) from CAMB.AI

2,556

agpl-3.0

fixie-ai/ultravox

A fast multimodal LLM for real-time voice

1,615

mit

ictnlp/StreamSpeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

977

mit

FireRedTeam/FireRedTTS

An Open-Sourced LLM-empowered Foundation TTS System

493

mpl-2.0

nyrahealth/CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

470

Orenoid/BabelDuck

Beginner-friendly AI spoken-conversation practice application

366

tincans-ai/gazelle

Joint speech-language model - respond directly to audio!

359

apache-2.0

VITA-MLLM/Freeze-Omni

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

210

dusty-nv/NanoLLM

Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.

207

mit

jishengpeng/WavChat

A Survey of Spoken Dialogue Models (60 pages)

198

travisvn/openai-edge-tts

Text-to-speech API endpoint compatible with OpenAI's TTS API endpoint, using Microsoft Edge TTS to generate speech for free locally

184

gpl-3.0

zhenye234/xcodec

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

111

AudioLLMs/AudioBench

AudioBench: A Universal Benchmark for Audio Large Language Models

103

BakerBunker/FreeV

[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter

mit

hhguo/SoCodec

Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications

mit

skit-ai/SpeechLLM

This repository contains the training, inference, evaluation code for SpeechLLM models and details about the model releases on huggingface.

apache-2.0

jaywcjlove/TextSoundSaver

Using the TextSoundSaver application, you can convert text into realistic synthesized speech. The app achieves smooth and natural text-to-speech conversion. In addition to providing excellent voice sy...

Last 12 months (absolute gain)

coqui-ai/TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

36,162 (+11,961)

mpl-2.0

m-bain/whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

12,904 (+5,867)

bsd-2-clause

svc-develop-team/so-vits-svc

SoftVC VITS Singing Voice Conversion

26,125 (+4,510)

agpl-3.0

metavoiceio/metavoice-src

Foundational model for human-like, expressive TTS

3,936 (+3,846)

apache-2.0

huggingface/speech-to-speech

Speech To Speech: an effort for an open-sourced and modular GPT4-o

3,617 (+3,616)

apache-2.0

IDEA-Research/Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

15,399 (+3,473)

apache-2.0

babysor/MockingBird

🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time

35,508 (+3,039)

jianchang512/stt

Voice Recognition to Text Tool: an offline, locally run audio/video-to-subtitle tool that outputs JSON, SRT subtitles, and plain text

2,676 (+2,667)

gpl-3.0

Camb-ai/MARS5-TTS

MARS5 speech model (TTS) from CAMB.AI

2,556 (+2,535)

agpl-3.0

MahmoudAshraf97/whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

3,876 (+2,504)

bsd-2-clause

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

4,561 (+2,362)

mit

netease-youdao/EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

7,523 (+2,195)

apache-2.0

modelscope/modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

7,127 (+2,002)

apache-2.0

fixie-ai/ultravox

A fast multimodal LLM for real-time voice

1,615 (+1,614)

mit

huggingface/datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

19,363 (+1,608)

apache-2.0

IAHispano/Applio

A simple, high-quality voice conversion tool focused on ease of use and performance.

1,879 (+1,576)

mit

mozilla/TTS

🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

9,472 (+1,147)

mpl-2.0

avinashkranjan/Amazing-Python-Scripts

🚀 Curated collection of Amazing Python scripts, from basic to advanced, with automation task scripts.

2,774 (+1,130)

mit

DigitalPhonetics/IMS-Toucan

Controllable and fast Text-to-Speech for over 7000 languages!

1,497 (+1,091)

apache-2.0

Rikorose/DeepFilterNet

Noise suppression using deep filtering

2,608 (+1,027)

Last 12 months (relative gain)

jianchang512/stt

Voice Recognition to Text Tool: an offline, locally run audio/video-to-subtitle tool that outputs JSON, SRT subtitles, and plain text

2,676 (+29,633%)

gpl-3.0

ictnlp/StreamSpeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

977 (+16,183%)

mit

Camb-ai/MARS5-TTS

MARS5 speech model (TTS) from CAMB.AI

2,556 (+12,071%)

agpl-3.0

jishengpeng/WavChat

A Survey of Spoken Dialogue Models (60 pages)

198 (+4,850%)

metavoiceio/metavoice-src

Foundational model for human-like, expressive TTS

3,936 (+4,273%)

apache-2.0

VITA-MLLM/Freeze-Omni

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

210 (+3,400%)

FireRedTeam/FireRedTTS

An Open-Sourced LLM-empowered Foundation TTS System

493 (+3,187%)

mpl-2.0

balisujohn/tortoise.cpp

A ggml (C++) re-implementation of tortoise-tts

168 (+2,000%)

mit

AudioLLMs/AudioBench

AudioBench: A Universal Benchmark for Audio Large Language Models

103 (+1,371%)

j3soon/whisper-to-input

An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text; supports English, Chinese, Japanese, etc., and even mixed languages.

50 (+1,150%)

interactiveaudiolab/ppgs

High-Fidelity Neural Phonetic Posteriorgrams

99 (+725%)

mit

leduckhai/wav2graph

Information Retrieval from Audio via Knowledge Graph

87 (+691%)

mit

maxrmorrison/promonet

Prosody and Pronunciation Modification Network

46 (+667%)

mit

BakerBunker/FreeV

[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter

82 (+531%)

mit

IAHispano/Applio

A simple, high-quality voice conversion tool focused on ease of use and performance.

1,879 (+520%)

mit

mark-rez/TikTok-Voice-TTS

Simple Python script to interact with the TikTok TTS Voices.

49 (+513%)

hanifabd/voice-activity-detection-vad-realtime

Real-time Voice Activity Detection (VAD) with example use cases such as a simple voice bot and live (real-time) transcription

58 (+427%)

YasserdahouML/visper

ViSpeR: Multilingual Audio-Visual Speech Recognition

30 (+400%)

jim60105/docker-whisperX

Dockerfile for WhisperX: Automatic Speech Recognition with Word-Level Timestamps and Speaker Diarization (Dockerfile, CI image build and test)

190 (+332%)

mit

echogarden-project/echogarden

248 (+328%)

gpl-3.0