Trending repositories for topic speech-recognition

Last 3 days (new repositories)

no newly created repositories trending in the last 3 days

Last 3 days (absolute gain)

huggingface/transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

136,424 (+142)

apache-2.0

ggerganov/whisper.cpp

Port of OpenAI's Whisper model in C/C++

36,355 (+86)

mit

SYSTRAN/faster-whisper

Faster Whisper transcription with CTranslate2

13,007 (+52)

mit

Evil0ctal/Fast-Powerful-Whisper-AI-Services-API

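⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. No need to purchase the Whisper API: it runs inference with locally hosted Whisper models, supports multi-GPU concurrency, and is designed for distributed deployment. It also includes built-in crawlers for social media platforms such as TikTok and Douyin, enabling seamless media processing across multiple social platforms and providing a powerful, scalable solution for automated processing of media content data.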

205 (+39)

apache-2.0

m-bain/whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

12,904 (+39)

bsd-2-clause

FunAudioLLM/SenseVoice

Multilingual Voice Understanding Model

3,725 (+32)

modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

7,346 (+32)

mozilla/DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

25,483 (+19)

mpl-2.0

alphacep/vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

8,303 (+19)

apache-2.0

abus-aikorea/voice-pro

Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downlo...

2,344 (+17)

mit

argmaxinc/WhisperKit

On-device Speech Recognition for Apple Silicon

4,034 (+16)

mit

jianchang512/stt

Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats

2,676 (+15)

gpl-3.0

openvinotoolkit/openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

7,461 (+15)

apache-2.0

modelscope/FunClip

Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.

3,877 (+14)

mit

NVIDIA/DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

13,686 (+14)

PaddlePaddle/PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation a...

11,267 (+12)

apache-2.0

Chenyme/Chenyme-AAVT

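A fully automatic (audio) video translation project. It uses Whisper to recognize speech, a large AI model to translate the subtitles, and finally merges the subtitles with the video to produce the translated video.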

1,922 (+11)

mit

pluja/whishper

Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!

1,712 (+9)

agpl-3.0

yanshengjia/ml-road

Machine Learning Resources, Practice and Research

3,541 (+9)

mit

salute-developers/GigaAM

Foundational Model for Speech Recognition Tasks

135 (+7)

mit

Last 3 days (relative gain)

Evil0ctal/Fast-Powerful-Whisper-AI-Services-API

205 (+23%)

apache-2.0

salute-developers/GigaAM

Foundational Model for Speech Recognition Tasks

135 (+5%)

mit

echogarden-project/echogarden

Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice i...

248 (+2%)

gpl-3.0

savbell/whisper-writer

💬📝 A small dictation app using OpenAI's Whisper speech recognition model.

384 (+2%)

gpl-3.0

aniemore/Aniemore

Emotion recognition from audio and text files (Russian language only)

57 (+2%)

mit

nyrahealth/CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

470 (+2%)

Thiagohgl/ai-pronunciation-trainer

This tool uses AI to evaluate your pronunciation.

169 (+1%)

agpl-3.0

HenestrosaDev/audiotext

A desktop application that transcribes audio from files, microphone input or YouTube videos with the option to translate the content and create subtitles.

174 (+1%)

JosefAlbers/whisper-turbo-mlx

Blazing fast whisper turbo for ASR (speech-to-text) tasks

177 (+1%)

mit

AudioLLMs/AudioBench

AudioBench: A Universal Benchmark for Audio Large Language Models

103 (+1.0%)

Pikurrot/whisper-gui

A simple GUI to use Whisper.

116 (+0.9%)

mit

FunAudioLLM/SenseVoice

Multilingual Voice Understanding Model

3,725 (+0.9%)

lobehub/lobe-tts

🎤 Lobe TTS - A high-quality & reliable TTS/STT library for Server and Browser

481 (+0.8%)

mit

YuanGongND/ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

396 (+0.8%)

abus-aikorea/voice-pro

2,344 (+0.7%)

mit

vndee/local-talking-llm

A talking LLM that runs on your own computer without needing the internet.

315 (+0.6%)

mit

Chenyme/Chenyme-AAVT

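A fully automatic (audio) video translation project. It uses Whisper to recognize speech, a large AI model to translate the subtitles, and finally merges the subtitles with the video to produce the translated video.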

1,922 (+0.6%)

mit

jianchang512/stt

Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats

2,676 (+0.6%)

gpl-3.0

themanyone/whisper_dictation

Private voice keyboard, AI chat, images, webcam, recordings, voice control with >= 4 GiB of VRAM.

186 (+0.5%)

gpl-2.0

pszemraj/vid2cleantxt

Python API & command-line tool to easily transcribe speech-based video files into clean text

192 (+0.5%)

apache-2.0

Last week (new repositories)

no newly created repositories trending in the last week

Last week (absolute gain)

huggingface/transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

136,424 (+299)

apache-2.0

ggerganov/whisper.cpp

Port of OpenAI's Whisper model in C/C++

36,355 (+197)

mit

SYSTRAN/faster-whisper

Faster Whisper transcription with CTranslate2

13,007 (+121)

mit

m-bain/whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

12,904 (+99)

bsd-2-clause

Evil0ctal/Fast-Powerful-Whisper-AI-Services-API

205 (+90)

apache-2.0

modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

7,346 (+89)

FunAudioLLM/SenseVoice

Multilingual Voice Understanding Model

3,725 (+75)

abus-aikorea/voice-pro

2,344 (+67)

mit

alphacep/vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

8,303 (+46)

apache-2.0

openvinotoolkit/openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

7,461 (+44)

apache-2.0

echogarden-project/echogarden

248 (+38)

gpl-3.0

MahmoudAshraf97/whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

3,876 (+36)

bsd-2-clause

speechbrain/speechbrain

A PyTorch-based Speech Toolkit

9,071 (+33)

apache-2.0

argmaxinc/WhisperKit

On-device Speech Recognition for Apple Silicon

4,034 (+32)

mit

jianchang512/stt

Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats

2,676 (+32)

gpl-3.0

NVIDIA/DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

13,686 (+29)

modelscope/FunClip

Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.

3,877 (+26)

mit

yanshengjia/ml-road

Machine Learning Resources, Practice and Research

3,541 (+24)

mit

espnet/espnet

End-to-End Speech Processing Toolkit

8,600 (+23)

apache-2.0

leon-ai/leon

🧠 Leon is your open-source personal assistant.

15,565 (+21)

mit

Last week (relative gain)

Evil0ctal/Fast-Powerful-Whisper-AI-Services-API

205 (+78%)

apache-2.0

echogarden-project/echogarden

248 (+18%)

gpl-3.0

salute-developers/GigaAM

Foundational Model for Speech Recognition Tasks

135 (+13%)

mit

thewh1teagle/pyannote-rs

pyannote audio diarization in Rust

37 (+9%)

mit

VITA-MLLM/Freeze-Omni

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

210 (+7%)

AudioLLMs/AudioBench

AudioBench: A Universal Benchmark for Audio Large Language Models

103 (+3%)

abus-aikorea/voice-pro

2,344 (+3%)

mit

savbell/whisper-writer

💬📝 A small dictation app using OpenAI's Whisper speech recognition model.

384 (+2%)

gpl-3.0

nyrahealth/CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

470 (+2%)

FunAudioLLM/SenseVoice

Multilingual Voice Understanding Model

3,725 (+2%)

botbahlul/autosrt

A python script COMMAND LINE utility to AUTO GENERATE SUBTITLE FILE (using free Google Speech Recognition API) and TRANSLATED SUBTITLE FILE (using unofficial online Google Translate API) for any video...

53 (+2%)

mit

AlekPet/ComfyUI_Custom_Nodes_AlekPet

Custom nodes that extend the capabilities of Comfyui

942 (+2%)

mit

Thiagohgl/ai-pronunciation-trainer

This tool uses AI to evaluate your pronunciation.

169 (+2%)

agpl-3.0

aniemore/Aniemore

Emotion recognition from audio and text files (Russian language only)

57 (+2%)

mit

QuantiusBenignus/blurt

Gnome shell extension for accurate speech to text input in Linux using whisper.cpp. Input text from speech anywhere.

57 (+2%)

gpl-3.0

HenestrosaDev/audiotext

A desktop application that transcribes audio from files, microphone input or YouTube videos with the option to translate the content and create subtitles.

174 (+2%)

Pikurrot/whisper-gui

A simple GUI to use Whisper.

116 (+2%)

mit

JosefAlbers/whisper-turbo-mlx

Blazing fast whisper turbo for ASR (speech-to-text) tasks

177 (+2%)

mit

khanld/ASR-Wav2vec-Finetune

⚡ Fine-tune Wav2vec 2.0 for Speech Recognition

120 (+2%)

vndee/local-talking-llm

A talking LLM that runs on your own computer without needing the internet.

315 (+2%)

mit

Last month (new repositories)

no newly created repositories trending in the last month

Last month (absolute gain)

abus-aikorea/voice-pro

2,344 (+1,603)

mit

huggingface/transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

136,424 (+1,332)

apache-2.0

ggerganov/whisper.cpp

Port of OpenAI's Whisper model in C/C++

36,355 (+643)

mit

SYSTRAN/faster-whisper

Faster Whisper transcription with CTranslate2

13,007 (+492)

mit

m-bain/whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

12,904 (+416)

bsd-2-clause

modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

7,346 (+363)

FunAudioLLM/SenseVoice

Multilingual Voice Understanding Model

3,725 (+284)

nyrahealth/CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

470 (+208)

alphacep/vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

8,303 (+181)

apache-2.0

jianchang512/stt

Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats

2,676 (+178)

gpl-3.0

openvinotoolkit/openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

7,461 (+170)

apache-2.0

modelscope/FunClip

Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.

3,877 (+166)

mit

MahmoudAshraf97/whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

3,876 (+166)

bsd-2-clause

Chenyme/Chenyme-AAVT

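A fully automatic (audio) video translation project. It uses Whisper to recognize speech, a large AI model to translate the subtitles, and finally merges the subtitles with the video to produce the translated video.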

1,922 (+148)

mit

speechbrain/speechbrain

A PyTorch-based Speech Toolkit

9,071 (+142)

apache-2.0

mozilla/DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

25,483 (+119)

mpl-2.0

VITA-MLLM/Freeze-Omni

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

210 (+115)

NVIDIA/DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

13,686 (+115)

PaddlePaddle/PaddleSpeech

11,267 (+110)

apache-2.0

argmaxinc/WhisperKit

On-device Speech Recognition for Apple Silicon

4,034 (+106)

mit

Last month (relative gain)

abus-aikorea/voice-pro

2,344 (+216%)

mit

VITA-MLLM/Freeze-Omni

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

210 (+121%)

Evil0ctal/Fast-Powerful-Whisper-AI-Services-API

205 (+103%)

apache-2.0

nyrahealth/CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

470 (+79%)

Igorcbraz/Calculadora

📐 Simple and intuitive calculator with support for voice commands and custom themes 📏

36 (+44%)

mit

opendilab/CleanS2S

High-quality and streaming Speech-to-Speech interactive agent in a single file. A streaming, full-duplex voice interaction prototype agent implemented in just one file!

304 (+39%)

apache-2.0

echogarden-project/echogarden

248 (+27%)

gpl-3.0

thewh1teagle/pyannote-rs

pyannote audio diarization in Rust

37 (+23%)

mit

thewh1teagle/sherpa-rs

Rust bindings to https://github.com/k2-fsa/sherpa-onnx

53 (+20%)

mit

salute-developers/GigaAM

Foundational Model for Speech Recognition Tasks

135 (+18%)

mit

j3soon/whisper-to-input

An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text; supports English, Chinese, Japanese, etc., and even mixed languages.

50 (+14%)

vilassn/whisper_android

Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android

269 (+14%)

mit

AntoBrandi/Robotics-and-ROS-2-Learn-by-Doing-Manipulators

About 3D Printed robot arm powered by ROS 2 and Arduino and controlled via MoveIt! 2 and Amazon Alexa. It is developed and programmed in the online course named "Robotics and ROS 2 - Learn by Doing! M...

86 (+13%)

apache-2.0

morioka/tiny-openai-whisper-api

OpenAI Whisper API-style local server, running on FastAPI

70 (+13%)

mit

litongjava/whisper-cpp-server

whisper-cpp-serve Real-time speech recognition and c+ of OpenAI's Whisper model in C/C++

53 (+13%)

mit

Pikurrot/whisper-gui

A simple GUI to use Whisper.

116 (+13%)

mit

semperai/amica

Amica is an open source interface for interactive communication with 3D characters with voice synthesis and speech recognition.

808 (+12%)

mit

JosefAlbers/whisper-turbo-mlx

Blazing fast whisper turbo for ASR (speech-to-text) tasks

177 (+12%)

mit

AudioLLMs/AudioBench

AudioBench: A Universal Benchmark for Audio Large Language Models

103 (+12%)

Thiagohgl/ai-pronunciation-trainer

This tool uses AI to evaluate your pronunciation.

169 (+11%)

agpl-3.0

Last 12 months (new repositories)

argmaxinc/WhisperKit

On-device Speech Recognition for Apple Silicon

4,034

mit

FunAudioLLM/SenseVoice

Multilingual Voice Understanding Model

3,725

jianchang512/stt

Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats

2,676

gpl-3.0

abus-aikorea/voice-pro

2,344

mit

Chenyme/Chenyme-AAVT

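A fully automatic (audio) video translation project. It uses Whisper to recognize speech, a large AI model to translate the subtitles, and finally merges the subtitles with the video to produce the translated video.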

1,922

mit

ictnlp/StreamSpeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

977

mit

alesaccoia/VoiceStreamAI

Near-Realtime audio transcription using self-hosted Whisper and WebSocket in Python/JS

757

mit

mezbaul-h/june

Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and Coqui TTS Toolkit

725

mit

soupslurpr/Transcribro

Private and on-device speech recognition keyboard and service for Android.

471

isc

nyrahealth/CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

470

apeatling/ollama-voice-mac

Mac compatible Ollama Voice

440

agpl-3.0

revdotcom/reverb

Open source inference code for Rev's model

343

apache-2.0

vndee/local-talking-llm

A talking LLM that runs on your own computer without needing the internet.

315

mit

opendilab/CleanS2S

High-quality and streaming Speech-to-Speech interactive agent in a single file. A streaming, full-duplex voice interaction prototype agent implemented in just one file!

304

apache-2.0

davidmartinrius/speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.

213

mit

VITA-MLLM/Freeze-Omni

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

210

Evil0ctal/Fast-Powerful-Whisper-AI-Services-API

205

apache-2.0

MooreThreads/MooER

MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering but not lim...

177

JosefAlbers/whisper-turbo-mlx

Blazing fast whisper turbo for ASR (speech-to-text) tasks

177

mit

salute-developers/GigaAM

Foundational Model for Speech Recognition Tasks

135

mit

Last 12 months (absolute gain)

huggingface/transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

136,424 (+19,147)

apache-2.0

ggerganov/whisper.cpp

Port of OpenAI's Whisper model in C/C++

36,355 (+9,989)

mit

SYSTRAN/faster-whisper

Faster Whisper transcription with CTranslate2

13,007 (+6,789)

mit

m-bain/whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

12,904 (+5,867)

bsd-2-clause

modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

7,346 (+5,777)

argmaxinc/WhisperKit

On-device Speech Recognition for Apple Silicon

4,034 (+3,960)

mit

modelscope/FunClip

Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.

3,877 (+3,768)

mit

FunAudioLLM/SenseVoice

Multilingual Voice Understanding Model

3,725 (+3,723)

jianchang512/stt

Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats

2,676 (+2,667)

gpl-3.0

MahmoudAshraf97/whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

3,876 (+2,504)

bsd-2-clause

openvinotoolkit/openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

7,461 (+2,343)

apache-2.0

abus-aikorea/voice-pro

2,344 (+2,343)

mit

speechbrain/speechbrain

A PyTorch-based Speech Toolkit

9,071 (+2,108)

apache-2.0

mozilla/DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

25,483 (+1,944)

mpl-2.0

Chenyme/Chenyme-AAVT

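A fully automatic (audio) video translation project. It uses Whisper to recognize speech, a large AI model to translate the subtitles, and finally merges the subtitles with the video to produce the translated video.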

1,922 (+1,921)

mit

PaddlePaddle/PaddleSpeech

11,267 (+1,910)

apache-2.0

NVIDIA/DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

13,686 (+1,828)

alphacep/vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

8,303 (+1,815)

apache-2.0

leon-ai/leon

🧠 Leon is your open-source personal assistant.

15,565 (+1,634)

mit

pluja/whishper

Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!

1,712 (+1,320)

agpl-3.0

Last 12 months (relative gain)

jianchang512/stt

Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats

2,676 (+29,633%)

gpl-3.0

ictnlp/StreamSpeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

977 (+16,183%)

mit

soupslurpr/Transcribro

Private and on-device speech recognition keyboard and service for Android.

471 (+11,675%)

isc

judahpaul16/gpt-home

ChatGPT at home! Basically a better Google Nest Hub or Amazon Alexa home assistant. Built on the Raspberry Pi using the OpenAI API.

481 (+6,771%)

gpl-3.0

transcriptionstream/transcriptionstream

Turnkey self-hosted offline transcription and diarization service with LLM summary

758 (+5,731%)

gpl-3.0

argmaxinc/WhisperKit

On-device Speech Recognition for Apple Silicon

4,034 (+5,351%)

mit

modelscope/FunClip

Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.

3,877 (+3,457%)

mit

VITA-MLLM/Freeze-Omni

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

210 (+3,400%)

alesaccoia/VoiceStreamAI

Near-Realtime audio transcription using self-hosted Whisper and WebSocket in Python/JS

757 (+3,054%)

mit

paulovcmedeiros/pyRobBot

Chat with GPT LLMs over voice, UI & terminal, all with access to the internet. Powered by OpenAI.

118 (+1,586%)

mit

AudioLLMs/AudioBench

AudioBench: A Universal Benchmark for Audio Large Language Models

103 (+1,371%)

j3soon/whisper-to-input

An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text; supports English, Chinese, Japanese, etc., and even mixed languages.

50 (+1,150%)

ptsochantaris/emeltal

Local ML voice chat using high-end models.

152 (+913%)

mit

alxpez/alts

100% free, local & offline voice assistant with speech recognition

60 (+900%)

vilassn/whisper_android

Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android

269 (+896%)

mit

shashikg/WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

330 (+871%)

mit

inboxpraveen/LLM-Minutes-of-Meeting

🎤📄 An innovative tool that transforms audio or video files into text transcripts and generates concise meeting minutes. Stay organized and efficient in your meetings, and get ready for Phase 2 where...

116 (+867%)

mit

misyaguziya/VRCT

VRCT(VRChat Chatbox Translator & Transcription)

114 (+850%)

mit

kyegomez/zeta

Build high-performance AI models with modular building blocks

442 (+840%)

apache-2.0

litongjava/whisper-cpp-server

whisper-cpp-serve Real-time speech recognition and c+ of OpenAI's Whisper model in C/C++

53 (+657%)

mit
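
How the relative gains relate to the absolute gains

The page does not say how the percentages are computed. The listed figures are, however, consistent with dividing the absolute gain by the star count at the start of the window (current stars minus the gain). The sketch below checks that assumption against a few entries from the lists above. The function name `relative_gain_percent` is hypothetical and used only for illustration.

```python
def relative_gain_percent(current_stars: int, absolute_gain: int) -> float:
    """Relative gain in percent, assuming the baseline is the star count
    at the start of the window (current stars minus the absolute gain).
    This formula is inferred from the listed numbers, not documented."""
    baseline = current_stars - absolute_gain
    if baseline <= 0:
        raise ValueError("baseline star count must be positive")
    return 100.0 * absolute_gain / baseline


# (repository, current stars, absolute gain, percentage shown in the list)
samples = [
    ("Evil0ctal/Fast-Powerful-Whisper-AI-Services-API, 3 days", 205, 39, 23),
    ("Evil0ctal/Fast-Powerful-Whisper-AI-Services-API, week", 205, 90, 78),
    ("abus-aikorea/voice-pro, month", 2344, 1603, 216),
    ("jianchang512/stt, 12 months", 2676, 2667, 29633),
]

for name, stars, gain, listed in samples:
    computed = relative_gain_percent(stars, gain)
    print(f"{name}: computed {computed:.1f}%, listed {listed}%")
```

This also suggests why very young repositories dominate the 12-month relative list: a baseline of only a few stars makes even a modest absolute gain translate into a very large percentage.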