Trending repositories for topic voice-activity-detection

Last 3 days (new repositories)

No newly created repositories trending in the last 3 days

Last 3 days (absolute gain)

modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

9,488 (+150)

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

5,517 (+35)

mit

pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

7,215 (+24)

mit

smacke/ffsubsync

Automagically synchronize subtitles with video.

7,094 (+8)

mit

k2-fsa/sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, Lich...

1,260 (+4)

apache-2.0

juanmc2005/diart

A python package to build AI-powered real-time audio applications

1,234 (+3)

mit

jim-schwoebel/voice_datasets

🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

1,891 (+3)

mgonzs13/whisper_ros

Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2

73 (+1)

mit

gkonovalov/android-vad

Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

323 (+1)

mit

noisetorch/NoiseTorch

Real-time microphone noise suppression on Linux.

9,573 (+1)

Last 3 days (relative gain)

modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

9,488 (+2%)

mgonzs13/whisper_ros

Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2

73 (+1%)

mit

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

5,517 (+0.6%)

mit

pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

7,215 (+0.3%)

mit

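k2-fsa/sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, Lich...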

1,260 (+0.3%)

apache-2.0

gkonovalov/android-vad

Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

323 (+0.3%)

mit

juanmc2005/diart

A python package to build AI-powered real-time audio applications

1,234 (+0.2%)

mit

jim-schwoebel/voice_datasets

🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

1,891 (+0.2%)

smacke/ffsubsync

Automagically synchronize subtitles with video.

7,094 (+0.1%)

mit

noisetorch/NoiseTorch

Real-time microphone noise suppression on Linux.

9,573 (+0.0%)

Last week (new repositories)

No newly created repositories trending in the last week

Last week (absolute gain)

modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

9,488 (+204)

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

5,517 (+86)

mit

pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

7,215 (+45)

mit

smacke/ffsubsync

Automagically synchronize subtitles with video.

7,094 (+12)

mit

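k2-fsa/sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, Lich...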

1,260 (+10)

apache-2.0

juanmc2005/diart

A python package to build AI-powered real-time audio applications

1,234 (+8)

mit

jim-schwoebel/voice_datasets

🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

1,891 (+8)

shashikg/WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engine

386 (+5)

mit

mgonzs13/whisper_ros

Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2

73 (+3)

mit

coqui-ai/open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

1,322 (+3)

mit

gkonovalov/android-vad

Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

323 (+2)

mit

duj12/ASR-2Pass

ASR 2Pass onnxruntime and websocket server, based on FunASR (https://github.com/alibaba-damo-academy/FunASR).

66 (+1)

noisetorch/NoiseTorch

Real-time microphone noise suppression on Linux.

9,573 (+1)

Last week (relative gain)

mgonzs13/whisper_ros

Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2

73 (+4%)

mit

modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

9,488 (+2%)

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

5,517 (+2%)

mit

duj12/ASR-2Pass

ASR 2Pass onnxruntime and websocket server, based on FunASR (https://github.com/alibaba-damo-academy/FunASR).

66 (+2%)

shashikg/WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engine

386 (+1%)

mit

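k2-fsa/sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, Lich...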

1,260 (+0.8%)

apache-2.0

juanmc2005/diart

A python package to build AI-powered real-time audio applications

1,234 (+0.7%)

mit

pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

7,215 (+0.6%)

mit

gkonovalov/android-vad

Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

323 (+0.6%)

mit

jim-schwoebel/voice_datasets

🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

1,891 (+0.4%)

coqui-ai/open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

1,322 (+0.2%)

mit

smacke/ffsubsync

Automagically synchronize subtitles with video.

7,094 (+0.2%)

mit

noisetorch/NoiseTorch

Real-time microphone noise suppression on Linux.

9,573 (+0.0%)

Last month (new repositories)

No newly created repositories trending in the last month

Last month (absolute gain)

modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

9,488 (+784)

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

5,517 (+318)

mit

pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

7,215 (+212)

mit

smacke/ffsubsync

Automagically synchronize subtitles with video.

7,094 (+51)

mit

noisetorch/NoiseTorch

Real-time microphone noise suppression on Linux.

9,573 (+43)

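k2-fsa/sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, Lich...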

1,260 (+42)

apache-2.0

jim-schwoebel/voice_datasets

🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

1,891 (+38)

juanmc2005/diart

A python package to build AI-powered real-time audio applications

1,234 (+25)

mit

shashikg/WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engine

386 (+18)

mit

gkonovalov/android-vad

Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

323 (+15)

mit

bigcash/awesome-vad

A curated list of awesome voice activity detection

48 (+7)

apache-2.0

coqui-ai/open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

1,322 (+7)

mit

ina-foss/inaSpeechSegmenter

CNN-based audio segmentation toolkit. Allows to detect speech, music, noise and speaker gender. Has been designed for large scale gender equality studies based on speech time per gender.

794 (+6)

mit

mgonzs13/whisper_ros

Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2

73 (+5)

mit

duj12/ASR-2Pass

ASR 2Pass onnxruntime and websocket server, based on FunASR (https://github.com/alibaba-damo-academy/FunASR).

66 (+3)

zhenghuatan/rVADfast

This is the Python library for an unsupervised, fast method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method.

137 (+3)

mit

Picovoice/cobra

On-device voice activity detection (VAD) powered by deep learning

204 (+3)

apache-2.0

Speech-Interaction-Technology-Aalto-U/itsp

Introduction to Speech Processing

85 (+2)

cc-by-sa-4.0

gtreshchev/RuntimeAudioImporter

Runtime Audio Importer plugin for Unreal Engine. Importing audio of various formats at runtime.

380 (+2)

mit

baxtree/subaligner

Automatically synchronize and translate subtitles, or create new ones by transcribing, using pre-trained DNNs, Forced Alignments and Transformers. https://subaligner.readthedocs.io/

467 (+2)

mit

Last month (relative gain)

bigcash/awesome-vad

A curated list of awesome voice activity detection

48 (+17%)

apache-2.0

modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

9,488 (+9%)

mgonzs13/whisper_ros

Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2

73 (+7%)

mit

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

5,517 (+6%)

mit

shashikg/WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engine

386 (+5%)

mit

gkonovalov/android-vad

Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

323 (+5%)

mit

duj12/ASR-2Pass

ASR 2Pass onnxruntime and websocket server, based on FunASR (https://github.com/alibaba-damo-academy/FunASR).

66 (+5%)

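k2-fsa/sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, Lich...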

1,260 (+3%)

apache-2.0

pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

7,215 (+3%)

mit

Speech-Interaction-Technology-Aalto-U/itsp

Introduction to Speech Processing

85 (+2%)

cc-by-sa-4.0

zhenghuatan/rVADfast

This is the Python library for an unsupervised, fast method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method.

137 (+2%)

mit

juanmc2005/diart

A python package to build AI-powered real-time audio applications

1,234 (+2%)

mit

jim-schwoebel/voice_datasets

🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

1,891 (+2%)

bunyaminergen/Callytics

Callytics is an advanced call analytics solution that leverages speech recognition and large language models (LLMs) technologies to analyze phone conversations from customer service and call centers.

65 (+2%)

gpl-3.0

Picovoice/cobra

On-device voice activity detection (VAD) powered by deep learning

204 (+1%)

apache-2.0

ina-foss/inaSpeechSegmenter

CNN-based audio segmentation toolkit. Allows to detect speech, music, noise and speaker gender. Has been designed for large scale gender equality studies based on speech time per gender.

794 (+0.8%)

mit

smacke/ffsubsync

Automagically synchronize subtitles with video.

7,094 (+0.7%)

mit

coqui-ai/open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

1,322 (+0.5%)

mit

gtreshchev/RuntimeAudioImporter

Runtime Audio Importer plugin for Unreal Engine. Importing audio of various formats at runtime.

380 (+0.5%)

mit

noisetorch/NoiseTorch

Real-time microphone noise suppression on Linux.

9,573 (+0.5%)

Last 12 months (new repositories)

bunyaminergen/Callytics

Callytics is an advanced call analytics solution that leverages speech recognition and large language models (LLMs) technologies to analyze phone conversations from customer service and call centers.

gpl-3.0

Last 12 months (absolute gain)

modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

9,488 (+6,488)

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

5,517 (+2,796)

mit

pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

7,215 (+2,303)

mit

noisetorch/NoiseTorch

Real-time microphone noise suppression on Linux.

9,573 (+639)

smacke/ffsubsync

Automagically synchronize subtitles with video.

7,094 (+622)

mit

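k2-fsa/sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, Lich...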

1,260 (+468)

apache-2.0

juanmc2005/diart

A python package to build AI-powered real-time audio applications

1,234 (+454)

mit

jim-schwoebel/voice_datasets

🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

1,891 (+371)

shashikg/WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engine

386 (+221)

mit

gkonovalov/android-vad

Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

323 (+139)

mit

coqui-ai/open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

1,322 (+120)

mit

ggeop/Python-ai-assistant

Python AI assistant 🧠

964 (+112)

mit

ina-foss/inaSpeechSegmenter

CNN-based audio segmentation toolkit. Allows to detect speech, music, noise and speaker gender. Has been designed for large scale gender equality studies based on speech time per gender.

794 (+102)

mit

gtreshchev/RuntimeAudioImporter

Runtime Audio Importer plugin for Unreal Engine. Importing audio of various formats at runtime.

380 (+93)

mit

amsehili/auditok

An audio/acoustic activity detection and audio segmentation tool

769 (+65)

mit

Picovoice/cobra

On-device voice activity detection (VAD) powered by deep learning

204 (+63)

apache-2.0

bunyaminergen/Callytics

Callytics is an advanced call analytics solution that leverages speech recognition and large language models (LLMs) technologies to analyze phone conversations from customer service and call centers.

65 (+58)

gpl-3.0

baxtree/subaligner

Automatically synchronize and translate subtitles, or create new ones by transcribing, using pre-trained DNNs, Forced Alignments and Transformers. https://subaligner.readthedocs.io/

467 (+57)

mit

mgonzs13/whisper_ros

Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2

73 (+54)

mit

duj12/ASR-2Pass

ASR 2Pass onnxruntime and websocket server, based on FunASR (https://github.com/alibaba-damo-academy/FunASR).

66 (+46)

Last 12 months (relative gain)

bunyaminergen/Callytics

Callytics is an advanced call analytics solution that leverages speech recognition and large language models (LLMs) technologies to analyze phone conversations from customer service and call centers.

65 (+829%)

gpl-3.0

bigcash/awesome-vad

A curated list of awesome voice activity detection

48 (+336%)

apache-2.0

mgonzs13/whisper_ros

Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2

73 (+284%)

mit

duj12/ASR-2Pass

ASR 2Pass onnxruntime and websocket server, based on FunASR (https://github.com/alibaba-damo-academy/FunASR).

66 (+230%)

modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

9,488 (+216%)

shashikg/WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engine

386 (+134%)

mit

nianlonggu/WhisperSeg

Code for ICASSP 2024 paper WhisperSeg: Positive Transfer of the Whisper Speech Transformer to Human and Animal Voice Activity Detection

29 (+123%)

snakers4/silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

5,517 (+103%)

mit

Speech-Interaction-Technology-Aalto-U/itsp

Introduction to Speech Processing

85 (+93%)

cc-by-sa-4.0

gkonovalov/android-vad

Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

323 (+76%)

mit

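k2-fsa/sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, Lich...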

1,260 (+59%)

apache-2.0

juanmc2005/diart

A python package to build AI-powered real-time audio applications

1,234 (+58%)

mit

pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

7,215 (+47%)

mit

Picovoice/cobra

On-device voice activity detection (VAD) powered by deep learning

204 (+45%)

apache-2.0

gtreshchev/RuntimeAudioImporter

Runtime Audio Importer plugin for Unreal Engine. Importing audio of various formats at runtime.

380 (+32%)

mit

jim-schwoebel/voice_datasets

🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

1,891 (+24%)

tomchang25/whisper-auto-transcribe

Auto transcribe tool based on whisper

225 (+17%)

mit

zhenghuatan/rVADfast

This is the Python library for an unsupervised, fast method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method.

137 (+15%)

mit

ina-foss/inaSpeechSegmenter

CNN-based audio segmentation toolkit. Allows to detect speech, music, noise and speaker gender. Has been designed for large scale gender equality studies based on speech time per gender.

794 (+15%)

mit

NickWilkinson37/voxseg

A python library for voice activity detection (VAD) for speech/non-speech segmentation.

87 (+14%)

mit