Trending repositories for topic audio
GUI for a vocal remover that uses deep neural networks.
🔊 High-precision web player for multi-device audio playback and spatial audio.
SRS is a simple, high-efficiency, real-time media server supporting RTMP, WebRTC, HLS, HTTP-FLV, HTTP-TS, SRT, MPEG-DASH, and GB28181.
Mixxx is Free DJ software that gives you everything you need to perform live mixes.
An AI-powered speech processing toolkit and open-source SOTA pretrained models, supporting speech enhancement, separation, target speaker extraction, and more.
The free and privacy-friendly screen recorder with no limits 🎥
JUCE is an open-source cross-platform C++ application framework for desktop and mobile applications, including VST, VST3, AU, AUv3, LV2 and AAX audio plug-ins.
BlackHole is a modern macOS audio loopback driver that allows applications to pass audio to other applications with zero additional latency.
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
Mumble is open-source, low-latency, high-quality voice chat software.
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
A real-time audio effect processor designed for audio enthusiasts to enhance their music listening experience.
Add a virtual speaker and mic to your Windows 10/11 device! Works with VR, OBS, Sunshine, and/or any desktop sharing software.
TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loudness normalization operations.
A simple and modern audio flyout for Windows 11, built with Fluent 2 Design principles.
A Rust library for streaming audio, video, and other media content
A native-PyTorch library for large-scale multimodal LLM (M-LLM) training on text and audio, with tensor, context, data, and pipeline parallelism (TP/CP/DP/PP).
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
A single-file, easy-to-use audio playback and synthesis framework
Comprehensive monorepo designed to facilitate real-time audio processing and streaming across iOS, Android, and web platforms.
A flexible cross-platform IIR and FIR engine for crossovers, room correction, etc.
Background Music, a macOS audio utility: automatically pause your music, set individual apps' volumes and record system audio.
Full-featured video/audio downloader for Android using yt-dlp
Customizable MIDI visualization software, similar to Synthesia, for Windows (Wine-compatible on Linux)
Audio Analytics Dashboard that provides insights and eliminates tedious tasks in the music production workflow [Plotly, Streamlit, Librosa, Essentia]
A Web and Native UI for ffmpeg-wasm: convert video, audio and images using the power of ffmpeg, directly from your web browser or from your computer.
[IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
Demux media files in the browser using WebAssembly, designed for WebCodecs.
A low-level audio recorder plugin that uses miniaudio as its backend and supports all platforms. It can detect silence and save to a WAV audio file. Audio waveform and FFT data can be obtained in real time as...
ConvertIt is an ad-free Android app for converting audio and video files to various formats like FLAC, ALAC, MP3, WAV, AAC, OGG, M4A, AIFF, OPUS, WMA, MKA, and SPX. Built with Kotlin, Compose, and FFm...
An open-source chatbot architecture for voice/vision (and multimodal) assistants that can run locally (CPU/GPU-bound) or remotely (I/O-bound).
A safe and ergonomic Rust interface for FFmpeg integration, designed for ease of use.
Download Tidal tracks, videos, albums, playlists & artists! Tidal downloader that supports master quality.
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), audio language models, automatic speech recognition (ASR...
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
Ultimate Vocal Remover 5 with a Gradio UI. Separate an audio file into various stems using multiple models.
SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
[EMNLP2024 Demo], [ICASSP 2025] A user-friendly library for reproducible video moment retrieval and highlight detection. It also supports audio moment retrieval.
Official Implementation of the work "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning"
[SLT'24] The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model
VideoAlchemy is a toolkit expanding video processing capabilities, emphasizing FFmpeg and broader video technology applications.
YUP is an open-source library dedicated to empowering developers with advanced tools for cross-platform application development.
A collection of papers in the speech and audio field. Currently covers: ICLR2023-2025, ICML2023-2024, NeurIPS2023-2024, ACMMM2024, AAAI2024, ACL2024, EMNLP2024, NAACL2025, AAAI2025, IJCAI2024
Multithreaded TIDAL Media Downloader Next Generation! Up to HiRes Lossless / TIDAL MAX 24-bit, 192 kHz.
Desktop app for recording meetings from locally running apps and transcribing and summarizing them with a local LLM
A Python package for building Quantum Representations of Digital Audio. Developed by Moth.
Android Music Streaming App suite in Material You style. Connects to Ampache, Nextcloud Music and compatible backends (Ampache API 4 and above).
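A curated index of Web audio/video APIs, SDKs, articles, and public products, helping front-end developers get started and advance in the audio/video field and promoting the practical use of audio/video technology on the Web platform.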