Trending repositories for topic audio
Graph-oriented live coding language and music/audio DSP library written in Rust
GUI for a Vocal Remover that uses Deep Neural Networks.
The free and privacy-friendly screen recorder with no limits 🎥
SRS is a simple, high-efficiency, real-time media server supporting RTMP, WebRTC, HLS, HTTP-FLV, HTTP-TS, SRT, MPEG-DASH, and GB28181.
Background Music, a macOS audio utility: automatically pause your music, set individual apps' volumes and record system audio.
UI components and hooks for building video/audio players on the web. Robust, customizable, and accessible. Modern alternative to JW Player and Video.js.
A fully fledged audio module created for music apps. Provides audio playback, external media controls, background mode and more!
[arXiv 2024] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
BlackHole is a modern macOS audio loopback driver that allows applications to pass audio to other applications with zero additional latency.
JUCE is an open-source cross-platform C++ application framework for desktop and mobile applications, including VST, VST3, AU, AUv3, LV2 and AAX audio plug-ins.
Awesome Python resources related to audio and music
An AirPlay Audio-Receiver for your Personal Computer or ARM-SoC (e.g. Raspberry Pi)
A Python package for building Quantum Representations of Digital Audio. Developed by Moth.
[IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
Cross-platform audio recorder designed for real-time speech audio processing
Comprehensive monorepo designed to facilitate real-time audio processing and streaming across iOS, Android, and web platforms.
Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
Play/Stream/Record PCM audio data & Encode/Decode Opus to PCM audio data
A low-level audio recorder plugin that uses miniaudio as its backend and supports all platforms. It can detect silence and save to a WAV audio file. Audio wave and FFT data can be retrieved in real time as...
VideoAlchemy is a toolkit expanding video processing capabilities, emphasizing FFmpeg and broader video technology applications.
Versatile AI-driven audio upscaler to enhance the quality of any audio.
Customizable MIDI visualization software for Windows, similar to Synthesia
Multithreaded TIDAL Media Downloader Next Generation! Up to HiRes Lossless / TIDAL MAX 24-bit, 192 kHz.
Audio playback and capture library written in C, in a single source file.
A minimalistic player library for Kotlin Multiplatform. It targets Android, JVM, and iOS, allowing consumers to play audio files.
TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loudness normalization operations.
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
A complete, cross-platform solution to record, convert, filter and stream audio and video.
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR...
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
Radient turns many data types (not just text) into vectors for similarity search, RAG, regression analysis, and more.
Ultimate Vocal Remover 5 with Gradio UI. Separate an audio file into various stems, using multiple models
A simple and modern audio flyout for Windows 11, built with Fluent 2 Design principles.
Open source podcast instrument for Android supporting contents from YouTube and YT Music as well as normal podcasts.
An all-in-one sound and music management addon for the Godot game engine.
SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
An architecture for neural network inference in real-time audio applications
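A curated index of Web audio/video APIs, SDKs, articles, and public-facing products, helping front-end developers get started with or advance in the audio/video field and promoting the practical application of audio/video technology on the Web platform.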
Official Implementation of the work "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning"
[EMNLP2024 Demo], [ICASSP 2025] A user-friendly library for reproducible video moment retrieval and highlight detection. It also supports audio moment retrieval.
Set app volumes with real sliders! deej is an Arduino & Go project to let you build your own hardware mixer for Windows and Linux
💿 Free software that works great, and also happens to be open-source Python.
[SLT'24] The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model
Lightweight CLI music player with SoundCloud and YouTube built in. Effects, themes, and MIDI support for Windows and Linux.
App that will record system audio and send it off to the Shazam API to be identified. For when your phone's microphone just can't quite capture the song well enough for Shazam to figure it out.
Media server for real-time, low latency, programmable video and audio mixing.
Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities"