Trending repositories for topic audio
[arXiv 2024] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
GUI for a Vocal Remover that uses Deep Neural Networks.
The free and privacy-friendly screen recorder with no limits 🎥
SRS is a simple, high-efficiency, real-time media server supporting RTMP, WebRTC, HLS, HTTP-FLV, HTTP-TS, SRT, MPEG-DASH, and GB28181.
Automated Apple Music Lossless Sample Rate Switching for Audio Devices on Macs.
Audio Share streams a Windows/Linux computer's audio to an Android phone over the network, so the phone becomes the computer's speaker. (You needn't buy a new speaker😄.)
Background Music, a macOS audio utility: automatically pause your music, set individual apps' volumes and record system audio.
Mixxx is Free DJ software that gives you everything you need to perform live mixes.
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR...
An implementation of the system-wide JamesDSP audio processing engine for non-rooted Android devices
Audio playback and capture library written in C, in a single source file.
BlackHole is a modern macOS audio loopback driver that allows applications to pass audio to other applications with zero additional latency.
Versatile AI-driven audio upscaler to enhance the quality of any audio.
React library for audio recording and visualization using the Web Audio API
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Multithreaded TIDAL Media Downloader Next Generation! Up to HiRes Lossless / TIDAL MAX 24-bit, 192 kHz.
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
An audio visualizer for React. Provides separate components to visualize both live audio and audio blobs.
HyMPS will be a platform-independent software suite for advanced audio/video content production.
MilkDrop 3.0, supports any audio source, double-preset (.milk2), loading presets based on beat detection and much more...
An AirPlay Audio-Receiver for your Personal Computer or ARM-SoC (e.g. Raspberry Pi)
VideoAlchemy is a toolkit expanding video processing capabilities, emphasizing FFmpeg and broader video technology applications.
Desktop app for recording meetings from locally running apps and transcribing and summarizing them with a local LLM
This free tool transforms your books, textbooks, or any text document into fantastic sounding audiobooks using OpenAI's state-of-the-art TTS technology.
Ultimate Vocal Remover 5 with Gradio UI. Separate an audio file into various stems, using multiple models
We'll look into audio categorization using deep learning principles like Artificial Neural Networks (ANN), 1D Convolutional Neural Networks (CNN1D), and CNN2D in this repository. We undertake some bas...
TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loudness normalization operations.
Comprehensive monorepo designed to facilitate real-time audio processing and streaming across iOS, Android, and web platforms.
Demux media files in the browser using WebAssembly, designed for WebCodecs
MediaCMS is a modern, fully featured open source video and media CMS, written in Python/Django and React, featuring a REST API.
Custom elements (web components) for making audio and video player controls that look great in your website or app.
UI components and hooks for building video/audio players on the web. Robust, customizable, and accessible. Modern alternative to JW Player and Video.js.
Repository for the paper "Combining audio control and style transfer using latent diffusion", accepted at ISMIR 2024
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
Radient turns many data types (not just text) into vectors for similarity search, RAG, regression analysis, and more.
A simple and modern audio flyout for Windows 10/11, built with Fluent 2 Design principles.
Open source podcast instrument for Android supporting contents from YouTube and YT Music as well as normal podcasts.
An all-in-one sound and music management addon for the Godot game engine.
An architecture for neural network inference in real-time audio applications
Official Implementation of the work "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning"
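A curated index of Web audio/video APIs, SDKs, articles, and public-facing products, meant to help front-end developers get started with or advance in audio/video development and to promote the practical use of audio/video technology on the Web platform.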
[EMNLP2024 Demo], [ICASSP 2025] A user-friendly library for reproducible video moment retrieval and highlight detection. It also supports audio moment retrieval.
[SLT'24] The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model
Set app volumes with real sliders! deej is an Arduino & Go project to let you build your own hardware mixer for Windows and Linux
💿 Free software that works great, and also happens to be open-source Python.
App that will record system audio and send it off to the Shazam API to be identified. For when your phone's microphone just can't quite capture the song well enough for Shazam to figure it out.
Media server for real-time, low latency, programmable video and audio mixing.
Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities"
A Web and Native UI for ffmpeg-wasm: convert video, audio and images using the power of ffmpeg, directly from your web browser or from your computer.