Trending repositories for topic audio-processing
Cross-platform, customizable ML solutions for live and streaming media.
InspireMusic: A Unified Framework for Music, Song, Audio Generation.
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
An implementation of the system-wide JamesDSP audio processing engine for non-rooted Android devices
This project is a Python bot that automates the process of logging into Gmail, joining a Google Meet, recording the audio of the meeting, and then generating a summary, key points, action items, and s...
A C++ based, lightweight music and noise remover for YouTube and other internet media, using DeepFilterNet for audio enhancement.
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝
High-quality pro audio resampler / sample rate conversion C++ library. Very fast, for both audio resampling and time-series interpolation.
PipeWire Guide. Learn about how PipeWire gives your Linux system a Professional Audio/Video Processing workflow.
Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
Command-line VST3, AU and LADSPA plugin host for easier debugging of audio plugins
A Web and Native UI for ffmpeg-wasm: convert video, audio and images using the power of ffmpeg, directly from your web browser or from your computer.
STFT-based real-time pitch and timbre shifting in C++ and Python
Flutter low-level audio plugin using SoLoud C++ library and FFI
Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.
Experimental code: sound file preprocessing to optimize Whisper transcriptions without hallucinated texts
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
LedFx is a network-based LED effect engine designed to deliver advanced real-time audio effects to a wide variety of devices.
Easily train a good voice conversion (VC) model with 10 minutes of voice data or less!
All the jargon you need to understand the world of Digital Signal Processing.
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
Arcan - [Display Server, Multimedia Framework, Game Engine] -> "Desktop Engine"
The collection of pre-trained, state-of-the-art AI models for ailia SDK
Audio time stretch and pitch shift library. Enables music tempo adjustment, transposition, "smooth scrub" and "live pause".
A complete, cross-platform solution to record, convert, filter and stream audio and video.
Example effects code and binaries for the Cleveland Music Co. Hothouse Digital Signal Processing Pedal Kit
Real time audio to audio translation over sockets. With virtual microphones, you can use this in any video conferencing software you'd like!
Free AI learning materials from around the world.
This free tool transforms your books, textbooks, or any text document into fantastic sounding audiobooks using OpenAI's state-of-the-art TTS technology.
[EMNLP2024 Demo], [ICASSP 2025] A user-friendly library for reproducible video moment retrieval and highlight detection. It also supports audio moment retrieval.
Single- and Multi-Speaker Cloned Voice Detection: From Perceptual to Learned Features
AI Productivity Tool - Free and open-source, enhancing user productivity while ensuring privacy and data security. Offers efficient and convenient AI solutions, including but not limited to: built-in ...
An architecture for neural network inference in real-time audio applications
Official code for SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
WhisperClip simplifies your life by automatically transcribing audio recordings and saving the text directly to your clipboard. With just a click of a button, you can effortlessly convert spoken words...
A library for audio and music analysis, feature extraction.
Isolate vocals, drums, bass, and other instrumental stems from any song
A music theory library in Rust for generating songs🎶
AI-powered YouTube Notes Generator: Create detailed notes from YouTube videos. Streamlit UI for easy use.
Digital Multi-Effect Pedal with Reverb, Delay, Tremolo, Looper, and Neural Networks for Amp Modeling
A collection of amazing audio tools for working with audio and sound files in ComfyUI