Statistics for topic audio
RepositoryStats tracks 595,858 GitHub repositories; of these, 1,510 are tagged with the audio topic. The most common primary language for repositories using this topic is Python (291). Other languages include C++ (226), JavaScript (162), C (132), TypeScript (93), Rust (79), C# (77), Swift (47), Go (45), and Java (41).
Stargazers over time for topic audio
Most starred repositories for topic audio
Trending repositories for topic audio
TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loudness normalization operations.
[arXiv 2024] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR...
GUI for a Vocal Remover that uses Deep Neural Networks.
Run OpenAudible from a container and use it with a web browser
React library for audio recording and visualization using the Web Audio API
An AirPlay Audio-Receiver for your Personal Computer or ARM-SoC (e.g. Raspberry Pi)
Audio metadata reading and writing library for Android/JVM platforms.
Demux media files in the browser using WebAssembly, designed for WebCodecs
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
The free and privacy-friendly screen recorder with no limits 🎥
[SLT'24] The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model