3 results found

8
146
apache-2.0
4
[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
Created 2024-05-31
63 commits to main branch, last one 3 days ago
10
126
apache-2.0
1
⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud or AI HW.
Created 2023-04-16
358 commits to main branch, last one 4 months ago
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
Created 2024-06-11
8 commits to master branch, last one 2 months ago