4 results found
Flux diffusion model implementation using a quantized fp8 matmul; the remaining layers use faster half-precision accumulation, which is ~2x faster on consumer devices (a sketch of the idea follows this entry).
Created 2024-08-05
64 commits to main branch, last one about a month ago
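A hedged sketch of the technique this description names, not the repo's actual kernels: weights are stored in fp8 (e4m3) to halve memory traffic, then dequantized and multiplied with half-precision accumulation. The function name and per-tensor scaling scheme are assumptions for illustration; it needs PyTorch >= 2.1 for the float8 dtypes, and the real ~2x speedup comes from dedicated fp8/fp16 tensor-core paths that this simulation does not itself invoke.

```python
import torch

E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def quantized_fp8_matmul(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # Per-tensor scale so the weights fit e4m3's narrow dynamic range.
    scale = w.abs().max() / E4M3_MAX
    w_fp8 = (w / scale).to(torch.float8_e4m3fn)  # lossy 8-bit storage
    w_half = w_fp8.to(torch.float16) * scale     # dequantize for the GEMM
    # Half-precision matmul: lower-precision accumulation trades a little
    # accuracy for roughly 2x tensor-core throughput on consumer GPUs.
    return x.to(torch.float16) @ w_half

x, w = torch.randn(8, 64), torch.randn(64, 64)
print(quantized_fp8_matmul(x, w).dtype)  # torch.float16
```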
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Created 2023-02-10
11,217 commits to main branch, last one 10 months ago
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
Created 2024-06-11
8 commits to master branch, last one 4 months ago
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023); a minimal sketch of the algorithm follows this entry.
Created 2024-04-22
26 commits to main branch, last one a day ago
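A minimal sketch of the accept/reject loop from Leviathan et al., not this repo's code. `draft_probs` and `target_probs` are hypothetical callables returning next-token distributions over the vocabulary; a real implementation would score all drafted positions in a single batched target forward pass rather than the per-position calls shown here.

```python
import numpy as np

def speculative_step(prefix, draft_probs, target_probs, k=4, rng=None):
    rng = rng or np.random.default_rng()
    # 1. The small draft model proposes k tokens autoregressively (cheap).
    drafted, q = [], []
    ctx = list(prefix)
    for _ in range(k):
        dist = draft_probs(ctx)
        tok = rng.choice(len(dist), p=dist)
        drafted.append(tok)
        q.append(dist)
        ctx.append(tok)
    # 2. The large target model scores every drafted position; in practice
    #    this is one parallel pass, which is where the speedup comes from.
    p = [target_probs(list(prefix) + drafted[:i]) for i in range(k)]
    # 3. Accept drafted token i with probability min(1, p_i(tok) / q_i(tok)),
    #    which preserves the target model's output distribution exactly.
    accepted = []
    for i, tok in enumerate(drafted):
        if rng.random() < min(1.0, p[i][tok] / q[i][tok]):
            accepted.append(tok)
        else:
            # Rejected: resample from the residual max(p - q, 0) distribution
            # and stop; later drafted tokens are discarded.
            residual = np.maximum(p[i] - q[i], 0)
            residual /= residual.sum()
            accepted.append(rng.choice(len(residual), p=residual))
            break
    # (The full algorithm also samples one bonus token from the target
    # when all k drafted tokens are accepted; omitted here for brevity.)
    return accepted
```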