4 results found
Flux diffusion model implementation using a quantized fp8 matmul; the remaining layers use faster half-precision accumulation, which is ~2x faster on consumer devices (a sketch of the idea follows this entry).
Created 2024-08-05
64 commits to main branch, last one about a month ago
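A hedged sketch of the technique this description names, not the repo's actual kernels: weights are stored in fp8 (e4m3) to halve memory traffic, then dequantized and multiplied with half-precision accumulation. The function name and per-tensor scaling scheme are assumptions for illustration; it needs PyTorch >= 2.1 for the float8 dtypes, and the real ~2x speedup comes from dedicated fp8/fp16 tensor-core paths that this simulation does not itself invoke.

```python
import torch

E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def quantized_fp8_matmul(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # Per-tensor scale so the weights fit e4m3's narrow dynamic range.
    scale = w.abs().max() / E4M3_MAX
    w_fp8 = (w / scale).to(torch.float8_e4m3fn)  # lossy 8-bit storage
    w_half = w_fp8.to(torch.float16) * scale     # dequantize for the GEMM
    # Half-precision matmul: lower-precision accumulation trades a little
    # accuracy for roughly 2x tensor-core throughput on consumer GPUs.
    return x.to(torch.float16) @ w_half

x, w = torch.randn(8, 64), torch.randn(64, 64)
print(quantized_fp8_matmul(x, w).dtype)  # torch.float16
```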
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Created 2023-02-10
11,217 commits to main branch, last one 10 months ago
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
Created 2024-06-11
8 commits to master branch, last one 4 months ago
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023); a minimal sketch of the algorithm follows this entry.
Created 2024-04-22
26 commits to main branch, last one a day ago
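A minimal sketch of the accept/reject loop from Leviathan et al., not this repo's code. `draft_probs` and `target_probs` are hypothetical callables returning next-token distributions over the vocabulary; a real implementation would score all drafted positions in a single batched target forward pass rather than the per-position calls shown here.

```python
import numpy as np

def speculative_step(prefix, draft_probs, target_probs, k=4, rng=None):
    rng = rng or np.random.default_rng()
    # 1. The small draft model proposes k tokens autoregressively (cheap).
    drafted, q = [], []
    ctx = list(prefix)
    for _ in range(k):
        dist = draft_probs(ctx)
        tok = rng.choice(len(dist), p=dist)
        drafted.append(tok)
        q.append(dist)
        ctx.append(tok)
    # 2. The large target model scores every drafted position; in practice
    #    this is one parallel pass, which is where the speedup comes from.
    p = [target_probs(list(prefix) + drafted[:i]) for i in range(k)]
    # 3. Accept drafted token i with probability min(1, p_i(tok) / q_i(tok)),
    #    which preserves the target model's output distribution exactly.
    accepted = []
    for i, tok in enumerate(drafted):
        if rng.random() < min(1.0, p[i][tok] / q[i][tok]):
            accepted.append(tok)
        else:
            # Rejected: resample from the residual max(p - q, 0) distribution
            # and stop; later drafted tokens are discarded.
            residual = np.maximum(p[i] - q[i], 0)
            residual /= residual.sum()
            accepted.append(rng.choice(len(residual), p=residual))
            break
    # (The full algorithm also samples one bonus token from the target
    # when all k drafted tokens are accepted; omitted here for brevity.)
    return accepted
```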