14 results found

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms ⚡
This repository has been archived
Created 2022-11-11
2,126 commits to main branch, last one 6 months ago
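The description above mentions SOTA compression techniques for LLMs. As a hedged illustration of one such technique, here is a minimal symmetric int8 weight-quantization round-trip in NumPy; it is a generic sketch, not this repository's API.

```python
# Generic sketch of symmetric per-tensor int8 weight quantization (not the repo's API).
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: one scale, int8 codes in [-127, 127]."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Map int8 codes back to float32 approximations of the original weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
print("max abs reconstruction error:", float(np.abs(w - w_hat).max()))
```

Per-channel scales and calibration data would tighten the error, but the round-trip above already shows the basic quantize/dequantize contract.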
Large-scale LLM inference engine
Created 2023-06-23
1,256 commits to main branch, last one 3 days ago
129 · 1.2k · apache-2.0 · 23
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
Created 2023-12-07
306 commits to main branch, last one 6 days ago
37 · 341 · unknown · 5
Scalable and robust tree-based speculative decoding algorithm
Created 2024-02-29
79 commits to main branch, last one 2 months ago
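As a rough sketch of the general idea behind tree-based speculative decoding, not this repository's algorithm: a draft model proposes a small token tree, and the target model checks each root-to-leaf path, keeping the longest prefix it agrees with. `draft_topk` and `target_argmax` are hypothetical stand-ins for real model calls; a real system verifies the whole tree in one batched target pass with tree attention rather than node by node, and uses lossless sampling rather than the greedy check shown here.

```python
# Hedged toy sketch of a token-tree draft-and-verify step (greedy verification).
import numpy as np

VOCAB = 50

def draft_topk(prefix, k):
    """Pretend draft model: return k candidate next tokens for the prefix."""
    rng = np.random.default_rng(hash(tuple(prefix)) % (2**32))
    return list(rng.choice(VOCAB, size=k, replace=False))

def target_argmax(prefix):
    """Pretend target model: return its single greedy next token."""
    rng = np.random.default_rng((hash(tuple(prefix)) + 1) % (2**32))
    return int(rng.integers(VOCAB))

def tree_speculate(prefix, depth=3, branch=2):
    """Draft a token tree, keep the longest root-to-leaf prefix the target agrees with."""
    # Enumerate all root-to-leaf paths of the draft tree.
    paths = [[]]
    for _ in range(depth):
        paths = [path + [tok] for path in paths for tok in draft_topk(prefix + path, branch)]
    # Verify each path token by token against the target's greedy choice.
    best = []
    for path in paths:
        accepted = []
        for tok in path:
            if target_argmax(prefix + accepted) == tok:
                accepted.append(tok)
            else:
                break
        if len(accepted) > len(best):
            best = accepted
    # The verification pass always yields at least one new token from the target itself.
    return best + [target_argmax(prefix + best)]

print(tree_speculate([1, 2, 3]))
```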
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
Created 2024-02-26
48 commits to main branch, last one 2 months ago
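A minimal sketch of the early-exit self-speculative idea, assuming a hypothetical toy model rather than the LayerSkip code: tokens are drafted by exiting after the first few layers of the same model, then verified at full depth, so no separate draft model is needed.

```python
# Hedged sketch of early-exit self-speculative decoding with a hypothetical toy model.
import numpy as np

class ToyModel:
    """Stand-in model whose prediction is refined layer by layer."""
    def __init__(self, n_layers=8, vocab=100, seed=0):
        self.n_layers, self.vocab = n_layers, vocab
        self.weights = np.random.default_rng(seed).normal(size=(n_layers, vocab))

    def next_token(self, prefix, exit_layer=None):
        depth = self.n_layers if exit_layer is None else exit_layer
        h = np.zeros(self.vocab)
        for layer in range(depth):                 # only the first `depth` layers run
            h += self.weights[layer] * np.cos(sum(prefix) + layer)
        return int(np.argmax(h))

def self_speculative_step(model, prefix, draft_len=4, exit_layer=2):
    """Draft with a shallow exit, verify with the full model, keep the agreed prefix."""
    draft = []
    for _ in range(draft_len):
        draft.append(model.next_token(prefix + draft, exit_layer=exit_layer))
    accepted = []
    for tok in draft:
        if model.next_token(prefix + accepted) == tok:    # full-depth verification
            accepted.append(tok)
        else:
            break
    accepted.append(model.next_token(prefix + accepted))  # free token from the verifier
    return accepted

model = ToyModel()
print(self_speculative_step(model, [3, 1, 4]))
```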
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Created 2024-04-04
17 commits to main branch, last one 7 months ago
12 · 199 · apache-2.0 · 7
REST: Retrieval-Based Speculative Decoding, NAACL 2024
Created 2023-11-15
11 commits to main branch, last one 4 months ago
15 · 105 · apache-2.0 · 4
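A toy sketch of retrieval-based drafting, assuming a plain n-gram dictionary over a token corpus instead of REST's actual datastore: the trailing n-gram of the current context is looked up and the retrieved continuation is proposed as the draft (verification against the target model is omitted here).

```python
# Hedged toy sketch of retrieval-based drafting over an n-gram index (not REST's code).
from collections import defaultdict

def build_index(corpus_tokens, ngram=3):
    """Map every n-gram in the corpus to the (up to 4) tokens that followed it."""
    index = defaultdict(list)
    for i in range(len(corpus_tokens) - ngram):
        key = tuple(corpus_tokens[i:i + ngram])
        index[key].append(corpus_tokens[i + ngram:i + ngram + 4])
    return index

def retrieve_draft(index, context, ngram=3):
    """Look up the context's trailing n-gram and return a drafted continuation, if any."""
    candidates = index.get(tuple(context[-ngram:]), [])
    return candidates[0] if candidates else []

corpus = "the cat sat on the mat and the cat sat on the rug".split()
index = build_index(corpus)
print(retrieve_draft(index, "we saw that the cat sat".split()))
```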
LLM inference on consumer devices
Created 2024-12-25
149 commits to v0.1.0 branch, last one 27 days ago
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Created 2023-02-10
11,217 commits to main branch, last one about a year ago
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation
Created 2025-02-06
57 commits to main branch, last one 25 days ago
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
Created 2024-04-22
26 commits to main branch, last one 4 months ago
1 · 45 · apache-2.0 · 3
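For reference, a minimal sketch of the accept/reject rule from that paper: a drafted token x is accepted with probability min(1, p(x)/q(x)), where p is the target distribution and q the draft distribution; on rejection, a replacement is sampled from the normalized residual max(0, p - q). Toy distributions stand in for real model outputs, and the residual step assumes p and q differ somewhere.

```python
# Sketch of the speculative sampling accept/reject rule with toy distributions.
import numpy as np

rng = np.random.default_rng(0)

def speculative_accept(p, q, draft_token):
    """Accept the drafted token w.p. min(1, p/q); otherwise resample from max(0, p - q)."""
    if rng.random() < min(1.0, p[draft_token] / q[draft_token]):
        return draft_token, True
    residual = np.maximum(p - q, 0.0)   # assumes p and q differ, so this is not all zeros
    residual /= residual.sum()
    return int(rng.choice(len(p), p=residual)), False

# Toy target (p) and draft (q) next-token distributions over a 4-token vocabulary.
p = np.array([0.5, 0.2, 0.2, 0.1])
q = np.array([0.25, 0.25, 0.25, 0.25])
draft_token = int(rng.choice(4, p=q))            # token proposed by the draft model
token, accepted = speculative_accept(p, q, draft_token)
print(token, "accepted" if accepted else "resampled")
```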
[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Created 2024-10-09
12 commits to main branch, last one about a month ago
1 · 39 · unknown · 2
Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)
Created 2022-03-31
41 commits to main branch, last one about a year ago
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
Created 2024-04-09
1,641 commits to main branch, last one 8 months ago
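A hedged sketch of the overlap that asynchronous pipelined speculation aims for, not PipeInfer's actual design: while the target model verifies the current draft, the next draft is already being produced under the optimistic assumption that the current one will be fully accepted. The model calls below are hypothetical stand-ins that just sleep to simulate latency.

```python
# Toy sketch of overlapping drafting with verification via a thread pool.
import time
from concurrent.futures import ThreadPoolExecutor

def draft_tokens(prefix, n=4):
    """Pretend draft model: propose n tokens after the prefix."""
    time.sleep(0.01)
    return [(sum(prefix) + i) % 100 for i in range(1, n + 1)]

def verify(prefix, draft):
    """Pretend target model: accept either the whole draft or all but the last token."""
    time.sleep(0.05)
    return list(draft) if sum(prefix) % 2 == 0 else list(draft[:-1])

def pipelined_decode(prefix, steps=4):
    out = list(prefix)
    with ThreadPoolExecutor(max_workers=2) as pool:
        draft = draft_tokens(out)
        for _ in range(steps):
            verify_future = pool.submit(verify, out, draft)
            # Optimistically draft the next window assuming the current draft is accepted.
            next_draft_future = pool.submit(draft_tokens, out + draft)
            accepted = verify_future.result()
            out += accepted
            if len(accepted) == len(draft):
                draft = next_draft_future.result()   # speculation paid off; reuse the draft
            else:
                next_draft_future.result()           # let the stale task finish, then discard it
                draft = draft_tokens(out)            # redraft from the verified prefix
    return out

print(pipelined_decode([1, 2, 3]))
```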