11 results found

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
This repository has been archived
Created 2022-11-11
2,126 commits to main branch, last one about a month ago
Large-scale LLM inference engine
Created 2023-06-23
825 commits to main branch, last one 23 hours ago
83
826
apache-2.0
13
Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
Created 2023-12-07
276 commits to main branch, last one 4 days ago
34
316
unknown
5
Scalable and robust tree-based speculative decoding algorithm
Created 2024-02-29
77 commits to main branch, last one 3 months ago
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
Created 2024-02-26
26 commits to main branch, last one 24 days ago
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Created 2024-04-04
17 commits to main branch, last one 2 months ago
11
176
apache-2.0
5
REST: Retrieval-Based Speculative Decoding, NAACL 2024
Created 2023-11-15
9 commits to main branch, last one a day ago
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Created 2023-02-10
11,217 commits to main branch, last one 9 months ago
0
33
unknown
2
Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)
Created 2022-03-31
41 commits to main branch, last one 11 months ago
Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan et al. 2023.
Created 2024-04-22
24 commits to main branch, last one 24 days ago
1
26
apache-2.0
3
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Created 2024-10-09
10 commits to main branch, last one about a month ago