11 results found
Primary language:
- Python (10)
- C (1)
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
This repository has been archived
Created
2022-11-11
2,126 commits to main branch, last one about a month ago
Large-scale LLM inference engine
Created
2023-06-23
825 commits to main branch, last one 23 hours ago
Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
Created
2023-12-07
276 commits to main branch, last one 4 days ago
Scalable and robust tree-based speculative decoding algorithm
Created
2024-02-29
77 commits to main branch, last one 3 months ago
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
Created
2024-02-26
26 commits to main branch, last one 24 days ago
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Created
2024-04-04
17 commits to main branch, last one 2 months ago
REST: Retrieval-Based Speculative Decoding, NAACL 2024
Created
2023-11-15
9 commits to main branch, last one a day ago
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Created
2023-02-10
11,217 commits to main branch, last one 9 months ago
Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)
Created
2022-03-31
41 commits to main branch, last one 11 months ago
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023)
Created
2024-04-22
24 commits to main branch, last one 24 days ago
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Created
2024-10-09
10 commits to main branch, last one about a month ago