14 results found
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
This repository has been archived
Created
2022-11-11
2,126 commits to main branch, last one 6 months ago
Large-scale LLM inference engine
Created
2023-06-23
1,256 commits to main branch, last one 3 days ago
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
Created
2023-12-07
306 commits to main branch, last one 6 days ago
Scalable and robust tree-based speculative decoding algorithm
Created
2024-02-29
79 commits to main branch, last one 2 months ago
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
Created
2024-02-26
48 commits to main branch, last one 2 months ago
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Created
2024-04-04
17 commits to main branch, last one 7 months ago
REST: Retrieval-Based Speculative Decoding, NAACL 2024
Created
2023-11-15
11 commits to main branch, last one 4 months ago
LLM Inference on consumer devices
Created
2024-12-25
149 commits to v0.1.0 branch, last one 27 days ago
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Created
2023-02-10
11,217 commits to main branch, last one about a year ago
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation
Created
2025-02-06
57 commits to main branch, last one 25 days ago
Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan et al. 2023.
Created
2024-04-22
26 commits to main branch, last one 4 months ago
[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Created
2024-10-09
12 commits to main branch, last one about a month ago
Codes for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)
Created
2022-03-31
41 commits to main branch, last one about a year ago
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
Created
2024-04-09
1,641 commits to main branch, last one 8 months ago