14 results found

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms ⚡
This repository has been archived
Created 2022-11-11
2,126 commits to main branch, last one 6 months ago
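The description above mentions SOTA compression techniques for LLMs. As a hedged illustration of one such technique, here is a minimal symmetric int8 weight-quantization round-trip in NumPy; it is a generic sketch, not this repository's API.

```python
# Generic sketch of symmetric per-tensor int8 weight quantization (not the repo's API).
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: one scale, int8 codes in [-127, 127]."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Map int8 codes back to float32 approximations of the original weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
print("max abs reconstruction error:", float(np.abs(w - w_hat).max()))
```

Per-channel scales and calibration data would tighten the error, but the round-trip above already shows the basic quantize/dequantize contract.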
Large-scale LLM inference engine
Created 2023-06-23
1,256 commits to main branch, last one 3 days ago
129 · 1.2k · apache-2.0 · 23
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
Created 2023-12-07
306 commits to main branch, last one 6 days ago
37 · 341 · unknown · 5
Scalable and robust tree-based speculative decoding algorithm
Created 2024-02-29
79 commits to main branch, last one 2 months ago
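As a rough sketch of the general idea behind tree-based speculative decoding, not this repository's algorithm: a draft model proposes a small token tree, and the target model checks each root-to-leaf path, keeping the longest prefix it agrees with. `draft_topk` and `target_argmax` are hypothetical stand-ins for real model calls; a real system verifies the whole tree in one batched target pass with tree attention rather than node by node, and uses lossless sampling rather than the greedy check shown here.

```python
# Hedged toy sketch of a token-tree draft-and-verify step (greedy verification).
import numpy as np

VOCAB = 50

def draft_topk(prefix, k):
    """Pretend draft model: return k candidate next tokens for the prefix."""
    rng = np.random.default_rng(hash(tuple(prefix)) % (2**32))
    return list(rng.choice(VOCAB, size=k, replace=False))

def target_argmax(prefix):
    """Pretend target model: return its single greedy next token."""
    rng = np.random.default_rng((hash(tuple(prefix)) + 1) % (2**32))
    return int(rng.integers(VOCAB))

def tree_speculate(prefix, depth=3, branch=2):
    """Draft a token tree, keep the longest root-to-leaf prefix the target agrees with."""
    # Enumerate all root-to-leaf paths of the draft tree.
    paths = [[]]
    for _ in range(depth):
        paths = [path + [tok] for path in paths for tok in draft_topk(prefix + path, branch)]
    # Verify each path token by token against the target's greedy choice.
    best = []
    for path in paths:
        accepted = []
        for tok in path:
            if target_argmax(prefix + accepted) == tok:
                accepted.append(tok)
            else:
                break
        if len(accepted) > len(best):
            best = accepted
    # The verification pass always yields at least one new token from the target itself.
    return best + [target_argmax(prefix + best)]

print(tree_speculate([1, 2, 3]))
```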
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
Created 2024-02-26
48 commits to main branch, last one 2 months ago
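A minimal sketch of the early-exit self-speculative idea, assuming a hypothetical toy model rather than the LayerSkip code: tokens are drafted by exiting after the first few layers of the same model, then verified at full depth, so no separate draft model is needed.

```python
# Hedged sketch of early-exit self-speculative decoding with a hypothetical toy model.
import numpy as np

class ToyModel:
    """Stand-in model whose prediction is refined layer by layer."""
    def __init__(self, n_layers=8, vocab=100, seed=0):
        self.n_layers, self.vocab = n_layers, vocab
        self.weights = np.random.default_rng(seed).normal(size=(n_layers, vocab))

    def next_token(self, prefix, exit_layer=None):
        depth = self.n_layers if exit_layer is None else exit_layer
        h = np.zeros(self.vocab)
        for layer in range(depth):                 # only the first `depth` layers run
            h += self.weights[layer] * np.cos(sum(prefix) + layer)
        return int(np.argmax(h))

def self_speculative_step(model, prefix, draft_len=4, exit_layer=2):
    """Draft with a shallow exit, verify with the full model, keep the agreed prefix."""
    draft = []
    for _ in range(draft_len):
        draft.append(model.next_token(prefix + draft, exit_layer=exit_layer))
    accepted = []
    for tok in draft:
        if model.next_token(prefix + accepted) == tok:    # full-depth verification
            accepted.append(tok)
        else:
            break
    accepted.append(model.next_token(prefix + accepted))  # free token from the verifier
    return accepted

model = ToyModel()
print(self_speculative_step(model, [3, 1, 4]))
```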
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Created 2024-04-04
17 commits to main branch, last one 7 months ago
12 · 199 · apache-2.0 · 7
REST: Retrieval-Based Speculative Decoding, NAACL 2024
Created 2023-11-15
11 commits to main branch, last one 4 months ago
15 · 105 · apache-2.0 · 4
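A toy sketch of retrieval-based drafting, assuming a plain n-gram dictionary over a token corpus instead of REST's actual datastore: the trailing n-gram of the current context is looked up and the retrieved continuation is proposed as the draft (verification against the target model is omitted here).

```python
# Hedged toy sketch of retrieval-based drafting over an n-gram index (not REST's code).
from collections import defaultdict

def build_index(corpus_tokens, ngram=3):
    """Map every n-gram in the corpus to the (up to 4) tokens that followed it."""
    index = defaultdict(list)
    for i in range(len(corpus_tokens) - ngram):
        key = tuple(corpus_tokens[i:i + ngram])
        index[key].append(corpus_tokens[i + ngram:i + ngram + 4])
    return index

def retrieve_draft(index, context, ngram=3):
    """Look up the context's trailing n-gram and return a drafted continuation, if any."""
    candidates = index.get(tuple(context[-ngram:]), [])
    return candidates[0] if candidates else []

corpus = "the cat sat on the mat and the cat sat on the rug".split()
index = build_index(corpus)
print(retrieve_draft(index, "we saw that the cat sat".split()))
```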
LLM inference on consumer devices
Created 2024-12-25
149 commits to v0.1.0 branch, last one 27 days ago
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Created 2023-02-10
11,217 commits to main branch, last one about a year ago
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation
Created 2025-02-06
57 commits to main branch, last one 25 days ago
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
Created 2024-04-22
26 commits to main branch, last one 4 months ago
1 · 45 · apache-2.0 · 3
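For reference, a minimal sketch of the accept/reject rule from that paper: a drafted token x is accepted with probability min(1, p(x)/q(x)), where p is the target distribution and q the draft distribution; on rejection, a replacement is sampled from the normalized residual max(0, p - q). Toy distributions stand in for real model outputs, and the residual step assumes p and q differ somewhere.

```python
# Sketch of the speculative sampling accept/reject rule with toy distributions.
import numpy as np

rng = np.random.default_rng(0)

def speculative_accept(p, q, draft_token):
    """Accept the drafted token w.p. min(1, p/q); otherwise resample from max(0, p - q)."""
    if rng.random() < min(1.0, p[draft_token] / q[draft_token]):
        return draft_token, True
    residual = np.maximum(p - q, 0.0)   # assumes p and q differ, so this is not all zeros
    residual /= residual.sum()
    return int(rng.choice(len(p), p=residual)), False

# Toy target (p) and draft (q) next-token distributions over a 4-token vocabulary.
p = np.array([0.5, 0.2, 0.2, 0.1])
q = np.array([0.25, 0.25, 0.25, 0.25])
draft_token = int(rng.choice(4, p=q))            # token proposed by the draft model
token, accepted = speculative_accept(p, q, draft_token)
print(token, "accepted" if accepted else "resampled")
```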
[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Created 2024-10-09
12 commits to main branch, last one about a month ago
1 · 39 · unknown · 2
Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)
Created 2022-03-31
41 commits to main branch, last one about a year ago
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
Created 2024-04-09
1,641 commits to main branch, last one 8 months ago
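A hedged sketch of the overlap that asynchronous pipelined speculation aims for, not PipeInfer's actual design: while the target model verifies the current draft, the next draft is already being produced under the optimistic assumption that the current one will be fully accepted. The model calls below are hypothetical stand-ins that just sleep to simulate latency.

```python
# Toy sketch of overlapping drafting with verification via a thread pool.
import time
from concurrent.futures import ThreadPoolExecutor

def draft_tokens(prefix, n=4):
    """Pretend draft model: propose n tokens after the prefix."""
    time.sleep(0.01)
    return [(sum(prefix) + i) % 100 for i in range(1, n + 1)]

def verify(prefix, draft):
    """Pretend target model: accept either the whole draft or all but the last token."""
    time.sleep(0.05)
    return list(draft) if sum(prefix) % 2 == 0 else list(draft[:-1])

def pipelined_decode(prefix, steps=4):
    out = list(prefix)
    with ThreadPoolExecutor(max_workers=2) as pool:
        draft = draft_tokens(out)
        for _ in range(steps):
            verify_future = pool.submit(verify, out, draft)
            # Optimistically draft the next window assuming the current draft is accepted.
            next_draft_future = pool.submit(draft_tokens, out + draft)
            accepted = verify_future.result()
            out += accepted
            if len(accepted) == len(draft):
                draft = next_draft_future.result()   # speculation paid off; reuse the draft
            else:
                next_draft_future.result()           # let the stale task finish, then discard it
                draft = draft_tokens(out)            # redraft from the verified prefix
    return out

print(pipelined_decode([1, 2, 3]))
```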