Trending repositories for topic llm-inference
A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
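Multi-LoRA serving works by keeping one shared base model in memory and applying a small per-request adapter on top of it. A toy sketch of the idea, assuming the standard LoRA formulation (y = x·W + α·x·A·B with low-rank A, B); all names and shapes here are illustrative, not the server's actual API:

```python
# Toy sketch of multi-LoRA serving: one shared base weight matrix,
# many small per-tenant (A, B) adapter factors applied per request.
# Pure-Python matrices (lists of rows); all names are illustrative.

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def lora_forward(x, W, adapter=None, alpha=1.0):
    """y = x.W + alpha * x.A.B  (adapter is a small (A, B) pair or None)."""
    y = matmul(x, W)
    if adapter is not None:
        A, B = adapter
        delta = matmul(matmul(x, A), B)  # low-rank update, computed on the fly
        y = add(y, [[alpha * v for v in row] for row in delta])
    return y

# Shared 2x2 base weight (identity), plus one rank-1 adapter per "tenant".
W = [[1.0, 0.0], [0.0, 1.0]]
adapters = {"tenant-a": ([[1.0], [0.0]], [[0.0, 2.0]])}  # A is 2x1, B is 1x2

x = [[3.0, 4.0]]
base_out = lora_forward(x, W)                        # no adapter: identity
lora_out = lora_forward(x, W, adapters["tenant-a"])  # adds 2*x[0] to column 1
```

Because the adapters are tiny relative to W, a server can cache thousands of them and pick one per request without duplicating the base model.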
Run any open-source LLMs, such as Llama 2, Mistral, as OpenAI compatible API endpoint in the cloud.
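"OpenAI-compatible" means the deployment accepts the same JSON request shape as api.openai.com, so existing OpenAI clients work by pointing their base URL at your server. A stdlib-only sketch of building such a request; the URL and model name are hypothetical placeholders, not values from any specific project:

```python
# Sketch of what "OpenAI-compatible" means in practice: the server accepts
# the same chat-completions JSON body as api.openai.com. The base URL and
# model name below are hypothetical placeholders.
import json
from urllib import request

BASE_URL = "http://localhost:3000/v1"  # hypothetical local deployment

def chat_request(model, user_message, temperature=0.7):
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("mistral-7b-instruct", "Hello!")
# request.urlopen(req) would send it to the running server.
```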
Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
Gradio-based tool to run open-source LLMs directly from Hugging Face
Sequence Parallel Attention for Long-Context LLM Training and Inference
A Python package for LLM dynamic routing through the Unify REST API.
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
An open-source agent project. Supports 6 chat platforms, one-to-many Onebotv11 connections, streaming messages, agent conversations with keyboard-bubble generation, and 6 LLM APIs (more being added). Can convert multiple LLM APIs into a unified format that carries conversation context.
An innovative library for efficient LLM inference via low-bit quantization
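The core idea behind low-bit weight quantization is to store each weight tensor as small integers plus a float scale, dequantizing on the fly. A toy symmetric int8 sketch, purely illustrative — production engines (AWQ, SmoothQuant, etc.) use per-channel or per-group scales and calibration data:

```python
# Toy symmetric int8 quantization: weights become small integers plus one
# float scale per tensor. Illustrative only; real low-bit engines use
# per-channel/group scales, calibration, and 4-bit packing.

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # map max |w| -> 127
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    return [v * scale for v in q]

w = [0.51, -1.27, 0.003, 0.9]
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))  # bounded by ~scale/2
```

The trade-off is 4x less memory (vs. float32) in exchange for a rounding error of at most about half the scale per weight.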
A multi-platform SwiftUI frontend for running local LLMs with Apple's MLX framework.
A library to communicate with ChatGPT, Claude, Copilot, Gemini, HuggingChat, and Pi
Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
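Tensor parallelism splits each weight matrix across devices so no single device holds (or computes) the whole layer. A toy sketch of the output-dimension split, where each "device" holds a shard of rows, computes its slice independently, and the slices are concatenated (the other standard split sums partial products with an all-reduce). Pure Python, with devices simulated as plain lists:

```python
# Toy tensor-parallel matvec: split W's rows across two "devices", each
# computes its output slice independently, then concatenate. Same result
# as the full matvec, but each device holds only half the weights (RAM).

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def split_rows(W, parts):
    step = len(W) // parts
    return [W[k * step:(k + 1) * step] for k in range(parts)]

W = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, -1]

full = matvec(W, x)                              # single-device reference
shards = split_rows(W, 2)                        # each "device" holds half
pieces = [matvec(shard, x) for shard in shards]  # computed independently
parallel = pieces[0] + pieces[1]                 # "all-gather": concatenate
```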
Since the emergence of ChatGPT in 2022, accelerating large language models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focused mainly on inference.
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
LLM-PowerHouse: Unleash LLMs' potential through curated tutorials, best practices, and ready-to-use code for custom training and inferencing.
🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalab...
Efficient and general syntactical decoding for Large Language Models
Test and evaluate LLMs and model configurations, across all the scenarios that matter for your application
Code examples and resources for DBRX, a large language model developed by Databricks
Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local llama2 backend for Generative Agents/Apps.
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
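The common core of Medusa- and TriForce-style methods is draft-then-verify: a cheap mechanism proposes several future tokens, the target model checks them, and the longest agreeing prefix is accepted in one step. A toy greedy-acceptance sketch with stub models — in real systems the verification is a single batched forward pass, and both model functions here are illustrative stand-ins:

```python
# Toy draft-then-verify loop (the shared core of speculative decoding):
# a cheap draft proposes k tokens, the target model checks them, and the
# longest agreeing prefix is kept plus one correction. Both "models" are
# deterministic stubs; real verification is one batched forward pass.

def draft_model(ctx, k):
    # Stub draft: propose the next k tokens as last+1, last+2, ...
    return [ctx[-1] + i for i in range(1, k + 1)]

def target_next(ctx):
    # Stub target: greedy next token is last+1, except it "disagrees" at 5.
    nxt = ctx[-1] + 1
    return nxt if nxt != 5 else 50

def speculative_step(ctx, k=4):
    proposal = draft_model(ctx, k)
    accepted = []
    for tok in proposal:
        if target_next(ctx + accepted) == tok:
            accepted.append(tok)  # verified: keep the drafted token
        else:
            accepted.append(target_next(ctx + accepted))  # correction
            break                 # stop at the first disagreement
    return ctx + accepted

out = speculative_step([1], k=4)  # accepts 2, 3, 4; corrects 5 -> 50
```

One step here emits up to k+1 tokens for one target-model check per position, which is where the speed-up comes from when the draft agrees often.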
[ICML'24] EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
The official repo of Aquila2 series proposed by BAAI, including pretrained & chat large language models.
Embedding Studio is a framework which allows you transform your Vector Database into a feature-rich Search Engine.
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
LLM.swift is a simple and readable library that allows you to interact with large language models locally with ease for macOS, iOS, watchOS, tvOS, and visionOS.
Official inference library for Mistral models
llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as UI, RESTful API, auto-scaling, computing resource...
[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
Minimalist web-searching app with an AI assistant that runs directly from your browser. Uses Web-LLM, Ratchet-ML, Wllama and SearXNG. Demo: https://felladrin-minisearch.hf.space