Trending repositories for topic llm-serving
A high-throughput and memory-efficient inference and serving engine for LLMs
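A minimal offline-inference sketch with vLLM's Python API (the model ID is just an example; any Hugging Face model ID works):

```python
# Minimal vLLM offline inference sketch; the model ID is an example.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")   # downloads weights on first run
params = SamplingParams(temperature=0.8, max_tokens=128)

outputs = llm.generate(["What is LLM serving?"], params)
for out in outputs:
    print(out.outputs[0].text)
```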
SGLang is a fast serving framework for large language models and vision language models.
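A sketch of SGLang's frontend language, assuming an SGLang server is already running locally on port 30000 (its default):

```python
# SGLang frontend sketch; assumes a local SGLang server on port 30000.
import sglang as sgl

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

@sgl.function
def qa(s, question):
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=128))

state = qa.run(question="Why batch requests when serving LLMs?")
print(state["answer"])
```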
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
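Ray's core primitive in one sketch: decorate a function and its calls become parallel distributed tasks:

```python
# Ray core sketch: run ordinary Python functions as distributed tasks.
import ray

ray.init()  # starts a local cluster if none is running

@ray.remote
def square(x):
    return x * x

futures = [square.remote(i) for i in range(8)]  # scheduled in parallel
print(ray.get(futures))                          # [0, 1, 4, 9, 16, 25, 36, 49]
```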
Run any open-source LLM, such as Llama or Mistral, as an OpenAI-compatible API endpoint in the cloud.
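Because the endpoint is OpenAI-compatible, the standard openai client works against it; the base URL and model name below are placeholder values for a local deployment:

```python
# Querying an OpenAI-compatible endpoint; base_url and model are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")  # key often unused locally
resp = client.chat.completions.create(
    model="llama3.1",  # hypothetical deployed model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```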
The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more.
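A minimal BentoML service sketch in the 1.2+ decorator style (the logic inside is a stand-in, not a real model):

```python
# BentoML service sketch (1.2+ style); the handler body is a stand-in.
import bentoml

@bentoml.service
class Echo:
    @bentoml.api
    def generate(self, prompt: str) -> str:
        # a real service would call a model here
        return prompt.upper()

# serve with, e.g.: bentoml serve service:Echo
```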
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
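A sketch of launching a GPU job through SkyPilot's Python API (the accelerator choice and run command are examples):

```python
# SkyPilot sketch: define a task and launch it wherever capacity is available.
import sky

task = sky.Task(run="python train.py")                  # command is an example
task.set_resources(sky.Resources(accelerators="A100:1"))

sky.launch(task, cluster_name="demo")                   # provisions and runs the job
```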
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
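The key idea in multi-LoRA serving is that each request names an adapter, so one base model serves many fine-tunes. A hedged sketch of a request against a LoRAX-style /generate endpoint (the payload shape follows the TGI convention LoRAX derives from, and the adapter ID is hypothetical):

```python
# Hedged sketch of a multi-LoRA request; the adapter ID is hypothetical.
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Summarize our refund policy.",
        "parameters": {"adapter_id": "acme/support-lora", "max_new_tokens": 64},
    },
)
print(resp.json())
```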
A ChatGPT (GPT-3.5) & GPT-4 workload trace for optimizing LLM serving systems.
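Such traces are typically replayed to stress-test a serving system. A sketch assuming the trace ships as a CSV; the file name and column names here are hypothetical, not this repo's actual schema:

```python
# Hedged sketch: inspecting a request trace; path and columns are hypothetical.
import pandas as pd

trace = pd.read_csv("trace.csv")            # hypothetical path
trace = trace.sort_values("TIMESTAMP")      # hypothetical column names throughout
for _, row in trace.head(5).iterrows():
    print(f"t={row['TIMESTAMP']}: prompt={row['ContextTokens']} tok, "
          f"output={row['GeneratedTokens']} tok")
```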
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your hardware.
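Dynamic batching in a nutshell: buffer incoming requests briefly, then run them through the model as one batch. A generic illustration of the technique, not this framework's actual API:

```python
# Generic dynamic-batching illustration (not any specific framework's API):
# collect requests until the batch is full or a deadline passes, then run them together.
import queue
import time

def collect_batch(q: queue.Queue, max_batch: int = 8, max_wait_s: float = 0.01):
    batch = [q.get()]                        # block until the first request arrives
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return batch                             # the whole batch goes through the model at once
```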
Superduper: Build end-to-end AI applications and agent workflows on your existing data infrastructure and preferred tools - without migrating your data.
Multi-node production GenAI stack. Run the best of open-source AI on your own servers, add knowledge from documents and scraped websites, and create your own AI by fine-tuning open-source models.
A throughput-oriented high-performance serving framework for LLMs
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. A list of papers on accelerating LLMs, currently focused mainly on inference.
A suite of hands-on training materials showing how to scale CV, NLP, and time-series forecasting workloads with Ray.
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
A comprehensive resource hub compiling all LLM papers accepted at the International Conference on Learning Representations (ICLR) 2024.
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
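Roughly what such an integration looks like: a Ray Serve deployment that owns a vLLM engine and handles HTTP requests against it. A sketch under that assumption, not this repo's exact code:

```python
# Sketch of wrapping vLLM in a Ray Serve deployment (not this repo's exact code).
from ray import serve
from vllm import LLM, SamplingParams

@serve.deployment(num_replicas=1, ray_actor_options={"num_gpus": 1})
class VLLMServer:
    def __init__(self):
        self.llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # example model

    async def __call__(self, request):
        prompt = (await request.json())["prompt"]
        out = self.llm.generate([prompt], SamplingParams(max_tokens=128))
        return {"text": out[0].outputs[0].text}

serve.run(VLLMServer.bind())  # exposes the deployment over HTTP (port 8000 by default)
```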
LLM (Large Language Model) fine-tuning
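The now-standard parameter-efficient recipe for LLM fine-tuning is LoRA via the peft library; a minimal sketch (the model choice and hyperparameters are illustrative, and this repo may use a different stack):

```python
# Minimal LoRA fine-tuning setup via peft; model and hyperparameters are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")  # small example model
config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], lora_dropout=0.05)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the low-rank adapters are trainable
```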
A collection of available inference solutions for LLMs
Friendli: the fastest serving engine for generative AI
Run GPU inference and training jobs on serverless infrastructure that scales with you.