Statistics for topic llm-serving
RepositoryStats tracks 579,129 GitHub repositories; of these, 31 are tagged with the llm-serving topic. The most common primary language for repositories using this topic is Python (16).
Stargazers over time for topic llm-serving
Most starred repositories for topic llm-serving
Trending repositories for topic llm-serving
A high-throughput and memory-efficient inference and serving engine for LLMs
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
SGLang is a fast serving framework for large language models and vision language models.
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
A throughput-oriented high-performance serving framework for LLMs
Run any open-source LLM, such as Llama or Gemma, as an OpenAI-compatible API endpoint in the cloud.
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Since the emergence of ChatGPT in 2022, accelerating large language models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on inferenc...
Superduper: build end-2-end AI applications and templates using your existing data infrastructure and tools of choice
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Multi-node production GenAI stack. Run the best of open source AI easily on your own servers. Easily add knowledge from documents and scrape websites. Create your own AI by fine-tuning open source mod...