Statistics for topic llm-serving
RepositoryStats tracks 595,856 GitHub repositories; 32 of these are tagged with the llm-serving topic. The most common primary language among repositories using this topic is Python (16 repositories).
Stargazers over time for topic llm-serving
Most starred repositories for topic llm-serving
A high-throughput and memory-efficient inference and serving engine for LLMs
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
SGLang is a fast serving framework for large language models and vision language models.
Run any open-source LLM, such as Llama or Mistral, as an OpenAI-compatible API endpoint in the cloud.
A highly optimized LLM inference acceleration engine for Llama and its variants.
A throughput-oriented high-performance serving framework for LLMs
This is a suite of hands-on training materials that shows how to scale CV, NLP, and time-series forecasting workloads with Ray.
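Several of the engines listed above expose an OpenAI-compatible HTTP API (the pattern called out in the "OpenAI compatible API endpoint" description), so one client works against any of them. A minimal sketch, assuming a server is already running locally on port 8000 and serves a model named "my-model" (both placeholders, not taken from this page):

```python
# Minimal sketch: query an OpenAI-compatible serving endpoint.
# Assumptions (not from this page): a server is running at
# http://localhost:8000/v1 and serves a model named "my-model".
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local serving endpoint
    api_key="EMPTY",  # many local servers accept any placeholder key
)

response = client.chat.completions.create(
    model="my-model",  # placeholder model identifier
    messages=[{"role": "user", "content": "What does an LLM serving engine do?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Because the wire format is shared, swapping between such engines usually only requires changing base_url and the model name.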
Trending repositories for topic llm-serving
A high-throughput and memory-efficient inference and serving engine for LLMs
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
SGLang is a fast serving framework for large language models and vision language models.
A highly optimized LLM inference acceleration engine for Llama and its variants.
A throughput-oriented high-performance serving framework for LLMs
Since the emergence of ChatGPT in 2022, accelerating large language models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on inference...
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
Multi-node production GenAI stack. Run the best of open source AI easily on your own servers. Easily add knowledge from documents and scrape websites. Create your own AI by fine-tuning open source models...
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Run any open-source LLM, such as Llama or Mistral, as an OpenAI-compatible API endpoint in the cloud.
A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
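The workload-trace entry above hints at a common use of such data: characterizing request arrival rates and token-length distributions before benchmarking a serving system. A small sketch under an assumed CSV schema (the file name and column names timestamp_s, input_tokens, output_tokens are hypothetical, not taken from this page):

```python
# Hypothetical sketch: summarize a request trace for serving benchmarks.
# Assumed CSV schema (not from this page): timestamp_s,input_tokens,output_tokens
import csv
from statistics import mean

timestamps, in_toks, out_toks = [], [], []
with open("trace.csv", newline="") as f:  # hypothetical file name
    for row in csv.DictReader(f):
        timestamps.append(float(row["timestamp_s"]))
        in_toks.append(int(row["input_tokens"]))
        out_toks.append(int(row["output_tokens"]))

duration = max(timestamps) - min(timestamps)
if duration > 0:
    print(f"requests/sec: {len(timestamps) / duration:.2f}")
print(f"mean input tokens:  {mean(in_toks):.1f}")
print(f"mean output tokens: {mean(out_toks):.1f}")
```

Summaries like these are what throughput-oriented engines in the lists above are typically tuned against.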