Statistics for topic llm-serving
RepositoryStats tracks 584,797 GitHub repositories; 31 of these are tagged with the llm-serving topic. The most common primary language among them is Python (16 repositories).
Stargazers over time for topic llm-serving
Most starred repositories for topic llm-serving
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a fast serving framework for large language models and vision language models.
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Run open-source LLMs, such as Llama and Mistral, as OpenAI-compatible API endpoints in the cloud.
A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems

Trending repositories for topic llm-serving

SGLang is a fast serving framework for large language models and vision language models.
A high-throughput and memory-efficient inference and serving engine for LLMs
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
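Several of the serving engines listed here (vLLM, SGLang, OpenLLM, and others) expose an OpenAI-compatible HTTP API, so clients written against OpenAI's chat-completions schema work against a locally hosted model. A minimal sketch of building such a request, assuming a hypothetical server at http://localhost:8000/v1 and a placeholder model name (both are assumptions, not taken from this page):

```python
import json
import urllib.request

# Hypothetical local endpoint; vLLM, SGLang, and OpenLLM all serve an
# OpenAI-style /v1/chat/completions route by default or via a flag.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble the JSON body expected by an OpenAI-compatible
    /chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }


def send(body: dict) -> dict:
    """POST the request to the server. Requires a running engine,
    so this function is defined but not called in this sketch."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


body = build_chat_request("meta-llama/Llama-3-8B-Instruct", "Hello")
print(body["messages"][0]["content"])
```

Because the request schema is shared, switching between these engines is usually a matter of changing the base URL and model name rather than rewriting client code.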
Multi-node production GenAI stack. Run the best of open-source AI easily on your own servers. Easily add knowledge from documents and scrape websites. Create your own AI by fine-tuning open-source models.
Since the emergence of ChatGPT in 2022, accelerating large language models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on inference.
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Superduper: Build end-to-end AI applications and agent workflows on your existing data infrastructure and preferred tools - without migrating your data.
LLM (Large Language Model) FineTuning