24 results found

1.7k · 11.8k · unknown · 95
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a ...
Created 2023-07-17
1,526 commits to main branch, last one 2 days ago
391 · 4.9k · apache-2.0 · 42
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any...
Created 2023-06-14
944 commits to main branch, last one a day ago
📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
Created 2023-08-27
390 commits to main branch, last one a day ago
206 · 2.1k · apache-2.0 · 21
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
Created 2023-07-30
903 commits to main branch, last one 3 days ago
🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI...
Created 2023-07-18
698 commits to main branch, last one 12 days ago
Evaluate your LLM's responses with Prometheus and GPT-4 💯
Created 2024-04-18
203 commits to main branch, last one 19 days ago
Low latency JSON generation using LLMs ⚡️
Created 2023-11-15
76 commits to main branch, last one 6 months ago
31 · 341 · apache-2.0 · 9
Private OpenAI on Kubernetes
Created 2023-10-21
179 commits to main branch, last one a day ago
14 · 311 · apache-2.0 · 6
Effortlessly run LLM backends, APIs, frontends, and services with one command.
Created 2024-07-27
265 commits to main branch, last one a day ago
27 · 241 · mit · 7
A large-scale simulation framework for LLM inference
Created 2023-11-02
20 commits to main branch, last one about a month ago
27 · 234 · apache-2.0 · 9
This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
Created 2024-03-06
255 commits to main branch, last one a day ago
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
Created 2023-07-03
296 commits to main branch, last one 12 days ago
The goal of ramalama is to make working with AI boring.
Created 2024-07-24
479 commits to main branch, last one a day ago
Set up and run a local LLM and chatbot using consumer-grade hardware.
Created 2023-09-12
326 commits to main branch, last one 2 days ago
13 · 147 · apache-2.0 · 11
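[Deep learning model deployment framework] Supports tf/torch/trt/trtllm/vllm and more NN frameworks; supports dynamic batching and streaming mode; supports both Python and C++; restrictable, extensible, and high-performance. Helps users quickly deploy models to production and serve them through HTTP/RPC interfaces.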
Created 2024-07-04
53 commits to master branch, last one 23 days ago
6 · 137 · other · 7
Booster: an open accelerator for LLMs. Better inference and debugging for AI hackers.
Created 2023-05-04
491 commits to main branch, last one about a month ago
3 · 122 · other · 2
Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"
This repository has been archived.
Created 2024-03-05
22 commits to main branch, last one 5 months ago
2 · 85 · apache-2.0 · 3
Fine-tuning and serving LLMs on any cloud
This repository has been archived.
Created 2023-07-30
44 commits to main branch, last one 10 months ago
17 · 69 · apache-2.0 · 13
llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as UI, RESTful API, auto-scaling, computing resource...
Created 2024-02-28
116 commits to main branch, last one 4 months ago
An endpoint server for efficiently serving quantized open-source LLMs for code.
Created 2023-09-25
3 commits to main branch, last one 11 months ago
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
Created 2023-10-28
52 commits to master branch, last one 9 months ago
Fully featured, beautiful web interface for vLLM, built with Next.js.
Created 2024-03-05
129 commits to main branch, last one 2 months ago
Demonstrates the remarkable results of vLLM on Chinese large language models.
Created 2023-07-08
6 commits to master branch, last one 10 months ago
Dockerized LLM inference server with constrained output (JSON mode), built on top of vLLM and outlines. Faster, cheaper and without rate limits. Compare the quality and latency to your current LLM API...
Created 2024-02-13
20 commits to main branch, last one 7 months ago