24 results found
Filter by Primary Language:
- Python (10)
- Jupyter Notebook (4)
- C++ (2)
- Go (2)
- Shell (2)
- JavaScript (1)
- TypeScript (1)
- +
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a ...
Created 2023-07-17 · 1,526 commits to main branch, last one 2 days ago
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any...
Created 2023-06-14 · 944 commits to main branch, last one a day ago
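The "single line of code" swap described above usually means pointing an OpenAI-style client at a locally hosted, OpenAI-compatible endpoint. A minimal sketch, assuming a hypothetical local server URL and model name (neither is taken from the listing):

```python
import json

# Hypothetical endpoint for a locally hosted, OpenAI-compatible server;
# the URL, port, and model name below are illustrative assumptions.
BASE_URL = "http://localhost:9997/v1"

def build_chat_request(model: str, user_message: str) -> dict:
    """Assemble an OpenAI-style /chat/completions request.

    Swapping providers amounts to changing BASE_URL: the request body
    format stays the same, so the rest of the app is untouched.
    """
    return {
        "url": f"{BASE_URL}/chat/completions",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": user_message}],
        }),
    }

req = build_chat_request("my-local-llama", "Hello!")
print(req["url"])
```

Because only the base URL changes, the same application code can target a hosted API or a local model interchangeably.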
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
Created 2023-08-27 · 390 commits to main branch, last one a day ago
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
Created 2023-07-30 · 903 commits to main branch, last one 3 days ago
🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI...
Created 2023-07-18 · 698 commits to main branch, last one 12 days ago
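Per-key rate limiting of the kind this gateway describes is commonly implemented as a token bucket keyed by API key. A minimal sketch (class name and limits are illustrative, not taken from the project):

```python
import time
from collections import defaultdict

class PerKeyRateLimiter:
    """Token-bucket rate limiter keyed by API key.

    Each key may make up to `rate` requests per second, with bursts
    up to `capacity`. The defaults here are illustrative only.
    """

    def __init__(self, rate: float = 5.0, capacity: float = 10.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = defaultdict(lambda: capacity)  # key -> remaining tokens
        self.last = defaultdict(time.monotonic)      # key -> last refill time

    def allow(self, api_key: str) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last[api_key]
        self.last[api_key] = now
        self.tokens[api_key] = min(self.capacity,
                                   self.tokens[api_key] + elapsed * self.rate)
        if self.tokens[api_key] >= 1.0:
            self.tokens[api_key] -= 1.0
            return True
        return False

limiter = PerKeyRateLimiter(rate=1.0, capacity=2.0)
print(limiter.allow("key-A"))
```

Because each key has its own bucket, one noisy client exhausting its budget does not affect the limits of other keys.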
Evaluate your LLM's responses with Prometheus and GPT-4 💯
Created 2024-04-18 · 203 commits to main branch, last one 19 days ago
Low latency JSON generation using LLMs ⚡️
Created 2023-11-15 · 76 commits to main branch, last one 6 months ago
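Constrained-generation libraries guarantee well-formed JSON at decode time by only sampling tokens that keep the output valid; a simpler post-hoc fallback is to extract and validate the first balanced JSON object in a model's raw reply. A minimal sketch (the helper name is mine, not this project's API):

```python
import json

def extract_json(raw: str) -> dict:
    """Return the first balanced {...} object found in `raw`, parsed.

    Raises ValueError if no valid JSON object is present. Brace matching
    here ignores braces inside strings; this is a sketch of the post-hoc
    fallback, not production parsing. Decode-time constrained generation
    avoids the problem entirely.
    """
    start = raw.find("{")
    while start != -1:
        depth = 0
        for i, ch in enumerate(raw[start:], start):
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(raw[start:i + 1])
                    except json.JSONDecodeError:
                        break  # malformed candidate; try the next "{"
        start = raw.find("{", start + 1)
    raise ValueError("no JSON object found")

reply = 'Sure! Here is the result: {"name": "Ada", "score": 3} Hope that helps.'
print(extract_json(reply))
```

The trade-off is latency and reliability: post-hoc extraction can fail and force a re-prompt, while decode-time constraints never produce invalid output in the first place.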
Private OpenAI on Kubernetes
Created 2023-10-21 · 179 commits to main branch, last one a day ago
Effortlessly run LLM backends, APIs, frontends, and services with one command.
Created 2024-07-27 · 265 commits to main branch, last one a day ago
A large-scale simulation framework for LLM inference
Created 2023-11-02 · 20 commits to main branch, last one about a month ago
This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
Created 2024-03-06 · 255 commits to main branch, last one a day ago
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
Created 2023-07-03 · 296 commits to main branch, last one 12 days ago
The goal of ramalama is to make working with AI boring.
Created 2024-07-24 · 479 commits to main branch, last one a day ago
Set up and run a local LLM and chatbot using consumer-grade hardware.
Created 2023-09-12 · 326 commits to main branch, last one 2 days ago
[Deep learning model deployment framework] Supports tf/torch/trt/trtllm/vllm and more NN frameworks; supports dynamic batching and streaming mode; dual-language Python/C++; rate-limitable, extensible, and high-performance. Helps users quickly deploy models to production and serve them via HTTP/RPC interfaces.
Created 2024-07-04 · 53 commits to master branch, last one 23 days ago
Booster - an open accelerator for LLMs. Better inference and debugging for AI hackers.
Created 2023-05-04 · 491 commits to main branch, last one about a month ago
Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"
This repository has been archived
Created 2024-03-05 · 22 commits to main branch, last one 5 months ago
Fine-tuning and serving LLMs on any cloud
This repository has been archived
Created 2023-07-30 · 44 commits to main branch, last one 10 months ago
llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, RESTful API, auto-scaling, computing resource...
Created 2024-02-28 · 116 commits to main branch, last one 4 months ago
An endpoint server for efficiently serving quantized open-source LLMs for code.
Created 2023-09-25 · 3 commits to main branch, last one 11 months ago
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
Created 2023-10-28 · 52 commits to master branch, last one 9 months ago
Fully-featured, beautiful web interface for vLLM - built with NextJS.
Created 2024-03-05 · 129 commits to main branch, last one 2 months ago
A demo of vLLM's impressive performance on Chinese large language models.
Created 2023-07-08 · 6 commits to master branch, last one 10 months ago
Dockerized LLM inference server with constrained output (JSON mode), built on top of vLLM and outlines. Faster, cheaper and without rate limits. Compare the quality and latency to your current LLM API...
Created 2024-02-13 · 20 commits to main branch, last one 7 months ago