37 results found

2.3k
15.8k
unknown
203
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a ...
Created 2023-07-17
1,812 commits to main branch, last one 16 days ago
481
5.8k
apache-2.0
43
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any...
Created 2023-06-14
1,039 commits to main branch, last one 2 days ago
429
4.9k
apache-2.0
23
Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, ...) or 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL...
Created 2023-08-01
1,327 commits to main branch, last one 11 hours ago
397
4.1k
gpl-3.0
53
Data processing with ML, LLM and Vision LLM
Created 2022-01-08
535 commits to main branch, last one 12 days ago
331
3.5k
apache-2.0
28
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
Created 2023-07-30
1,038 commits to main branch, last one a day ago
📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉
Created 2023-08-27
434 commits to main branch, last one a day ago
🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI...
Created 2023-07-18
737 commits to main branch, last one 3 days ago
Evaluate your LLM's response with Prometheus and GPT4 💯
Created 2024-04-18
204 commits to main branch, last one about a month ago
50
620
apache-2.0
12
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports LLMs, embeddings, and speech-to-text.
Created 2023-10-21
236 commits to main branch, last one a day ago
Make Discord your LLM frontend ● Supports any OpenAI compatible API (Ollama, LM Studio, vLLM, OpenRouter, xAI, Mistral, Groq and more)
Created 2023-05-08
359 commits to main branch, last one 14 hours ago
Low latency JSON generation using LLMs ⚡️
Created 2023-11-15
76 commits to main branch, last one 10 months ago
43
374
apache-2.0
10
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
Created 2024-03-06
428 commits to main branch, last one a day ago
30
330
unknown
5
LLM notes, including model inference, transformer model structure, and LLM framework code analysis notes.
Created 2024-09-18
188 commits to main branch, last one 9 hours ago
The goal of RamaLama is to make working with AI boring.
Created 2024-07-24
1,003 commits to main branch, last one a day ago
53
307
mit
7
A large-scale simulation framework for LLM inference
Created 2023-11-02
23 commits to main branch, last one about a month ago
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
Created 2023-07-03
332 commits to main branch, last one 5 days ago
37
241
unknown
5
TopicGPT: A Prompt-Based Framework for Topic Modeling (NAACL'24)
Created 2023-11-02
20 commits to main branch, last one about a month ago
Documentation on setting up an LLM server on Debian from scratch, using Ollama/vLLM, Open WebUI, OpenedAI Speech, and ComfyUI.
Created 2024-03-26
13 commits to main branch, last one 2 months ago
Set up and run a local LLM and chatbot using consumer-grade hardware.
Created 2023-09-12
351 commits to main branch, last one a day ago
13
170
apache-2.0
11
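[Deep learning model deployment framework] Supports tf/torch/trt/trtllm/vllm and other NN frameworks, supports dynamic batching and streaming modes, supports both Python and C++, and is limitable, extensible, and high-performance. Helps users quickly deploy models online and serve them through HTTP/RPC interfaces.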
Created 2024-07-04
59 commits to master branch, last one 11 days ago
6
145
other
7
Booster - open accelerator for LLM models. Better inference and debugging for AI hackers
Created 2023-05-04
491 commits to main branch, last one 4 months ago
13
144
apache-2.0
4
gpt_server is an open-source framework for production-grade deployment of LLMs or embeddings.
Created 2023-12-16
283 commits to main branch, last one 2 days ago
12
130
other
4
Framework agnostic computer vision inference. Run 1000+ models by changing only one line of code. Supports models from transformers, timm, ultralytics, vllm, ollama and your custom model.
Created 2024-10-10
284 commits to main branch, last one about a month ago
4
123
other
2
Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"
This repository has been archived
Created 2024-03-05
22 commits to main branch, last one 8 months ago
2
87
apache-2.0
3
Fine-tuning and serving LLMs on any cloud
This repository has been archived
Created 2023-07-30
44 commits to main branch, last one about a year ago
Fully-featured, beautiful web interface for vLLM - built with NextJS.
Created 2024-03-05
129 commits to main branch, last one 5 months ago
16
73
apache-2.0
12
llm-inference is a platform for publishing and managing llm inference, providing a wide range of out-of-the-box features for model deployment, such as UI, RESTful API, auto-scaling, computing resource...
Created 2024-02-28
116 commits to main branch, last one 7 months ago
14
71
unknown
6
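A lightweight, full-pipeline large-model application project that is easy to extend. A one-stop large-model application development project built on Dify, Ollama&Vllm, Sanic, and Text2SQL 📊, with a modern UI built with Vue3, TypeScript, and Vite 5. It supports large-model-based graphical data Q&A via ECharts 📈 and can handle table Q&A over CSV files 📂. It can also easily integrate with third-party open-source RAG systems ...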
Created 2024-11-15
133 commits to master branch, last one 3 days ago
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
Created 2023-10-28
52 commits to master branch, last one about a year ago
An endpoint server for efficiently serving quantized open-source LLMs for code.
Created 2023-09-25
3 commits to main branch, last one about a year ago