51 results found

Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama model f...
Created 2023-07-17
2,090 commits to main branch, last one 15 hours ago
646
7.6k
apache-2.0
53
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any...
Created 2023-06-14
1,192 commits to main branch, last one a day ago
630
6.4k
apache-2.0
38
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & LoRA & vLLM & RFT)
Created 2023-07-30
1,258 commits to main branch, last one 12 hours ago
452
4.5k
gpl-3.0
56
Data processing with ML, LLM and Vision LLM
Created 2022-01-08
686 commits to main branch, last one 5 days ago
📚 A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc.
Created 2023-08-27
471 commits to main branch, last one 7 days ago
166
1.6k
mit
30
The goal of RamaLama is to make working with AI boring.
Created 2024-07-24
1,999 commits to main branch, last one a day ago
🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI...
Created 2023-07-18
739 commits to main branch, last one 3 months ago
Evaluate your LLM's responses with Prometheus and GPT-4 💯
Created 2024-04-18
209 commits to main branch, last one about a month ago
85
910
apache-2.0
12
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
Created 2023-10-21
305 commits to main branch, last one 19 hours ago
74
732
unknown
8
LLM notes covering model inference, Transformer model structure, and code analysis of LLM frameworks.
Created 2024-09-18
324 commits to main branch, last one 21 hours ago
Model swapping for llama.cpp (or any local OpenAI-compatible server)
Created 2024-10-04
173 commits to main branch, last one 15 hours ago
Make Discord your LLM frontend ● Supports any OpenAI-compatible API (Ollama, LM Studio, vLLM, OpenRouter, xAI, Mistral, Groq, and more)
Created 2023-05-08
398 commits to main branch, last one 19 hours ago
105
522
apache-2.0
11
Community-maintained hardware plugin for vLLM on Ascend
Created 2025-01-29
190 commits to main branch, last one 9 hours ago
88
499
unknown
10
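A lightweight large-model application project (Large Model Data Assistant) that covers the full pipeline and is easy to extend. It supports large models such as DeepSeek and Qwen2.5. A one-stop large-model application development project built on Dify, Ollama & vLLM, Sanic, Text2SQL 📊, and related technologies, with a modern UI built using Vue3, TypeScript, and Vite 5. Through ECharts 📈, it supports large-model-based data...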
Created 2024-11-15
226 commits to master branch, last one 7 days ago
73
492
apache-2.0
5
Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Created 2024-06-17
2,155 commits to main branch, last one 3 days ago
54
461
apache-2.0
9
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
Created 2024-03-06
484 commits to main branch, last one 2 days ago
Documentation on setting up an LLM server on Debian from scratch, using Ollama/vLLM, Open WebUI, OpenedAI Speech/Kokoro FastAPI, and ComfyUI.
Created 2024-03-26
16 commits to main branch, last one 25 days ago
Low latency JSON generation using LLMs ⚡️
Created 2023-11-15
76 commits to main branch, last one about a year ago
65
365
mit
8
A large-scale simulation framework for LLM inference
Created 2023-11-02
23 commits to main branch, last one 5 months ago
Provides high-quality Chinese speech synthesis and voice cloning services based on models such as SparkTTS and OrpheusTTS.
Created 2025-03-13
101 commits to master branch, last one a day ago
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
Created 2023-07-03
389 commits to main branch, last one 3 days ago
47
287
unknown
6
TopicGPT: A Prompt-Based Framework for Topic Modeling (NAACL'24)
Created 2023-11-02
25 commits to main branch, last one about a month ago
Set up and run a local LLM and chatbot using consumer-grade hardware.
Created 2023-09-12
406 commits to main branch, last one 4 days ago
17
185
unknown
10
A 500M real-time VLM that runs on CPU. Surpasses Moondream2 and SmolVLM. Train from scratch with ease.
Created 2025-02-21
9 commits to main branch, last one 2 days ago
16
169
apache-2.0
4
gpt_server is an open-source framework for production-grade deployment of LLMs, embedding, reranker, ASR, and TTS models.
Created 2023-12-16
346 commits to main branch, last one a day ago
8
169
apache-2.0
7
[CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete. Official Repository.
Created 2025-03-27
15 commits to main branch, last one 13 days ago
13
157
apache-2.0
9
Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks. Supports dynamic batching and streaming modes. It is dual-language compatible with Python and C++, offeri...
Created 2024-07-04
62 commits to master branch, last one about a month ago
7
154
other
7
Booster: an open accelerator for LLMs. Better inference and debugging for AI hackers.
Created 2023-05-04
491 commits to main branch, last one 8 months ago
24
137
apache-2.0
7
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
Created 2023-11-20
419 commits to main branch, last one a day ago
A fully featured, beautiful web interface for vLLM, built with Next.js.
Created 2024-03-05
133 commits to main branch, last one about a month ago