24 results found
Filter by Primary Language:
- Python (10)
- Jupyter Notebook (4)
- C++ (2)
- Go (2)
- Shell (2)
- JavaScript (1)
- TypeScript (1)
- +
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a ...
Created 2023-07-17 · 1,526 commits to main branch, last one 2 days ago
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any...
Created 2023-06-14 · 944 commits to main branch, last one a day ago
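The "single line of code" swap described above usually means pointing an OpenAI-style client at a locally hosted, OpenAI-compatible endpoint. A minimal sketch, assuming a hypothetical local server URL and model name (neither is taken from the listing):

```python
import json

# Hypothetical endpoint for a locally hosted, OpenAI-compatible server;
# the URL, port, and model name below are illustrative assumptions.
BASE_URL = "http://localhost:9997/v1"

def build_chat_request(model: str, user_message: str) -> dict:
    """Assemble an OpenAI-style /chat/completions request.

    Swapping providers amounts to changing BASE_URL: the request body
    format stays the same, so the rest of the app is untouched.
    """
    return {
        "url": f"{BASE_URL}/chat/completions",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": user_message}],
        }),
    }

req = build_chat_request("my-local-llama", "Hello!")
print(req["url"])
```

Because only the base URL changes, the same application code can target a hosted API or a local model interchangeably.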
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
Created 2023-08-27 · 390 commits to main branch, last one a day ago
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
Created 2023-07-30 · 903 commits to main branch, last one 3 days ago
🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI...
Created 2023-07-18 · 698 commits to main branch, last one 12 days ago
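Per-key rate limiting of the kind this gateway describes is commonly implemented as a token bucket keyed by API key. A minimal sketch (class name and limits are illustrative, not taken from the project):

```python
import time
from collections import defaultdict

class PerKeyRateLimiter:
    """Token-bucket rate limiter keyed by API key.

    Each key may make up to `rate` requests per second, with bursts
    up to `capacity`. The defaults here are illustrative only.
    """

    def __init__(self, rate: float = 5.0, capacity: float = 10.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = defaultdict(lambda: capacity)  # key -> remaining tokens
        self.last = defaultdict(time.monotonic)      # key -> last refill time

    def allow(self, api_key: str) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last[api_key]
        self.last[api_key] = now
        self.tokens[api_key] = min(self.capacity,
                                   self.tokens[api_key] + elapsed * self.rate)
        if self.tokens[api_key] >= 1.0:
            self.tokens[api_key] -= 1.0
            return True
        return False

limiter = PerKeyRateLimiter(rate=1.0, capacity=2.0)
print(limiter.allow("key-A"))
```

Because each key has its own bucket, one noisy client exhausting its budget does not affect the limits of other keys.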
Evaluate your LLM's responses with Prometheus and GPT-4 💯
Created 2024-04-18 · 203 commits to main branch, last one 19 days ago
Low latency JSON generation using LLMs ⚡️
Created 2023-11-15 · 76 commits to main branch, last one 6 months ago
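Constrained-generation libraries guarantee well-formed JSON at decode time by only sampling tokens that keep the output valid; a simpler post-hoc fallback is to extract and validate the first balanced JSON object in a model's raw reply. A minimal sketch (the helper name is mine, not this project's API):

```python
import json

def extract_json(raw: str) -> dict:
    """Return the first balanced {...} object found in `raw`, parsed.

    Raises ValueError if no valid JSON object is present. Brace matching
    here ignores braces inside strings; this is a sketch of the post-hoc
    fallback, not production parsing. Decode-time constrained generation
    avoids the problem entirely.
    """
    start = raw.find("{")
    while start != -1:
        depth = 0
        for i, ch in enumerate(raw[start:], start):
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(raw[start:i + 1])
                    except json.JSONDecodeError:
                        break  # malformed candidate; try the next "{"
        start = raw.find("{", start + 1)
    raise ValueError("no JSON object found")

reply = 'Sure! Here is the result: {"name": "Ada", "score": 3} Hope that helps.'
print(extract_json(reply))
```

The trade-off is latency and reliability: post-hoc extraction can fail and force a re-prompt, while decode-time constraints never produce invalid output in the first place.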
Private OpenAI on Kubernetes
Created 2023-10-21 · 179 commits to main branch, last one a day ago
Effortlessly run LLM backends, APIs, frontends, and services with one command.
Created 2024-07-27 · 265 commits to main branch, last one a day ago
A large-scale simulation framework for LLM inference
Created 2023-11-02 · 20 commits to main branch, last one about a month ago
This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
Created 2024-03-06 · 255 commits to main branch, last one a day ago
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
Created 2023-07-03 · 296 commits to main branch, last one 12 days ago
The goal of ramalama is to make working with AI boring.
Created 2024-07-24 · 479 commits to main branch, last one a day ago
Set up and run a local LLM and chatbot using consumer-grade hardware.
Created 2023-09-12 · 326 commits to main branch, last one 2 days ago
[Deep learning model deployment framework] Supports tf/torch/trt/trtllm/vllm and more NN frameworks; supports dynamic batching and streaming mode; dual-language Python/C++; rate-limitable, extensible, and high-performance. Helps users quickly deploy models to production and serve them via HTTP/RPC interfaces.
Created 2024-07-04 · 53 commits to master branch, last one 23 days ago
Booster - an open accelerator for LLMs. Better inference and debugging for AI hackers.
Created 2023-05-04 · 491 commits to main branch, last one about a month ago
Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"
This repository has been archived
Created 2024-03-05 · 22 commits to main branch, last one 5 months ago
Fine-tuning and serving LLMs on any cloud
This repository has been archived
Created 2023-07-30 · 44 commits to main branch, last one 10 months ago
llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, RESTful API, auto-scaling, computing resource...
Created 2024-02-28 · 116 commits to main branch, last one 4 months ago
An endpoint server for efficiently serving quantized open-source LLMs for code.
Created 2023-09-25 · 3 commits to main branch, last one 11 months ago
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
Created 2023-10-28 · 52 commits to master branch, last one 9 months ago
Fully-featured, beautiful web interface for vLLM - built with NextJS.
Created 2024-03-05 · 129 commits to main branch, last one 2 months ago
A demo of vLLM's impressive performance on Chinese large language models.
Created 2023-07-08 · 6 commits to master branch, last one 10 months ago
Dockerized LLM inference server with constrained output (JSON mode), built on top of vLLM and outlines. Faster, cheaper and without rate limits. Compare the quality and latency to your current LLM API...
Created 2024-02-13 · 20 commits to main branch, last one 7 months ago