Statistics for topic llm-inference
RepositoryStats tracks 641,731 GitHub repositories; 207 of these are tagged with the llm-inference topic. The most common primary language among them is Python (112 repositories). Other languages include C++ (20), Jupyter Notebook (18), and TypeScript (12).
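The language counts above can be turned into shares of the 207 tagged repositories with a few lines of Python; this is just a sketch using the numbers reported on this page, and the remainder bucket ("Other") is inferred, not stated in the source.

```python
# Primary-language counts for the 207 llm-inference repositories,
# as reported on this page.
counts = {"Python": 112, "C++": 20, "Jupyter Notebook": 18, "TypeScript": 12}
total_tagged = 207

# Share of tagged repositories per listed language.
for lang, n in counts.items():
    print(f"{lang}: {n} ({n / total_tagged:.1%})")

# Repositories with some other (or unreported) primary language.
other = total_tagged - sum(counts.values())
print(f"Other: {other}")  # 207 - 162 = 45
```

Python alone accounts for a little over half (about 54%) of the tagged repositories.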
Stargazers over time for topic llm-inference
Most starred repositories for topic llm-inference
Trending repositories for topic llm-inference
prima.cpp: Speeding up 70B-scale LLM inference on low-resource everyday home clusters
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Open source framework to create full featured AI Agents in PHP - powered by Inspector.dev
A comprehensive resource hub compiling all LLM papers accepted at the International Conference on Learning Representations (ICLR) in 2024.
The AI-native proxy server for agents. Arch handles the pesky heavy lifting in building agentic apps - routing prompts to agents or specific tools, clarifying input, unifying access and observability ...
Ollama alternative for Rockchip NPU: an efficient solution for running AI and deep learning models on Rockchip devices with optimized NPU support (rkllm)
Telegram bot for different language models. Supports system prompts and images
DEEPPOWERS is an MCP (Model Context Protocol) inference acceleration engine that enhances MCP workflows and powers MCP collaboration. It supports mainstream LLMs, including DeepSeek, GPT, Gemini, and ...
Streamlines and simplifies prompt design for both developers and non-technical users with a low-code approach.
Run local LLMs like llama, deepseek-distill, kokoro and more inside your browser
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.