Trending repositories for topic inference
A high-throughput and memory-efficient inference and serving engine for LLMs
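For a sense of the API, here is a minimal offline-generation sketch against vLLM's documented Python entry point (the model name is only an example):

```python
# Minimal vLLM offline-inference sketch; the model name is an example
# and any Hugging Face-compatible causal LM can be substituted.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

for out in llm.generate(["The capital of France is"], params):
    print(out.outputs[0].text)
```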
SGLang is a fast serving framework for large language models and vision language models.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any...
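The "single line of code" claim refers to Xinference exposing an OpenAI-compatible endpoint, so only the client's base URL changes. A hedged sketch, assuming the default port 9997 and a placeholder model UID for one you have launched:

```python
# Only the client's base_url changes relative to stock OpenAI usage.
# The model UID below is hypothetical; use one launched via Xinference.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="my-launched-model",  # placeholder model UID
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```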
Cross-platform, customizable ML solutions for live and streaming media.
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
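As a rough illustration of the client side, a sketch using the tritonclient HTTP API; the model name and tensor names are assumptions that must match your server's config.pbtxt:

```python
# Hedged sketch of an HTTP inference request to a running Triton server.
# Requires `pip install tritonclient[http]`; "my_model", INPUT0, and
# OUTPUT0 are placeholders for your model's actual configuration.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 16).astype(np.float32)
inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT0"))
```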
A great project for campus recruitment (autumn/spring hiring) and internships! Build a high-performance deep learning inference library from scratch, supporting inference for large models such as Llama 2, as well as UNet, YOLOv5, ResNet, and more. Implement a high-performance deep learning inference library step by step.
ncnn is a high-performance neural network inference framework optimized for the mobile platform
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
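A condensed sketch of the quantize-and-save flow following the package's documented usage; the model name and calibration text are placeholders:

```python
# 4-bit GPTQ quantization sketch; the calibration data here is a toy
# single example, a real run needs a representative sample set.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "facebook/opt-125m"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_id)

quant_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quant_config)

examples = [tokenizer("GPTQ calibrates on sample text.", return_tensors="pt")]
model.quantize(examples)
model.save_quantized("opt-125m-4bit")
```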
🎨 The exhaustive Pattern Matching library for TypeScript, with smart type inference.
Large Language Model Text Generation Inference
Making large AI models cheaper, faster and more accessible
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
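A minimal load-and-compile sketch against the OpenVINO Python API (names follow OpenVINO 2023+; the model path and input shape are placeholders for an IR file you have exported):

```python
# Sketch: read an OpenVINO IR model and compile it for CPU.
# "model.xml" is a placeholder path; the dummy input shape is an
# assumption and must match your model.
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")         # placeholder IR file
compiled = core.compile_model(model, "CPU")

result = compiled(np.zeros((1, 3, 224, 224), dtype=np.float32))
print(list(result.values())[0].shape)
```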
cube studio is an open-source, cloud-native, one-stop machine learning / deep learning / large-model AI platform. It supports SSO login, multi-tenancy, big-data platform integration, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, vGPU inference serving, edge computing, serverless, an annotation platform, automated labeling, dataset management, large-model fine-tuning, vLLM large-model inference, LLMOps, private knowledge bases, and an AI model app store, with one-click model development/inference/fine-tuning and support for...
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
📒 A small curated list of Awesome Diffusion Inference Papers, with code.
whisper-cpp-serve: real-time speech recognition and serving of OpenAI's Whisper model in C/C++
The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
The Triton backend for the ONNX Runtime.
Triton Model Analyzer is a CLI tool that helps you understand the compute and memory requirements of Triton Inference Server models.
The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
High-performance AI inference stack. Built for production. @ziglang / @openxla / MLIR / @bazelbuild
Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs.
Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs.
Instantly calculate the maximum size of quantized language models that can fit in your available RAM, helping you optimize your models for inference.
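The underlying arithmetic is simple: weight bytes scale with parameter count times bits per weight. A back-of-envelope sketch, where the overhead factor is my assumption rather than this repository's exact formula:

```python
# Rough estimate of a quantized model's memory footprint.
# The 1.2x overhead for activations/KV cache is an assumption.
def quantized_size_gib(n_params: float, bits: int, overhead: float = 1.2) -> float:
    return n_params * bits / 8 * overhead / 1024**3

print(f"{quantized_size_gib(7e9, 4):.1f} GiB")  # a 7B model at 4-bit: ~3.9 GiB
```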
WIP fork of Piper from NTTS: an ultra-fast and efficient text-to-speech library; cross-platform, runs on any edge device with CPU only.
PocketGroq is a powerful Python library that simplifies integration with the Groq API, offering advanced features for natural language processing, web scraping, and autonomous agent capabilities. Key ...
Implemented in pure Pascal, LightNet is an artificial intelligence neural network library inspired by Darknet and the YOLO library. It can run most Darknet models, including YOLO models, natively and self-d...
Python library for YOLO small object detection and instance segmentation
PyTorch native quantization and sparsity for training and inference
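For flavor, a weight-only int8 sketch using torchao's quantize_ API (names follow recent torchao releases and may shift between versions):

```python
# In-place weight-only int8 quantization of the Linear layers.
# API per recent torchao releases; details may vary by version.
import torch
from torchao.quantization import quantize_, int8_weight_only

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)
quantize_(model, int8_weight_only())
print(model(torch.randn(1, 128)).shape)  # torch.Size([1, 10])
```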
Superduper: Build end-to-end AI applications and agent workflows on your existing data infrastructure and preferred tools - without migrating your data.
Minimal code and examples for running inference with the Sapiens foundation human models in PyTorch
Performance of the C++ interface of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
Code for ACM MobiCom 2024 paper "FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices"
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2K lines of code (2% of vLLM).
Neural Network-Boosted Importance Nested Sampling for Bayesian Statistics
Deep learned, NVIDIA-accelerated 3D object pose estimation
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
AI Productivity Tool - Free and open-source, enhancing user productivity while ensuring privacy and data security. It provides efficient and convenient AI solutions, including but not limited to: buil...
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
T-GATE: Temporally Gating Attention to Accelerate Diffusion Model for Free!
Super performant RAG pipelines for AI apps. Summarization, Retrieve/Rerank and Code Interpreters in one simple API.
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
[deprecated] AI Gateway - core infrastructure stack for building production-ready AI Applications
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Qualcomm Cloud AI SDK (Platform and Apps) enable high performance deep learning inference on Qualcomm Cloud AI platforms delivering high throughput and low latency across Computer Vision, Object Detec...
One-click toolkit for provisioning servers to deploy and serve Large Language Models (LLMs).
klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs
Training YOLOv9 for face detection on the WIDER Face dataset
[ACL 2024]Controlled Text Generation for Large Language Model with Dynamic Attribute Graphs