Statistics for topic inference
RepositoryStats tracks 584,796 GitHub repositories; 302 of these are tagged with the inference topic. The most common primary language for repositories using this topic is Python (129). Other languages include C++ (58), Jupyter Notebook (28), Rust (11), and TypeScript (11).
Stargazers over time for topic inference [chart]
Most starred repositories for topic inference
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a fast serving framework for large language models and vision language models.
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any...
📒 A small curated list of Awesome Diffusion Inference Papers, with code.
whisper-cpp-serve: real-time speech recognition with OpenAI's Whisper model in C/C++
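Several entries in this list are LLM serving engines. For concreteness, here is a minimal sketch of offline batch inference with vLLM, the first entry above; it assumes the `vllm` package is installed, and the model name and sampling settings are illustrative, not taken from the listing.

```python
# Minimal offline batch inference with vLLM.
# Model name and sampling settings are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any Hugging Face model ID works here
sampling = SamplingParams(temperature=0.8, max_tokens=64)

# generate() batches prompts and returns one RequestOutput per prompt
outputs = llm.generate(["The capital of France is"], sampling)
for output in outputs:
    print(output.outputs[0].text)
```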
Trending repositories for topic inference
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs.
The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) and ready to deploy on Qualcomm® devices.
Instantly calculate the maximum size of quantized language models that can fit in your available RAM, helping you optimize your models for inference.
High performance AI inference stack. Built for production. @ziglang / @openxla / MLIR / @bazelbuild
PyTorch native quantization and sparsity for training and inference
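The size-calculator entry in the list above rests on simple arithmetic: a model's weight footprint is roughly parameter count times bits per weight divided by 8, plus runtime overhead. A minimal sketch of that estimate follows; the 20% overhead factor is an illustrative assumption, not a measured constant.

```python
def quantized_model_size_gb(num_params: float, bits_per_weight: int,
                            overhead: float = 1.2) -> float:
    """Rough estimate of the RAM needed to hold a quantized model.

    overhead covers activations, KV cache, and runtime buffers;
    1.2 is an illustrative assumption, not a measured constant.
    """
    weight_bytes = num_params * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 7B-parameter model at 4-bit quantization: about 4.2 GB with 20% overhead.
print(f"{quantized_model_size_gb(7e9, 4):.1f} GB")
```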