Statistics for topic inference
RepositoryStats tracks 579,129 GitHub repositories; of these, 298 are tagged with the inference topic. The most common primary language for repositories using this topic is Python (128). Other languages include C++ (58), Jupyter Notebook (27), Rust (11), and TypeScript (11).
Stargazers over time for topic inference
Most starred repositories for topic inference
Trending repositories for topic inference
A high-throughput and memory-efficient inference and serving engine for LLMs
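This tagline matches the vLLM project; assuming so, a minimal offline-generation sketch (the model name and sampling settings below are illustrative placeholders, and a GPU is assumed):

```python
# Sketch: batch generation with vLLM's offline Python API.
# Model name is a small placeholder chosen for demonstration.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What is memory-efficient inference?"], params)
for out in outputs:
    print(out.outputs[0].text)
```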
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any...
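The "single line" refers to Xinference exposing an OpenAI-compatible endpoint, so only the client's base URL changes. A hedged sketch (the port and model UID are assumptions; check the project's docs for your deployment):

```python
# Sketch: point the standard OpenAI client at a locally running
# Xinference server instead of api.openai.com.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1",  # assumed default port
                api_key="not-needed")  # local server ignores the key

resp = client.chat.completions.create(
    model="my-llama-model",  # hypothetical model UID registered in Xinference
    messages=[{"role": "user", "content": "Hello from a local LLM"}],
)
print(resp.choices[0].message.content)
```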
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
whisper-cpp-serve: Real-time speech recognition and serving of OpenAI's Whisper model in C/C++
A scalable Python-based framework for gene regulatory network inference using tree-based ensemble regressors.
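The tree-based approach can be sketched generically: regress each target gene's expression on transcription-factor expression and rank regulators by ensemble feature importance. A toy illustration with scikit-learn, not the framework's own API (all data and names are placeholders):

```python
# Toy sketch of tree-based gene regulatory network inference:
# fit a random forest per target gene, rank TFs by importance.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_samples, tfs = 100, ["TF1", "TF2", "TF3"]
tf_expr = rng.normal(size=(n_samples, len(tfs)))  # TF expression matrix
target_expr = 2 * tf_expr[:, 0] + rng.normal(scale=0.1, size=n_samples)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(tf_expr, target_expr)

# Candidate regulatory links, strongest first
for tf, score in sorted(zip(tfs, model.feature_importances_),
                        key=lambda p: -p[1]):
    print(f"{tf} -> target_gene: importance {score:.3f}")
```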
Minimal code and examples for running inference with Sapiens foundation human models in PyTorch
Code for ACM MobiCom 2024 paper "FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices"
A high-throughput and memory-efficient inference and serving engine for LLMs
The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) and ready to deploy on Qualcomm® devices.
Instantly calculate the maximum size of quantized language models that can fit in your available RAM, helping you optimize your models for inference.
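The underlying arithmetic is simple: weight memory is roughly parameter count times bits per weight divided by eight, plus overhead for the KV cache and activations. A back-of-envelope sketch (the 20% overhead factor is an assumption):

```python
# Rough estimate of RAM needed for a quantized model's weights.
def model_ram_gb(n_params_billion: float, bits_per_weight: int,
                 overhead: float = 1.2) -> float:
    bytes_for_weights = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_for_weights * overhead / 1e9

# e.g. a 7B-parameter model at 4-bit quantization:
print(f"{model_ram_gb(7, 4):.1f} GB")  # ~4.2 GB
```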
PocketGroq is a powerful Python library that simplifies integration with the Groq API, offering advanced features for natural language processing, web scraping, and autonomous agent capabilities. Key ...
SGLang is a fast serving framework for large language models and vision language models.
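A hedged sketch of SGLang's frontend DSL, which composes prompts and generation calls as Python functions (the endpoint URL assumes a separately launched SGLang server; consult the project for the current API):

```python
# Sketch: SGLang @function DSL; assumes a server is running locally.
import sglang as sgl

@sgl.function
def answer(s, question):
    s += "Q: " + question + "\n"
    s += "A: " + sgl.gen("reply", max_tokens=64)

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = answer.run(question="What does a serving framework do?")
print(state["reply"])
```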
High-performance AI inference stack. Built for production. @ziglang / @openxla / MLIR / @bazelbuild
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
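For the inference side, DeepSpeed wraps an existing model with optimized kernels via `deepspeed.init_inference`; a minimal sketch, assuming a CUDA GPU (model choice and settings are illustrative):

```python
# Sketch: wrap a Hugging Face model with DeepSpeed's inference engine.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # small model chosen for demonstration
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

engine = deepspeed.init_inference(model, dtype=torch.half,
                                  replace_with_kernel_inject=True)

inputs = tokenizer("Distributed inference is", return_tensors="pt").to("cuda")
print(tokenizer.decode(engine.module.generate(**inputs, max_new_tokens=20)[0]))
```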
PyTorch native quantization and sparsity for training and inference
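This tagline matches the torchao project; assuming so, a hedged sketch of weight-only quantization (API names as of recent torchao releases, treat as an assumption):

```python
# Sketch: quantize a model's linear layers to int8 weights in place
# with torchao, then run normal inference.
import torch
from torchao.quantization import quantize_, int8_weight_only

model = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.ReLU(), torch.nn.Linear(512, 10)
)

quantize_(model, int8_weight_only())  # in-place weight-only quantization

with torch.inference_mode():
    out = model(torch.randn(1, 512))
print(out.shape)
```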
A high-performance inference system for large language models, designed for production environments.