3 results found
A PyTorch quantization backend for Optimum.
Created 2023-09-19. 716 commits to main branch, last one 3 days ago.
Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Created 2024-06-17. 2,040 commits to main branch, last one 20 hours ago.
Accelerated NLP pipelines for fast inference on CPU and GPU. Built with Transformers, Optimum and ONNX Runtime.
Created 2022-03-16. 154 commits to master branch, last one 2 years ago.