3 results found

A PyTorch quantization backend for Optimum.
Created 2023-09-19
716 commits to main branch, last one 3 days ago
Production-ready LLM compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Created 2024-06-17
2,040 commits to main branch, last one 20 hours ago
Accelerated NLP pipelines for fast inference on CPU and GPU. Built with Transformers, Optimum and ONNX Runtime.
Created 2022-03-16
154 commits to master branch, last one 2 years ago