2 results found

48
326
apache-2.0
6
Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Created 2024-06-17
2,029 commits to main branch, last one a day ago
13
80
apache-2.0
5
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
Created 2023-11-20
357 commits to main branch, last one 6 days ago