3 results found

255
1.6k
apache-2.0
34
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization i...
Created 2022-09-20
603 commits to main branch, last one 2 days ago
33
478
mit
11
Microsoft Automatic Mixed Precision Library
Created 2023-01-30
97 commits to main branch, last one 4 months ago
33
303
apache-2.0
8
An innovative library for efficient LLM inference via low-bit quantization
Created 2023-11-20
344 commits to main branch, last one 4 days ago