Search Results - RepositoryStats

352

2.1k

apache-2.0

36

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization i...

fp8 gpu jax cuda python pytorch deep-learning machine-learning

Created 2022-09-20

860 commits to main branch, last one a day ago

MS-AMP Azure

46

555

mit

11

Microsoft Automatic Mixed Precision Library

amp fp8 gpu pytorch transformer deep-learning mixed-precision

Created 2023-01-30

98 commits to main branch, last one 5 months ago

neural-speed intel

38

352

apache-2.0

8

An innovative library for efficient LLM inference via low-bit quantization

This repository has been archived

Created 2023-11-20

345 commits to main branch, last one 5 months ago

flux-fp8-api aredden

31

237

apache-2.0

5

Flux diffusion model implementation that uses quantized fp8 matmul, with the remaining layers using faster half-precision accumulate, which is ~2x faster on consumer devices.

fp8 flux pytorch diffusion quantization fast-inference

Created 2024-08-05

64 commits to main branch, last one 3 months ago