Search Results - RepositoryStats

13 results found

Filter by Primary Language:
Python (10)
Jupyter Notebook (3)

neural-compressor intel

264

2.4k

apache-2.0

32

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

awq fp4 gptq int4 int8 pruning mxformat sparsity sparsegpt auto-tuning smoothquant quantization low-precision large-language-models knowledge-distillation post-training-quantization quantization-aware-training

Created 2020-07-21

3,742 commits to master branch, last one 10 days ago

micronet 666DZY666

476

2.2k

mit

40

micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa/Quantization and Training of Neural Networks for Efficient Integer-Ari...

Created 2019-12-04

295 commits to master branch, last one 3 years ago

TinyNeuralNetwork alibaba

125

822

mit

20

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.

pruning pytorch deep-learning model-converter model-compression deep-neural-networks post-training-quantization quantization-aware-training

Created 2021-11-02

827 commits to main branch, last one about a month ago

SqueezeLLM SqueezeAILab

45

685

mit

18

[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

llm llama localllm transformer quantization small-models text-generation model-compression efficient-inference large-language-models post-training-quantization natural-language-processing

Created 2023-06-12

50 commits to main branch, last one about a year ago

51

452

apache-2.0

10

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

Created 2024-03-06

480 commits to main branch, last one 3 days ago

q-diffusion Xiuyu-Li

24

347

mit

16

[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.

ddim pytorch quantization diffusion-models stable-diffusion model-compression post-training-quantization

Created 2023-03-24

12 commits to master branch, last one about a year ago

Sparsebit megvii-research

40

331

apache-2.0

11

A model compression and acceleration toolbox based on PyTorch.

sparse pruning tensorrt quantization deep-learning post-training-quantization quantization-aware-training

Created 2022-07-21

134 commits to main branch, last one about a year ago

FQ-ViT megvii-research

50

331

apache-2.0

5

[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer

pytorch imagenet quantization vision-transformer post-training-quantization

Created 2021-11-24

20 commits to main branch, last one 2 years ago

Adventures-in-TensorFlow-Lite sayakpaul

35

172

apache-2.0

10

This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.

tf-hub pruning inference on-device-ml tensorflow-2 tf-lite-model tensorflow-lite model-optimization model-quantization post-training-quantization quantization-aware-training

Created 2020-04-29

143 commits to master branch, last one 2 years ago

DuQuant Hsu1023

10

155

mit

2

[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.

llm quantization large-language-models post-training-quantization

Created 2024-05-25

5 commits to main branch, last one 6 months ago

quantization-notes hkproj

16

79

unknown

2

Notes on quantization in neural networks

pytorch quantization deep-learning neural-networks post-training-quantization quantization-aware-training

Created 2023-11-24

15 commits to main branch, last one about a year ago

TFMQ-DM ModelTC

4

62

apache-2.0

8

[CVPR 2024 Highlight] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".

ldm cvpr ddim cvpr2024 highlight quantization diffusion-models stable-diffusion post-training-quantization

Created 2024-03-09

32 commits to main branch, last one 8 months ago

4

35

apache-2.0

8

[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"

llm llama llama2 pytorch quantization transformers post-training-quantization

Created 2024-02-21

9 commits to main branch, last one about a year ago