Statistics for topic model-compression
RepositoryStats tracks 584,796 GitHub repositories; of these, 106 are tagged with the model-compression topic. The most common primary language for repositories using this topic is Python (78).
Stargazers over time for topic model-compression
Most starred and trending repositories for topic model-compression
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".
A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously improved; PRs adding new works are welcome...
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
List of papers related to neural network quantization in recent AI conferences and journals.
A collection of computer vision projects and tools.
The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression".
An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
The official implementation of the paper "Demystifying the Compression of Mixture-of-Experts Through a Unified Framework".
The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
[CVPR 2024 Highlight] Logit Standardization in Knowledge Distillation
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Official Code for "SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression"