Trending repositories for topic model-compression
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".
Awesome Knowledge-Distillation. A categorized collection of knowledge-distillation papers (2014-2021).
List of papers related to neural network quantization in recent AI conferences and journals.
PyTorch implementation of various Knowledge Distillation (KD) methods (a minimal sketch of the classic KD loss appears after this list).
Efficient computing methods developed by Huawei Noah's Ark Lab
A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research, and we are continuously improving the project. Welcome to PR the works (pape...
An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyperparameter tuning.
A PyTorch implementation for exploring deep and shallow knowledge distillation (KD) experiments with flexibility
The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
A collection of computer vision projects & tools.
[CVPR 2024 Highlight] Logit Standardization in Knowledge Distillation
The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression".
Awesome machine learning model compression research papers, quantization, tools, and learning material.
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
A list of high-quality (newest) AutoML works and lightweight models including 1.) Neural Architecture Search, 2.) Lightweight Structures, 3.) Model Compression, Quantization and Acceleration, 4.) Hype...
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
The official implementation of the paper "Demystifying the Compression of Mixture-of-Experts Through a Unified Framework".
Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML 2023)
A list of papers, docs, and code about efficient AIGC. This repo aims to provide information for efficient AIGC research, covering both language and vision, and we are continuously improving the project. Welcom...
Vocabulary Trimming (VT) is a model compression technique that reduces a multilingual LM's vocabulary to a target language by deleting irrelevant tokens from its vocabulary (a minimal sketch of the idea appears after this list). This repository contains a...
Official Code for "SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression"
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
PyTorch Lightning implementation of the paper "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding". This repository allows reproducing the main fin...
This is a collection of our research on efficient AI, covering hardware-aware NAS and model compression.
Resources for our survey paper "A Comprehensive Survey on AI Integration at the Edge: Techniques, Applications, and Challenges"
[ICML 2023] The official implementation of our paper "BiBench: Benchmarking and Analyzing Network Binarization".
A gathering of research papers, corresponding code (where available), reading notes, and other related materials about hot 🔥 fields in computer vision based on deep learning.
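Several of the entries above are knowledge-distillation toolkits and paper collections. As a reference point only, here is a minimal sketch of the classic soft-target KD loss (temperature-scaled KL divergence combined with cross-entropy), assuming PyTorch; the function name, temperature, and weighting below are illustrative and this is not the code of any repository listed here.

```python
# Minimal sketch of the classic soft-target knowledge-distillation loss
# (temperature-scaled KL divergence plus cross-entropy). Illustrative only;
# not taken from any repository in this list.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, temperature=4.0, alpha=0.5):
    """Blend a distillation term against the teacher with the usual supervised loss."""
    # Soften both distributions with the temperature, then match them with KL divergence.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    distill = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

    # Standard cross-entropy on the ground-truth labels.
    ce = F.cross_entropy(student_logits, targets)
    return alpha * distill + (1.0 - alpha) * ce
```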
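The Vocabulary Trimming entry above describes dropping tokens a target language never uses. Below is a minimal sketch of that idea, assuming a Hugging Face Transformers multilingual model; trim_vocabulary and its arguments are hypothetical names for illustration, not the API of that repository, and a complete implementation would also rebuild the tokenizer and any tied output head.

```python
# Minimal sketch of vocabulary trimming for a multilingual masked LM.
# Names (trim_vocabulary, target_corpus) are illustrative only.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

def trim_vocabulary(model_name: str, target_corpus: list[str]):
    """Keep only the tokens that occur when tokenizing a target-language corpus."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)

    # Collect the token ids actually used by the target language, plus special tokens.
    keep_ids = set(tokenizer.all_special_ids)
    for text in target_corpus:
        keep_ids.update(tokenizer(text, add_special_tokens=False)["input_ids"])
    keep_ids = sorted(keep_ids)

    # Slice the input embedding matrix down to the kept rows. In multilingual LMs the
    # embedding matrix is a large share of the parameters, which is where the
    # compression comes from.
    old_embeddings = model.get_input_embeddings().weight.data
    new_embeddings = torch.nn.Embedding(len(keep_ids), old_embeddings.size(1))
    new_embeddings.weight.data = old_embeddings[keep_ids].clone()
    model.set_input_embeddings(new_embeddings)
    model.config.vocab_size = len(keep_ids)

    # A full implementation would also remap the tokenizer and any tied output head.
    return model, keep_ids
```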