Acceleration and Model Compression
Papers
Distilling the Knowledge in a Neural Network
- intro: NIPS 2014 Deep Learning Workshop
 - author: Geoffrey Hinton, Oriol Vinyals, Jeff Dean
 - arxiv: http://arxiv.org/abs/1503.02531
 - blog: http://fastml.com/geoff-hintons-dark-knowledge/
 - notes: https://github.com/dennybritz/deeplearning-papernotes/blob/master/notes/distilling-the-knowledge-in-a-nn.md
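
For orientation, the core recipe in the Hinton et al. paper above is a soft-target loss: a KL divergence between temperature-softened teacher and student distributions, mixed with the ordinary hard-label cross-entropy. A minimal PyTorch sketch of that loss (the function name, T, and alpha are illustrative choices, not values from the paper's code):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style knowledge distillation loss (sketch).

    Soft term: KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 so its gradients stay comparable in magnitude
    to the hard-label term. T=4.0 and alpha=0.9 are assumed example values.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```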
 
Deep Model Compression: Distilling Knowledge from Noisy Teachers
- arxiv: https://arxiv.org/abs/1610.09650
 - github: https://github.com/chengshengchan/model_compression
 
A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning
- intro: CVPR 2017
 - paper: http://openaccess.thecvf.com/content_cvpr_2017/papers/Yim_A_Gift_From_CVPR_2017_paper.pdf
 
Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
- intro: TuSimple
 - arxiv: https://arxiv.org/abs/1707.01219
 - github: https://github.com/TuSimple/neuron-selectivity-transfer
 
Learning Loss for Knowledge Distillation with Conditional Adversarial Networks
https://arxiv.org/abs/1709.00513

Data-Free Knowledge Distillation for Deep Neural Networks
https://arxiv.org/abs/1710.07535

Knowledge Projection for Deep Neural Networks
https://arxiv.org/abs/1710.09505

Moonshine: Distilling with Cheap Convolutions
https://arxiv.org/abs/1711.02613

model_compression: Implementation of model compression with knowledge distilling method
- github: https://github.com/chengshengchan/model_compression

Neural Network Distiller
- intro: Neural Network Distiller: a Python package for neural network compression research
 - project page: https://nervanasystems.github.io/distiller/
 - github: https://github.com/NervanaSystems/distiller
 
Knowledge Distillation in Generations: More Tolerant Teachers Educate Better Students
- intro: The Johns Hopkins University
 - arxiv: https://arxiv.org/abs/1805.05551
 
Improving Knowledge Distillation with Supporting Adversarial Samples
https://arxiv.org/abs/1805.05532

Recurrent knowledge distillation
- intro: ICIP 2018
 - arxiv: https://arxiv.org/abs/1805.07170
 
Knowledge Distillation by On-the-Fly Native Ensemble
https://arxiv.org/abs/1806.04606

Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher
- intro: Washington State University & DeepMind
 - arxiv: https://arxiv.org/abs/1902.03393
 
Correlation Congruence for Knowledge Distillation
- intro: NUDT & SenseTime & BUAA & CUHK
 - keywords: Correlation Congruence Knowledge Distillation (CCKD)
 - arxiv: https://arxiv.org/abs/1904.01802
 
Similarity-Preserving Knowledge Distillation
- intro: ICCV 2019
 - arxiv: https://arxiv.org/abs/1907.09682
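
Roughly, the similarity-preserving approach above matches the pairwise similarity structure of a mini-batch rather than individual activations: each network's feature maps are flattened per example, a batch-level Gram matrix is computed and row-normalized, and the student is penalized for deviating from the teacher's matrix. A hedged sketch of that idea (not the authors' released code), assuming feature tensors of shape (B, C, H, W):

```python
import torch.nn.functional as F

def sp_loss(student_feat, teacher_feat):
    """Similarity-preserving distillation loss (illustrative sketch).

    Inputs are activation tensors of shape (B, C, H, W); the loss compares
    row-normalized B x B similarity (Gram) matrices of student and teacher.
    """
    b = student_feat.size(0)
    s = student_feat.reshape(b, -1)           # (B, C*H*W)
    t = teacher_feat.reshape(b, -1)
    g_s = F.normalize(s @ s.t(), p=2, dim=1)  # row-wise L2 normalization
    g_t = F.normalize(t @ t.t(), p=2, dim=1)
    return ((g_s - g_t) ** 2).sum() / (b * b)
```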
 
Highlight Every Step: Knowledge Distillation via Collaborative Teaching
https://arxiv.org/pdf/1907.09643.pdf

Ensemble Knowledge Distillation for Learning Improved and Efficient Networks
https://arxiv.org/abs/1909.08097

Revisit Knowledge Distillation: a Teacher-free Framework
- arxiv: https://arxiv.org/abs/1909.11723
 - github: https://github.com/yuanli2333/Teacher-free-Knowledge-Distillation
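
As a rough illustration of the teacher-free framework above (my reading of the idea, not the repository's code): one variant drops the trained teacher entirely and distills against a hand-designed "virtual teacher" that puts a large probability on the ground-truth class and spreads the remainder uniformly over the other classes. A hedged PyTorch sketch, where `a` and `T` are assumed values rather than the paper's settings:

```python
import torch
import torch.nn.functional as F

def teacher_free_kd_loss(student_logits, labels, num_classes, a=0.99, T=20.0):
    """Teacher-free KD with a hand-crafted virtual teacher (illustrative sketch).

    The virtual teacher assigns probability `a` to the correct class and
    (1 - a) / (num_classes - 1) to every other class; the student is trained
    with a softened KL term against it plus the usual cross-entropy.
    """
    with torch.no_grad():
        virtual = torch.full_like(student_logits, (1.0 - a) / (num_classes - 1))
        virtual.scatter_(1, labels.unsqueeze(1), a)  # one peaked row per sample
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        virtual,                      # already a probability distribution
        reduction="batchmean",
    ) * (T * T)
    return soft + F.cross_entropy(student_logits, labels)
```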
 
On the Efficacy of Knowledge Distillation
- intro: Cornell University
 - arxiv: https://arxiv.org/abs/1910.01348
 
Training convolutional neural networks with cheap convolutions and online distillation
- arxiv: https://arxiv.org/abs/1909.13063
 - github: https://github.com/EthanZhangYC/OD-cheap-convolution
 
Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation
https://arxiv.org/abs/1911.05329

Preparing Lessons: Improve Knowledge Distillation with Better Supervision
- intro: Xi’an Jiaotong University & Meituan
 - keywords: Knowledge Adjustment (KA), Dynamic Temperature Distillation (DTD)
 - arxiv: https://arxiv.org/abs/1911.07471
 
QKD: Quantization-aware Knowledge Distillation
https://arxiv.org/abs/1911.12491

Explaining Knowledge Distillation by Quantifying the Knowledge
https://arxiv.org/abs/2003.03622

Knowledge distillation via adaptive instance normalization
https://arxiv.org/abs/2003.04289

Distilling Knowledge from Graph Convolutional Networks
- intro: CVPR 2020
 - arxiv: https://arxiv.org/abs/2003.10477
 
Regularizing Class-wise Predictions via Self-knowledge Distillation
- intro: CVPR 2020
 - arxiv: https://arxiv.org/abs/2003.13964
 
Online Knowledge Distillation with Diverse Peers
- intro: AAAI 2020
 - arxiv: https://arxiv.org/abs/1912.00350
 - github: https://github.com/DefangChen/OKDDip
 
Channel Distillation: Channel-Wise Attention for Knowledge Distillation

Peer Collaborative Learning for Online Knowledge Distillation
- intro: Queen Mary University of London
 - arxiv: https://arxiv.org/abs/2006.04147
 
Knowledge Distillation for Multi-task Learning
- intro: University of Edinburgh
 - arxiv: https://arxiv.org/abs/2007.06889
 
Differentiable Feature Aggregation Search for Knowledge Distillation
https://arxiv.org/abs/2008.00506

Prime-Aware Adaptive Distillation
- intro: ECCV 2020
 - arxiv: https://arxiv.org/abs/2008.01458
 
Knowledge Transfer via Dense Cross-Layer Mutual-Distillation
- intro: ECCV 2020
 - arxiv: https://arxiv.org/abs/2008.07816
 - github: https://github.com/sundw2014/DCM
 
Matching Guided Distillation
- intro: ECCV 2020
 - intro: Aibee Inc.
 - project page: http://kaiyuyue.com/mgd/
 - arxiv: https://arxiv.org/abs/2008.09958
 
Domain Adaptation Through Task Distillation
- intro: ECCV 2020
 - arxiv: https://arxiv.org/abs/2008.11911
 - github: https://github.com/bradyz/task-distillation
 
Spherical Knowledge Distillation
https://arxiv.org/abs/2010.07485

In Defense of Feature Mimicking for Knowledge Distillation
- intro: Nanjing University
 - arxiv: https://arxiv.org/abs/2011.01424
 
Online Ensemble Model Compression using Knowledge Distillation
- intro: CMU
 - arxiv: https://arxiv.org/abs/2011.07449
 
Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation
- intro: CVPR 2021
 - arxiv: https://arxiv.org/abs/2103.08273
 - github (PyTorch): https://github.com/MingiJi/FRSKD
 
Resources
Awesome Knowledge-Distillation
https://github.com/FLHonker/Awesome-Knowledge-Distillation