Acceleration and Model Compression
Papers
Distilling the Knowledge in a Neural Network
- intro: NIPS 2014 Deep Learning Workshop
 - author: Geoffrey Hinton, Oriol Vinyals, Jeff Dean
 - arxiv: http://arxiv.org/abs/1503.02531
 - blog: http://fastml.com/geoff-hintons-dark-knowledge/
 - notes: https://github.com/dennybritz/deeplearning-papernotes/blob/master/notes/distilling-the-knowledge-in-a-nn.md
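
For orientation, the core recipe in the Hinton et al. paper above is a soft-target loss: a KL divergence between temperature-softened teacher and student distributions, mixed with the ordinary hard-label cross-entropy. A minimal PyTorch sketch of that loss (the function name, T, and alpha are illustrative choices, not values from the paper's code):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style knowledge distillation loss (sketch).

    Soft term: KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 so its gradients stay comparable in magnitude
    to the hard-label term. T=4.0 and alpha=0.9 are assumed example values.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```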
 
Deep Model Compression: Distilling Knowledge from Noisy Teachers
- arxiv: https://arxiv.org/abs/1610.09650
 - github: https://github.com/chengshengchan/model_compression
 
A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning
- intro: CVPR 2017
 - paper: http://openaccess.thecvf.com/content_cvpr_2017/papers/Yim_A_Gift_From_CVPR_2017_paper.pdf
 
Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
- intro: TuSimple
 - arxiv: https://arxiv.org/abs/1707.01219
 - github: https://github.com/TuSimple/neuron-selectivity-transfer
 
Learning Loss for Knowledge Distillation with Conditional Adversarial Networks
https://arxiv.org/abs/1709.00513

Data-Free Knowledge Distillation for Deep Neural Networks
https://arxiv.org/abs/1710.07535

Knowledge Projection for Deep Neural Networks
https://arxiv.org/abs/1710.09505

Moonshine: Distilling with Cheap Convolutions
https://arxiv.org/abs/1711.02613

model_compression: Implementation of model compression with knowledge distilling method
- github: https://github.com/chengshengchan/model_compression

Neural Network Distiller
- intro: Neural Network Distiller: a Python package for neural network compression research
 - project page: https://nervanasystems.github.io/distiller/
 - github: https://github.com/NervanaSystems/distiller
 
Knowledge Distillation in Generations: More Tolerant Teachers Educate Better Students
- intro: The Johns Hopkins University
 - arxiv: https://arxiv.org/abs/1805.05551
 
Improving Knowledge Distillation with Supporting Adversarial Samples
https://arxiv.org/abs/1805.05532

Recurrent knowledge distillation
- intro: ICIP 2018
 - arxiv: https://arxiv.org/abs/1805.07170
 
Knowledge Distillation by On-the-Fly Native Ensemble
https://arxiv.org/abs/1806.04606

Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher
- intro: Washington State University & DeepMind
 - arxiv: https://arxiv.org/abs/1902.03393
 
Correlation Congruence for Knowledge Distillation
- intro: NUDT & SenseTime & BUAA & CUHK
 - keywords: Correlation Congruence Knowledge Distillation (CCKD)
 - arxiv: https://arxiv.org/abs/1904.01802
 
Similarity-Preserving Knowledge Distillation
- intro: ICCV 2019
 - arxiv: https://arxiv.org/abs/1907.09682
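
Roughly, the similarity-preserving approach above matches the pairwise similarity structure of a mini-batch rather than individual activations: each network's feature maps are flattened per example, a batch-level Gram matrix is computed and row-normalized, and the student is penalized for deviating from the teacher's matrix. A hedged sketch of that idea (not the authors' released code), assuming feature tensors of shape (B, C, H, W):

```python
import torch.nn.functional as F

def sp_loss(student_feat, teacher_feat):
    """Similarity-preserving distillation loss (illustrative sketch).

    Inputs are activation tensors of shape (B, C, H, W); the loss compares
    row-normalized B x B similarity (Gram) matrices of student and teacher.
    """
    b = student_feat.size(0)
    s = student_feat.reshape(b, -1)           # (B, C*H*W)
    t = teacher_feat.reshape(b, -1)
    g_s = F.normalize(s @ s.t(), p=2, dim=1)  # row-wise L2 normalization
    g_t = F.normalize(t @ t.t(), p=2, dim=1)
    return ((g_s - g_t) ** 2).sum() / (b * b)
```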
 
Highlight Every Step: Knowledge Distillation via Collaborative Teaching
https://arxiv.org/pdf/1907.09643.pdf

Ensemble Knowledge Distillation for Learning Improved and Efficient Networks
https://arxiv.org/abs/1909.08097

Revisit Knowledge Distillation: a Teacher-free Framework
- arxiv: https://arxiv.org/abs/1909.11723
 - github: https://github.com/yuanli2333/Teacher-free-Knowledge-Distillation
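
As a rough illustration of the teacher-free framework above (my reading of the idea, not the repository's code): one variant drops the trained teacher entirely and distills against a hand-designed "virtual teacher" that puts a large probability on the ground-truth class and spreads the remainder uniformly over the other classes. A hedged PyTorch sketch, where `a` and `T` are assumed values rather than the paper's settings:

```python
import torch
import torch.nn.functional as F

def teacher_free_kd_loss(student_logits, labels, num_classes, a=0.99, T=20.0):
    """Teacher-free KD with a hand-crafted virtual teacher (illustrative sketch).

    The virtual teacher assigns probability `a` to the correct class and
    (1 - a) / (num_classes - 1) to every other class; the student is trained
    with a softened KL term against it plus the usual cross-entropy.
    """
    with torch.no_grad():
        virtual = torch.full_like(student_logits, (1.0 - a) / (num_classes - 1))
        virtual.scatter_(1, labels.unsqueeze(1), a)  # one peaked row per sample
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        virtual,                      # already a probability distribution
        reduction="batchmean",
    ) * (T * T)
    return soft + F.cross_entropy(student_logits, labels)
```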
 
On the Efficacy of Knowledge Distillation
- intro: Cornell University
 - arxiv: https://arxiv.org/abs/1910.01348
 
Training convolutional neural networks with cheap convolutions and online distillation
- arxiv: https://arxiv.org/abs/1909.13063
 - github: https://github.com/EthanZhangYC/OD-cheap-convolution
 
Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation
https://arxiv.org/abs/1911.05329

Preparing Lessons: Improve Knowledge Distillation with Better Supervision
- intro: Xi’an Jiaotong University & Meituan
 - keywords: Knowledge Adjustment (KA), Dynamic Temperature Distillation (DTD)
 - arxiv: https://arxiv.org/abs/1911.07471
 
QKD: Quantization-aware Knowledge Distillation
https://arxiv.org/abs/1911.12491

Explaining Knowledge Distillation by Quantifying the Knowledge
https://arxiv.org/abs/2003.03622

Knowledge distillation via adaptive instance normalization
https://arxiv.org/abs/2003.04289

Distilling Knowledge from Graph Convolutional Networks
- intro: CVPR 2020
 - arxiv: https://arxiv.org/abs/2003.10477
 
Regularizing Class-wise Predictions via Self-knowledge Distillation
- intro: CVPR 2020
 - arxiv: https://arxiv.org/abs/2003.13964
 
Online Knowledge Distillation with Diverse Peers
- intro: AAAI 2020
 - arxiv: https://arxiv.org/abs/1912.00350
 - github: https://github.com/DefangChen/OKDDip
 
Channel Distillation: Channel-Wise Attention for Knowledge Distillation

Peer Collaborative Learning for Online Knowledge Distillation
- intro: Queen Mary University of London
 - arxiv: https://arxiv.org/abs/2006.04147
 
Knowledge Distillation for Multi-task Learning
- intro: University of Edinburgh
 - arxiv: https://arxiv.org/abs/2007.06889
 
Differentiable Feature Aggregation Search for Knowledge Distillation
https://arxiv.org/abs/2008.00506

Prime-Aware Adaptive Distillation
- intro: ECCV 2020
 - arxiv: https://arxiv.org/abs/2008.01458
 
Knowledge Transfer via Dense Cross-Layer Mutual-Distillation
- intro: ECCV 2020
 - arxiv: https://arxiv.org/abs/2008.07816
 - github: https://github.com/sundw2014/DCM
 
Matching Guided Distillation
- intro: ECCV 2020
 - intro: Aibee Inc.
 - project page: http://kaiyuyue.com/mgd/
 - arxiv: https://arxiv.org/abs/2008.09958
 
Domain Adaptation Through Task Distillation
- intro: ECCV 2020
 - arxiv: https://arxiv.org/abs/2008.11911
 - github: https://github.com/bradyz/task-distillation
 
Spherical Knowledge Distillation
https://arxiv.org/abs/2010.07485

In Defense of Feature Mimicking for Knowledge Distillation
- intro: Nanjing University
 - arxiv: https://arxiv.org/abs/2011.01424
 
Online Ensemble Model Compression using Knowledge Distillation
- intro: CMU
 - arxiv: https://arxiv.org/abs/2011.07449
 
Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation
- intro: CVPR 2021
 - arxiv: https://arxiv.org/abs/2103.08273
 - github (PyTorch): https://github.com/MingiJi/FRSKD
 
Resources
Awesome Knowledge-Distillation
https://github.com/FLHonker/Awesome-Knowledge-Distillation