TY - GEN
T1 - A Knowledge-Distillation-Integrated Pruning Method for Vision Transformer
AU - Xu, Bangguo
AU - Zhang, Tiankui
AU - Wang, Yapeng
AU - Chen, Zeren
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Vision transformers (ViTs) have achieved remarkable results in computer vision applications such as image classification, object detection, and image segmentation. Because the self-attention mechanism can model relationships among all pixels of an input image, ViT models significantly outperform traditional CNN networks. However, their storage, runtime memory, and computation requirements hinder deployment on edge devices. This paper proposes a ViT pruning method integrated with knowledge distillation, which prunes the ViT model while avoiding the performance loss that pruning would otherwise cause. Building on the idea that knowledge distillation lets a student model improve by learning knowledge unique to a teacher model, a convolutional neural network (CNN), with its characteristic parameter sharing and local receptive fields, is used as the teacher to guide the training of the ViT model so that the ViT can acquire similar capabilities. In addition, pruning may remove important parts of the model, causing irreversible performance loss. To address this, the paper designs an importance-score learning module that guides pruning and ensures that only unimportant parts of the model are removed. Finally, the pruned model is compared with other methods on ImageNet-1k in terms of accuracy, floating-point operations (FLOPs), and model parameters.
KW - knowledge distillation
KW - network pruning
KW - transformer pruning
KW - vision transformer
UR - http://www.scopus.com/inward/record.url?scp=85142354890&partnerID=8YFLogxK
U2 - 10.1109/ISCIT55906.2022.9931309
DO - 10.1109/ISCIT55906.2022.9931309
M3 - Conference contribution
AN - SCOPUS:85142354890
T3 - 2022 21st International Symposium on Communications and Information Technologies, ISCIT 2022
SP - 210
EP - 215
BT - 2022 21st International Symposium on Communications and Information Technologies, ISCIT 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 21st International Symposium on Communications and Information Technologies, ISCIT 2022
Y2 - 27 September 2022 through 30 September 2022
ER -