TY - JOUR
T1 - Doubly Relaxed Knowledge Distillation for Deep Face Recognition
AU - Zhou, Si
AU - Yuan, Xiaochen
AU - Yang, Guanghua
AU - Zhang, Xinyuan
AU - Ying, Zuobin
AU - Gong, Xueyuan
N1 - Publisher Copyright:
© 2026 IEEE.
PY - 2026
Y1 - 2026
N2 - In face recognition tasks, knowledge distillation tends to rely on feature-based methods because of their superior performance, leaving logits-based distillation methods largely underexplored. Feature-based methods, however, strictly require consistent feature dimensions between the teacher and student, which calls for additional design work. In this paper, we find that most logits-based methods not only strictly constrain the logit variances of the teacher and student to be consistent, but also implicitly impose a restricted linear relationship. Therefore, we propose a new logits-based distillation method, named Doubly Relaxed Knowledge Distillation (DR-KD). Specifically, we use Z-score standardization instead of the softmax function to process logits, and enforce consistent logit variances through manual intervention. Subsequently, we introduce cosine similarity to measure the knowledge transfer between the teacher and student networks. Through derivation, we find that the Pearson correlation coefficient can be directly and equivalently used to calculate the similarity between their original logits. The linear relationship captured by the Pearson coefficient is more general, with the restricted linear relationship as a special case. Our method thus breaks the constraint on variance consistency and relaxes the restriction on linear relationships. Extensive experiments and ablation studies show that DR-KD enhances the discriminative learning capability of the student and outperforms various state-of-the-art competitors on several challenging face recognition benchmarks, such as IJB-B and IJB-C.
AB - In face recognition tasks, knowledge distillation tends to rely on feature-based methods because of their superior performance, leaving logits-based distillation methods largely underexplored. Feature-based methods, however, strictly require consistent feature dimensions between the teacher and student, which calls for additional design work. In this paper, we find that most logits-based methods not only strictly constrain the logit variances of the teacher and student to be consistent, but also implicitly impose a restricted linear relationship. Therefore, we propose a new logits-based distillation method, named Doubly Relaxed Knowledge Distillation (DR-KD). Specifically, we use Z-score standardization instead of the softmax function to process logits, and enforce consistent logit variances through manual intervention. Subsequently, we introduce cosine similarity to measure the knowledge transfer between the teacher and student networks. Through derivation, we find that the Pearson correlation coefficient can be directly and equivalently used to calculate the similarity between their original logits. The linear relationship captured by the Pearson coefficient is more general, with the restricted linear relationship as a special case. Our method thus breaks the constraint on variance consistency and relaxes the restriction on linear relationships. Extensive experiments and ablation studies show that DR-KD enhances the discriminative learning capability of the student and outperforms various state-of-the-art competitors on several challenging face recognition benchmarks, such as IJB-B and IJB-C.
KW - Computer vision
KW - Face recognition
KW - Knowledge distillation
UR - https://www.scopus.com/pages/publications/105032799193
U2 - 10.1109/TBIOM.2026.3671881
DO - 10.1109/TBIOM.2026.3671881
M3 - Article
AN - SCOPUS:105032799193
SN - 2637-6407
JO - IEEE Transactions on Biometrics, Behavior, and Identity Science
JF - IEEE Transactions on Biometrics, Behavior, and Identity Science
ER -