TY - GEN
T1 - Contextual Multi-Scale Feature Learning for Person Re-Identification
AU - Fan, Baoyu
AU - Wang, Li
AU - Zhang, Runze
AU - Guo, Zhenhua
AU - Zhao, Yaqian
AU - Li, Rengang
AU - Gong, Weifeng
N1 - Publisher Copyright:
© 2020 ACM.
PY - 2020/10/12
Y1 - 2020/10/12
N2 - Representing features at multiple scales is significant for person re-identification (Re-ID). Most existing methods learn multi-scale features by stacking streams and convolutions without considering the cooperation of multiple scales at a granular level. However, most scales are more discriminative only when they integrate other scales as contextual information; we term this contextual multi-scale. In this paper, we propose a novel architecture, namely the contextual multi-scale network (CMSNet), for learning common and contextual multi-scale representations simultaneously. The building block of CMSNet obtains contextual multi-scale representations through bidirectional hierarchical connection groups: the forward hierarchical connection group for stepwise inter-scale information fusion and the backward hierarchical connection group for leap-frogging inter-scale information fusion. Overly rich scale features without selection will confuse discrimination. We therefore introduce a new channel-wise scale selection module to dynamically select scale features for the corresponding input image. To the best of our knowledge, CMSNet is the most lightweight model for person Re-ID, and it achieves state-of-the-art performance on four commonly used Re-ID datasets, surpassing most large-scale models.
AB - Representing features at multiple scales is significant for person re-identification (Re-ID). Most existing methods learn multi-scale features by stacking streams and convolutions without considering the cooperation of multiple scales at a granular level. However, most scales are more discriminative only when they integrate other scales as contextual information; we term this contextual multi-scale. In this paper, we propose a novel architecture, namely the contextual multi-scale network (CMSNet), for learning common and contextual multi-scale representations simultaneously. The building block of CMSNet obtains contextual multi-scale representations through bidirectional hierarchical connection groups: the forward hierarchical connection group for stepwise inter-scale information fusion and the backward hierarchical connection group for leap-frogging inter-scale information fusion. Overly rich scale features without selection will confuse discrimination. We therefore introduce a new channel-wise scale selection module to dynamically select scale features for the corresponding input image. To the best of our knowledge, CMSNet is the most lightweight model for person Re-ID, and it achieves state-of-the-art performance on four commonly used Re-ID datasets, surpassing most large-scale models.
KW - attention mechanism
KW - contextual multi-scale
KW - hierarchical connection
KW - person re-identification
UR - http://www.scopus.com/inward/record.url?scp=85106950593&partnerID=8YFLogxK
U2 - 10.1145/3394171.3414038
DO - 10.1145/3394171.3414038
M3 - Conference contribution
AN - SCOPUS:85106950593
T3 - MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia
SP - 655
EP - 663
BT - MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia
PB - Association for Computing Machinery
T2 - 28th ACM International Conference on Multimedia, MM 2020
Y2 - 12 October 2020 through 16 October 2020
ER -