TY - GEN
T1 - Using Multiple Heads to Subsize Meta-memorization Problem
AU - Wang, Lu
AU - Eddie Law, K. L.
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - The memorization problem is a meta-level overfitting phenomenon in meta-learning: the trained model prefers to remember learned tasks instead of adapting to new tasks. This issue limits the ability of many meta-learning approaches to generalize. In this paper, we mitigate this limitation by introducing multiple supervision signals through a multi-objective optimization process. The design leads to a Multi-Input Multi-Output (MIMO) configuration for meta-learning. The model produces multiple outputs through different heads, and each head is supervised by a different ordering of labels for the same task. This yields different memories, creating meta-level conflicts that act as regularization against meta-overfitting. The resulting MIMO configuration is applicable to all MAML-like algorithms with only a minor increase in training computation; at inference, the cost can be reduced through an early-exit policy, or better performance can be achieved through a low-cost ensemble. In experiments with identical models and training settings across all test cases, our proposed design suppresses the meta-overfitting issue, achieves smoother loss landscapes, and improves generalization.
AB - The memorization problem is a meta-level overfitting phenomenon in meta-learning: the trained model prefers to remember learned tasks instead of adapting to new tasks. This issue limits the ability of many meta-learning approaches to generalize. In this paper, we mitigate this limitation by introducing multiple supervision signals through a multi-objective optimization process. The design leads to a Multi-Input Multi-Output (MIMO) configuration for meta-learning. The model produces multiple outputs through different heads, and each head is supervised by a different ordering of labels for the same task. This yields different memories, creating meta-level conflicts that act as regularization against meta-overfitting. The resulting MIMO configuration is applicable to all MAML-like algorithms with only a minor increase in training computation; at inference, the cost can be reduced through an early-exit policy, or better performance can be achieved through a low-cost ensemble. In experiments with identical models and training settings across all test cases, our proposed design suppresses the meta-overfitting issue, achieves smoother loss landscapes, and improves generalization.
KW - Meta-learning
KW - Meta-overfitting
KW - Multi-head
UR - http://www.scopus.com/inward/record.url?scp=85138706699&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-15937-4_42
DO - 10.1007/978-3-031-15937-4_42
M3 - Conference contribution
AN - SCOPUS:85138706699
SN - 9783031159367
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 496
EP - 507
BT - Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings
A2 - Pimenidis, Elias
A2 - Aydin, Mehmet
A2 - Angelov, Plamen
A2 - Jayne, Chrisina
A2 - Papaleonidas, Antonios
PB - Springer Science and Business Media Deutschland GmbH
T2 - 31st International Conference on Artificial Neural Networks, ICANN 2022
Y2 - 6 September 2022 through 9 September 2022
ER -