Abstract
Large language models (LLMs) have greatly improved performance on natural language processing tasks. However, their large parameter counts make them computationally expensive to run and difficult to deploy on resource-constrained devices. Techniques such as distillation and quantization have been proposed to address this problem while maintaining accuracy. Unfortunately, current methods fail to integrate model pruning with downstream tasks and overlook sentence-level semantic modeling, which reduces the efficiency of distillation. To alleviate these limitations, we propose MicroBERT, a novel lightweight distilled model for BERT. The method transfers the knowledge contained in a “teacher” BERT model to a “student” BERT model. A sentence-level feature alignment loss (FAL) distillation mechanism, guided by Mixture-of-Experts (MoE), captures comprehensive contextual semantic knowledge from the teacher model to enhance the student model’s performance while reducing its parameters. To make the outputs of the teacher and student models comparable, we borrow the idea of a generative adversarial network (GAN) and train a discriminator. Experimental results on four datasets show that every step of our distillation mechanism is effective, and that MicroBERT (101.14%) outperforms TinyBERT (99%) by 2.24% in terms of average distillation reductions across tasks on the GLUE dataset.
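To make the idea of sentence-level feature alignment more concrete, the following is a minimal sketch in plain Python. It is not the paper's FAL: the abstract does not give the formulation, so mean pooling, the linear projection `weight`, the mean-squared-error objective, and all function names here are illustrative assumptions. The MoE guidance and the GAN discriminator are omitted entirely.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mean_pool(token_vectors: Matrix) -> Vector:
    """Average per-token vectors into one sentence-level vector (assumed pooling)."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(tok[d] for tok in token_vectors) / n for d in range(dim)]


def project(vec: Vector, weight: Matrix) -> Vector:
    """Apply a linear map; `weight` has one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def feature_alignment_loss(
    teacher_tokens: Matrix, student_tokens: Matrix, weight: Matrix
) -> float:
    """Mean squared error between teacher and projected student sentence vectors.

    Hypothetical stand-in for a sentence-level alignment loss: the student's
    (smaller) sentence vector is projected into the teacher's dimension first.
    """
    teacher_sent = mean_pool(teacher_tokens)
    student_sent = project(mean_pool(student_tokens), weight)
    squared = [(t - s) ** 2 for t, s in zip(teacher_sent, student_sent)]
    return sum(squared) / len(squared)


# Toy example: a 3-dimensional teacher and a 2-dimensional student, two tokens each.
teacher = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
student = [[1.0, 1.0], [3.0, 3.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
loss = feature_alignment_loss(teacher, student, weight)
```

In a real training setup the projection would be learned jointly with the student, and the loss would be computed on batched tensors in a deep learning framework; the sketch only shows the shape of the computation.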
| Original language | English |
|---|---|
| Article number | 6171 |
| Journal | Applied Sciences (Switzerland) |
| Volume | 14 |
| Issue number | 14 |
| Publication status | Published - Jul 2024 |
Keywords
- Mixture-of-Experts
- generative adversarial networks
- knowledge distillation
- natural language processing
Press/Media
- Faculty of Applied Sciences Researchers Provide New Study Findings on Applied Sciences (MicroBERT: Distilling MoE-Based Knowledge from BERT into a Lighter Model), 9/08/24, 1 item of media coverage