Hybrid Dual-Input Model for Respiratory Sound Classification With Mel Spectrogram and Waveform

Fan Wang, Jiacheng Gao, Ying Wang, Guoheng Huang, Xiaochen Yuan

Research output: Contribution to journal › Article › peer-review

Abstract

Respiratory sounds serve as early indicators of lung disease, and computer-aided classification systems have become a key enabler of timely diagnosis and treatment. This technology has improved basic healthcare services, particularly in resource-limited urban settings. We propose a hybrid dual-input model tailored to the intelligent classification of respiratory sounds. The model takes Mel spectrograms and waveform representations as its two inputs, exploiting the strengths of both modalities to enhance performance. The classification framework integrates the Squeeze-and-Excitation (SE) attention mechanism into the ResNet architecture to construct the Bi-SEResNet model and adopts the Data-Efficient Image Transformer (DeiT) as the final classification layer. Model performance is evaluated on the SPR-Sound dataset, which defines two tasks: a 2-category classification of respiratory sound events into Normal and Abnormal, and a 7-category classification into Normal, Rhonchi, Wheeze, Stridor, Coarse Crackle, Fine Crackle, and Wheeze & Crackle. Performance is assessed using sensitivity (SE), specificity (SP), average score (AS), and harmonic score (HS), combined into a composite score. The proposed framework achieves scores of 89.26 and 83.63 on the 2-category and 7-category classification tasks, respectively.
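
For orientation only, the sketch below is a minimal PyTorch example, not code from the paper: it illustrates the two input modalities the abstract names and a standard Squeeze-and-Excitation channel-attention block of the kind Bi-SEResNet integrates into ResNet. The torchaudio preprocessing, the 8 kHz sample rate, the STFT parameters, and the reduction ratio of 16 are illustrative assumptions rather than details taken from the article.

```python
import torch
import torch.nn as nn
import torchaudio


class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention (Hu et al.)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # "squeeze": global spatial average per channel
        self.fc = nn.Sequential(             # "excitation": bottleneck MLP -> channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # re-weight each feature map by its learned channel importance


def make_dual_inputs(path: str, sample_rate: int = 8000, n_mels: int = 64):
    """Prepare the two branches: a log-Mel spectrogram and the raw waveform (illustrative)."""
    waveform, sr = torchaudio.load(path)  # (channels, samples)
    if sr != sample_rate:
        waveform = torchaudio.functional.resample(waveform, sr, sample_rate)
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sample_rate, n_fft=1024, hop_length=256, n_mels=n_mels
    )(waveform)
    log_mel = torch.log(mel + 1e-6)        # log compression, a common stabilising step
    return log_mel.unsqueeze(0), waveform  # (1, C, n_mels, T) for the spectrogram branch, raw audio for the other
```

In an SE-ResNet-style design, such a block is typically inserted after the convolutions of each residual block so that channel-wise attention rescales the features before the skip connection is added; the abstract does not specify how the Bi-SEResNet branches and the DeiT classification head are wired together, so those details are omitted here.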

Original language: English
Pages (from-to): 80971-80980
Number of pages: 10
Journal: IEEE Access
Volume: 13
DOIs
Publication status: Published - 2025

Keywords

  • Bi-SEResNet
  • DeiT transformer
  • Respiratory sound
  • Signal processing

