TY - JOUR
T1 - Hybrid Dual-Input Model for Respiratory Sound Classification With Mel Spectrogram and Waveform
AU - Wang, Fan
AU - Gao, Jiacheng
AU - Wang, Ying
AU - Huang, Guoheng
AU - Yuan, Xiaochen
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2025
Y1 - 2025
N2 - Respiratory sounds serve as early indicators of lung diseases. Computer-aided classification systems have become a key enabler of timely diagnosis and treatment, and the technology has improved basic services, particularly in resource-limited urban settings. We propose a hybrid dual-input model for the classification of respiratory sounds. The model takes Mel spectrograms and waveform representations as its two inputs, using the strengths of both modalities to improve performance. The classification framework integrates the Squeeze-and-Excitation (SE) attention mechanism into the ResNet architecture to construct the Bi-SEResNet model and adopts the Data-Efficient Image Transformer (DeiT) as the final classification layer. Model performance is evaluated on the SPR-Sound dataset, which includes two classification tasks: a 2-category classification of respiratory sound events into Normal and Abnormal, and a 7-category classification into Normal, Rhonchi, Wheeze, Stridor, Coarse Crackle, Fine Crackle, and Wheeze & Crackle. Performance is assessed using sensitivity (SE), specificity (SP), average score (AS), and harmonic score (HS) as a composite score. The proposed framework achieves scores of 89.26 and 83.63 on the 2-category and 7-category classification tasks, respectively.
KW - Bi-SEResNet
KW - DeiT transformer
KW - Respiratory sound
KW - signal processing
UR - http://www.scopus.com/inward/record.url?scp=105004328709&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2025.3566477
DO - 10.1109/ACCESS.2025.3566477
M3 - Article
AN - SCOPUS:105004328709
SN - 2169-3536
VL - 13
SP - 80971
EP - 80980
JO - IEEE Access
JF - IEEE Access
ER -