PhysVLM: Vision-Language Model for Generalizable Multitask Remote Physiological Measurement

  • Jie Gao
  • Ieong Weng San
  • Xiangmin Luo
  • Zhengxuan Chen
  • Zitong Yu
  • Tao Tan
  • Yue Sun
  • Macao Polytechnic University
  • Kiang Wu Hospital
  • Great Bay University
  • Dongguan Key Laboratory for Intelligence and Information Technology

Research output: Article (peer-reviewed)

Abstract

Remote photoplethysmography (rPPG) offers a noninvasive solution for estimating critical vital signs, such as heart rate (HR) and respiratory rate (RR). To improve algorithmic generalizability, domain generalization (DG) has garnered increasing attention for rPPG. However, existing approaches often overlook the frequency periodicity driven by blood-volume pulsations (BVP) and the dependencies among tasks, which limits generalization. To address this, PhysVLM is proposed as a cross-domain, multitask rPPG framework. First, a pretrained vision-language model provides frequency-related priors. These priors are injected into a visual-only student network via knowledge distillation, which encourages the student to focus on physiologically driven periodic features. Next, the wavelet temporal block (WTB) is introduced for multiscale time-frequency decomposition and adaptive filtering to suppress task-irrelevant noise. In addition, frequency-aligned spatial attention (FASA) is proposed to perform frequency-aligned bandpass selection within rhythmically prominent regions. Cross-task consistency constraints are incorporated to reduce spectral leakage. Extensive experiments on five public datasets, together with clinical validation on 25 preterm infants in the neonatal intensive care unit (NICU), show that PhysVLM achieves stable improvements in cross-domain evaluations. On the challenging NBHR dataset, the proposed method reduces HR mean absolute error (MAE) from 5.85 to 2.85 bpm. This framework provides a robust foundation for reliable rPPG applications in clinical settings.
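
For readers unfamiliar with how HR is typically read off a pulse waveform and scored, the following is a minimal sketch, not the PhysVLM method. It assumes a synthetic pulse signal, a 30 Hz frame rate, and a 0.7 to 3.0 Hz heart-rate band. The function names `dominant_rate_bpm` and `mean_absolute_error` and the sample values are hypothetical and chosen only for illustration.

```python
import math


def dominant_rate_bpm(signal, fs, lo_hz=0.7, hi_hz=3.0):
    """Return the strongest frequency inside [lo_hz, hi_hz], in beats per minute.

    Uses a direct DFT evaluated only at bins inside the band (assumed band limits).
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset
    best_freq, best_power = 0.0, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if freq < lo_hz or freq > hi_hz:
            continue  # skip bins outside the assumed heart-rate band
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq * 60.0


def mean_absolute_error(predicted, reference):
    """Mean absolute error between two equal-length sequences (here in bpm)."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Synthetic example: a 1.2 Hz pulse (72 bpm) plus a slower 0.25 Hz component
# standing in for respiration. All values here are illustrative assumptions.
fs = 30.0                      # assumed camera frame rate in Hz
n_samples = int(fs * 10)       # 10 seconds of signal
pulse = [
    math.sin(2 * math.pi * 1.2 * i / fs) + 0.3 * math.sin(2 * math.pi * 0.25 * i / fs)
    for i in range(n_samples)
]

estimate = dominant_rate_bpm(pulse, fs)
print(f"estimated HR: {estimate:.1f} bpm")

# MAE over hypothetical per-clip predictions and references.
print(mean_absolute_error([72.0, 80.0], [70.0, 83.0]))
```

Restricting the peak search to a physiological band is what keeps the slower component from being mistaken for the heart rate. The band limits would need adjusting for populations with higher resting rates, such as neonates.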

Original language: English
Article number: 5006215
Journal: IEEE Transactions on Instrumentation and Measurement
Volume: 75
Publication status: Published - 2026
