TY - JOUR
T1 - PhysVLM
T2 - Vision-Language Model for Generalizable Multitask Remote Physiological Measurement
AU - Gao, Jie
AU - Weng San, Ieong
AU - Luo, Xiangmin
AU - Chen, Zhengxuan
AU - Yu, Zitong
AU - Tan, Tao
AU - Sun, Yue
N1 - Publisher Copyright:
© 2026 IEEE.
PY - 2026
Y1 - 2026
N2 - Remote photoplethysmography (rPPG) offers a noninvasive solution for estimating critical vital signs, such as heart rate (HR) and respiratory rate (RR). To improve algorithmic generalizability, domain generalization (DG) has garnered increasing attention for rPPG. However, existing approaches often overlook the frequency periodicity driven by blood-volume pulsations (BVP) and the dependencies among tasks, which limits generalization. To address this, PhysVLM is proposed as a cross-domain, multitask rPPG framework. First, a pretrained vision-language model provides frequency-related priors, which are injected into a visual-only student network via knowledge distillation, encouraging the student to focus on physiologically driven periodic features. Next, the wavelet temporal block (WTB) is introduced for multiscale time-frequency decomposition and adaptive filtering to suppress task-irrelevant noise. In addition, frequency-aligned spatial attention (FASA) is proposed to perform frequency-aligned bandpass selection within rhythmically prominent regions, and cross-task consistency constraints are incorporated to reduce spectral leakage. Extensive experiments on five public datasets, together with clinical validation on 25 preterm infants in a neonatal intensive care unit (NICU), show that PhysVLM achieves stable improvements in cross-domain evaluations. On the challenging NBHR dataset, the proposed method reduces HR mean absolute error (MAE) from 5.85 to 2.85 bpm. This framework provides a robust foundation for reliable rPPG applications in clinical settings.
AB - Remote photoplethysmography (rPPG) offers a noninvasive solution for estimating critical vital signs, such as heart rate (HR) and respiratory rate (RR). To improve algorithmic generalizability, domain generalization (DG) has garnered increasing attention for rPPG. However, existing approaches often overlook the frequency periodicity driven by blood-volume pulsations (BVP) and the dependencies among tasks, which limits generalization. To address this, PhysVLM is proposed as a cross-domain, multitask rPPG framework. First, a pretrained vision-language model provides frequency-related priors, which are injected into a visual-only student network via knowledge distillation, encouraging the student to focus on physiologically driven periodic features. Next, the wavelet temporal block (WTB) is introduced for multiscale time-frequency decomposition and adaptive filtering to suppress task-irrelevant noise. In addition, frequency-aligned spatial attention (FASA) is proposed to perform frequency-aligned bandpass selection within rhythmically prominent regions, and cross-task consistency constraints are incorporated to reduce spectral leakage. Extensive experiments on five public datasets, together with clinical validation on 25 preterm infants in a neonatal intensive care unit (NICU), show that PhysVLM achieves stable improvements in cross-domain evaluations. On the challenging NBHR dataset, the proposed method reduces HR mean absolute error (MAE) from 5.85 to 2.85 bpm. This framework provides a robust foundation for reliable rPPG applications in clinical settings.
KW - Domain generalization (DG)
KW - multitask learning
KW - remote photoplethysmography (rPPG)
KW - vision-language models (VLMs)
UR - https://www.scopus.com/pages/publications/105032764298
U2 - 10.1109/TIM.2026.3671908
DO - 10.1109/TIM.2026.3671908
M3 - Article
AN - SCOPUS:105032764298
SN - 0018-9456
VL - 75
JO - IEEE Transactions on Instrumentation and Measurement
JF - IEEE Transactions on Instrumentation and Measurement
M1 - 5006215
ER -