
PhysVLM: Vision-language Model for Generalizable Multi-task Remote Physiological Measurement

  • Jie Gao
  • Ieong Weng San
  • Xiangmin Luo
  • Zhengxuan Chen
  • Zitong Yu
  • Tao Tan
  • Yue Sun

Research output: Contribution to journal, Article, peer-review

Abstract

Remote photoplethysmography (rPPG) offers a non-invasive solution for estimating critical vital signs, such as heart rate (HR) and respiratory rate (RR). To improve algorithmic generalizability, domain generalization has garnered increasing attention for rPPG. However, existing approaches often overlook the frequency periodicity driven by blood-volume pulsations and the dependencies among tasks, which limits generalization. To address this, PhysVLM is proposed as a cross-domain, multi-task rPPG framework. First, a pre-trained vision-language model provides frequency-related priors. These priors are injected into a visual-only student network via knowledge distillation, which encourages the student to focus on physiologically driven periodic features. Next, the wavelet temporal block (WTB) is introduced for multi-scale time-frequency decomposition and adaptive filtering to suppress task-irrelevant noise. In addition, frequency-aligned spatial attention (FASA) is proposed to perform frequency-aligned bandpass selection within rhythmically prominent regions. Cross-task consistency constraints are incorporated to reduce spectral leakage. Extensive experiments on five public datasets, together with clinical validation on 25 preterm infants in the neonatal intensive care unit, show that PhysVLM achieves stable improvements in cross-domain evaluations. On the challenging NBHR dataset, the proposed method reduces HR MAE from 5.85 bpm to 2.85 bpm. This framework provides a robust foundation for reliable rPPG applications in clinical settings.
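
For readers unfamiliar with the HR MAE metric quoted above, the following is a minimal sketch of how a heart rate in bpm is commonly read off a recovered pulse waveform and how the mean absolute error is then computed. It is not the PhysVLM pipeline. The function names, the 0.7 to 3.0 Hz search band, the 30 Hz sampling rate, and the synthetic signals are all assumptions made for illustration.

```python
import numpy as np


def estimate_hr_bpm(signal, fs, lo_hz=0.7, hi_hz=3.0):
    """Estimate heart rate as the dominant spectral peak inside a plausible HR band.

    The band limits (0.7-3.0 Hz, i.e. 42-180 bpm) are an illustrative assumption.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()               # remove the DC offset
    windowed = signal * np.hanning(signal.size)   # taper to limit spectral leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2    # one-sided power spectrum
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)    # keep only the HR band
    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz                         # Hz -> beats per minute


def mean_absolute_error(pred, ref):
    """Mean absolute error between predicted and reference values."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.mean(np.abs(pred - ref)))


# Synthetic stand-ins for recovered pulse waveforms (hypothetical data).
fs = 30.0                                # assumed camera frame rate in Hz
t = np.arange(600) / fs                  # 20 seconds of samples
rng = np.random.default_rng(0)
true_hr = [72.0, 90.0]                   # reference heart rates in bpm
pred_hr = [
    estimate_hr_bpm(
        np.sin(2 * np.pi * (hr / 60.0) * t) + 0.3 * rng.standard_normal(t.size),
        fs,
    )
    for hr in true_hr
]
hr_mae = mean_absolute_error(pred_hr, true_hr)
```

With a 20-second window the frequency resolution is 0.05 Hz, or 3 bpm per bin, so evaluations that report errors at the scale quoted in the abstract typically rely on longer windows, zero-padding, or peak interpolation.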

Original language: English
Journal: IEEE Transactions on Instrumentation and Measurement
Publication status: Accepted/In press - 2026

Keywords

  • Remote photoplethysmography
  • domain generalization
  • multi-task learning
  • vision-language models
