PhysVLM: Vision-Language Model for Generalizable Multitask Remote Physiological Measurement

  • Jie Gao
  • Ieong Weng San
  • Xiangmin Luo
  • Zhengxuan Chen
  • Zitong Yu
  • Tao Tan
  • Yue Sun
  • Macao Polytechnic University
  • Kiang Wu Hospital
  • Great Bay University
  • Dongguan Key Laboratory for Intelligence and Information Technology

Research output: Contribution to journal (article, peer-reviewed)

Abstract

Remote photoplethysmography (rPPG) offers a noninvasive solution for estimating critical vital signs, such as heart rate (HR) and respiratory rate (RR). To improve algorithmic generalizability, domain generalization (DG) has garnered increasing attention for rPPG. However, existing approaches often overlook both the frequency periodicity driven by blood-volume pulsations (BVP) and the dependencies among tasks, which limits generalization. To address this, PhysVLM is proposed as a cross-domain, multitask rPPG framework. First, a pretrained vision-language model provides frequency-related priors, which are injected into a visual-only student network via knowledge distillation. This encourages the student to focus on physiologically driven periodic features. Next, the wavelet temporal block (WTB) is introduced for multiscale time-frequency decomposition and adaptive filtering to suppress task-irrelevant noise. In addition, frequency-aligned spatial attention (FASA) is proposed to perform frequency-aligned bandpass selection within rhythmically prominent regions. Cross-task consistency constraints are incorporated to reduce spectral leakage. Extensive experiments on five public datasets, together with clinical validation on 25 preterm infants in the neonatal intensive care unit (NICU), show that PhysVLM achieves stable improvements in cross-domain evaluations. On the challenging NBHR dataset, the proposed method reduces HR mean absolute error (MAE) from 5.85 to 2.85 bpm. This framework provides a robust foundation for reliable rPPG applications in clinical settings.
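
As a rough illustration of why frequency periodicity matters for both tasks, the sketch below reads an HR and an RR estimate from the same pulse-like trace by picking the strongest spectral peak inside a task-specific band. It is not the PhysVLM method: the function name `estimate_rate_bpm`, the band limits (0.7 to 3.0 Hz for HR, 0.1 to 0.5 Hz for RR), the 30 frames/s sampling rate, and the synthetic trace are all assumptions chosen for the example.

```python
import cmath
import math


def estimate_rate_bpm(signal, fs, low_hz, high_hz, step_hz=0.01):
    """Return the dominant frequency of `signal` within [low_hz, high_hz], in cycles per minute.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    n = len(signal)
    mean = sum(signal) / n
    # Remove the DC offset and apply a Hann window to limit spectral leakage.
    windowed = [(x - mean) * (0.5 - 0.5 * math.cos(2 * math.pi * t / n))
                for t, x in enumerate(signal)]

    best_freq, best_power = low_hz, -1.0
    steps = int(round((high_hz - low_hz) / step_hz))
    for k in range(steps + 1):
        freq = low_hz + k * step_hz
        # Direct Fourier projection onto one candidate frequency.
        coeff = sum(x * cmath.exp(-2j * math.pi * freq * t / fs)
                    for t, x in enumerate(windowed))
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq * 60.0


# Synthetic 10 s trace (assumed): a 1.2 Hz pulse plus a weaker 0.3 Hz breathing component.
fs = 30.0
trace = [math.sin(2 * math.pi * 1.2 * i / fs) + 0.5 * math.sin(2 * math.pi * 0.3 * i / fs)
         for i in range(300)]

hr_bpm = estimate_rate_bpm(trace, fs, low_hz=0.7, high_hz=3.0)  # heart-rate band (assumed)
rr_bpm = estimate_rate_bpm(trace, fs, low_hz=0.1, high_hz=0.5)  # respiratory band (assumed)
print(f"HR ~ {hr_bpm:.1f} bpm, RR ~ {rr_bpm:.1f} breaths/min")
```

On this toy trace the two estimates should land near 72 bpm and 18 breaths per minute, since the components were placed at 1.2 Hz and 0.3 Hz. Real rPPG traces are far noisier, which is the motivation the abstract gives for learned filtering and band selection.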

Original language: English
Article number: 5006215
Journal: IEEE Transactions on Instrumentation and Measurement
Volume: 75
DOIs
Publication status: Published - 2026

Keywords

  • Domain generalization (DG)
  • multitask learning
  • remote photoplethysmography (rPPG)
  • vision-language models (VLMs)
