Reconstructing Speech from Real-Time Articulatory MRI Using Neural Vocoders

Yide Yu, Amin Honarmandi Shandiz, László Tóth

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Citations (Scopus)


Several approaches exist for recording articulatory movements, such as electromagnetic and permanent magnetic articulography, ultrasound tongue imaging and surface electromyography. Although magnetic resonance imaging (MRI) is more costly than these approaches, recent developments in the area now allow real-time MRI videos of the articulators to be recorded at an acceptable resolution. Here, we experiment with reconstructing the speech signal from a real-time MRI recording using deep neural networks. Instead of estimating speech directly, our networks are trained to output a spectral vector, from which we reconstruct the speech signal using the WaveGlow neural vocoder. We compare the performance of three deep neural architectures for the estimation task, combining convolutional (CNN) and recurrence-based (LSTM) neural layers. Besides the mean absolute error (MAE) of our networks, we also evaluate our models by comparing the reconstructed speech signals using several objective speech quality metrics: the mean cepstral distortion (MCD), Short-Time Objective Intelligibility (STOI), Perceptual Evaluation of Speech Quality (PESQ) and Signal-to-Distortion Ratio (SDR). The results indicate that our approach can successfully reconstruct the gross spectral shape, but further improvements are needed to reproduce the fine spectral details.
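As an illustration of two of the metrics named above, the following is a minimal sketch of how MAE and a cepstral distortion score might be computed between a target and a predicted feature sequence. It is not the authors' evaluation code. The function names, the list-of-frames input layout, and the choice to skip the 0th (energy) coefficient are assumptions made for this example; the distortion formula is the commonly used mel-cepstral distortion definition, which may differ in detail from the variant used in the paper. Both sequences are assumed to be already time-aligned and of equal length.

```python
import math


def mean_absolute_error(target, predicted):
    """Mean absolute error over all frames and feature bins.

    Both arguments are sequences of frames; each frame is a sequence of floats.
    """
    total, count = 0.0, 0
    for t_frame, p_frame in zip(target, predicted):
        for t, p in zip(t_frame, p_frame):
            total += abs(t - p)
            count += 1
    return total / count


def cepstral_distortion(target, predicted, skip_c0=True):
    """Average cepstral distortion in dB between two aligned cepstral sequences.

    Uses the common definition (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2),
    averaged over frames. Skipping coefficient 0 is an assumption of this sketch.
    """
    start = 1 if skip_c0 else 0
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    per_frame = []
    for t_frame, p_frame in zip(target, predicted):
        squared_diff = sum(
            (t - p) ** 2 for t, p in zip(t_frame[start:], p_frame[start:])
        )
        per_frame.append(scale * math.sqrt(squared_diff))
    return sum(per_frame) / len(per_frame)


# Toy data (hypothetical values, two frames of three coefficients each).
target = [[1.0, 2.0, 3.0], [0.0, 1.0, 2.0]]
predicted = [[1.0, 2.5, 2.5], [0.5, 1.0, 2.0]]

mae = mean_absolute_error(target, predicted)
mcd = cepstral_distortion(target, predicted)
print(f"MAE = {mae:.4f}, MCD = {mcd:.4f} dB")
```

Lower values are better for both measures. STOI, PESQ and SDR operate on the reconstructed waveforms rather than on spectral features, so they are not covered by this sketch.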

Original language: English
Title of host publication: 29th European Signal Processing Conference, EUSIPCO 2021 - Proceedings
Publisher: European Signal Processing Conference, EUSIPCO
Number of pages: 5
ISBN (Electronic): 9789082797060
Publication status: Published - 2021
Externally published: Yes
Event: 29th European Signal Processing Conference, EUSIPCO 2021 - Dublin, Ireland
Duration: 23 Aug 2021 to 27 Aug 2021

Publication series

Name: European Signal Processing Conference
ISSN (Print): 2219-5491


Conference: 29th European Signal Processing Conference, EUSIPCO 2021


  • Articulatory-to-acoustic mapping
  • Deep learning
  • Real-time MRI

