Quick-MIMIC: A Multimodal Data Extraction Pipeline for MIMIC with Parallelization

  • Yutao Dou
  • Wei Li
  • Yangtao Zheng
  • Xiaojun Yao
  • Huanxiang Liu
  • Albert Y. Zomaya
  • Shaoliang Peng

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Medical big data combined with artificial intelligence is vital to advancing digital medicine. However, most medical data extraction is opaque and non-standardised, which makes it prone to batch effects and has become a significant obstacle to reproducing previous work. This paper develops an easy-to-use time-series multimodal data extraction pipeline, Quick-MIMIC, for standardised data extraction from MIMIC datasets. Our method can fully integrate different data structures, including structured, semi-structured, and unstructured data, into a single time-series table. We also introduce two additional modules to Quick-MIMIC: a pipeline parallelization method that reduces data extraction time, and data analysis methods that present the characteristics of the extracted data intuitively. Extensive experimental results show that our pipeline can efficiently extract the needed data from the MIMIC dataset and convert it into the correct format for further analytic tasks.
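
As a rough illustration of the idea described in the abstract, the sketch below merges a structured source and an unstructured source into one time-series table, extracting each stay in parallel. It is not the authors' implementation: the toy records, the column names (stay_id, hour, heart_rate, note), and the functions extract_stay and extract_all are all hypothetical, and a thread pool stands in for whatever parallelization method Quick-MIMIC actually uses.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy records; real MIMIC tables and column names differ.
VITALS = [  # structured: (stay_id, hour, heart_rate)
    (1, 0, 80), (1, 1, 84), (2, 0, 95),
]
NOTES = [  # unstructured: (stay_id, hour, free text)
    (1, 1, "patient stable"), (2, 0, "admitted with fever"),
]


def extract_stay(stay_id):
    """Merge all modalities for one stay into rows keyed by hour."""
    rows = defaultdict(lambda: {"heart_rate": None, "note": None})
    for sid, hour, hr in VITALS:
        if sid == stay_id:
            rows[hour]["heart_rate"] = hr
    for sid, hour, text in NOTES:
        if sid == stay_id:
            rows[hour]["note"] = text
    # One row per hour, ordered by time within the stay.
    return [{"stay_id": stay_id, "hour": h, **rows[h]} for h in sorted(rows)]


def extract_all(stay_ids, workers=2):
    """Extract stays in parallel and concatenate them into one table."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_stay = pool.map(extract_stay, stay_ids)  # keeps input order
        return [row for rows in per_stay for row in rows]


table = extract_all([1, 2])
for row in table:
    print(row)
```

Because each stay is processed independently, the work splits cleanly across workers; for CPU-bound extraction a process pool could be swapped in for the thread pool under the same structure.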

Original language: English
Pages (from-to): 1333-1346
Number of pages: 14
Journal: Big Data Mining and Analytics
Volume: 7
Issue number: 4
Publication status: Published - Dec 2024

Keywords

  • MIMIC dataset
  • data extraction pipeline
  • data integration
