Manipulating Data Lakes Intelligently with Java Annotations

Lap Man Hoi, Wei Ke, Sio Kei Im

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Data lakes are typically large repositories in which enterprises store data in a variety of formats. From a storage perspective, data can be categorized as structured, semi-structured, or unstructured. On the one hand, because data forms and transformation procedures are complex, many enterprises simply pour valuable data into data lakes without organizing or managing them effectively. This can create data silos (or data islands) or even data swamps, leaving some data permanently invisible. Although such data are integrated into a data lake, they are merely stored in the same physical environment and cannot be correlated with other data to realize their value. On the other hand, processing data from a data lake into a desired format, such as converting structured data to semi-structured data, is a difficult and tedious task that requires considerable programming experience. This article proposes Java Annotation for Manipulating Data Lakes (JAMDL), a novel software framework for managing heterogeneous data. The approach uses Java annotations to express the properties of data as metadata (data about data) so that the data can be converted into different formats and managed efficiently in a data lake. The article further suggests using artificial intelligence (AI) translation models to generate Data Manipulation Language (DML) operations for data manipulation, and AI recommendation models to improve the visibility of data when data precipitation occurs.
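The general idea of annotation-carried metadata driving a format conversion can be illustrated with a minimal sketch. It is not the JAMDL API: the `@Column` annotation, the `Customer` class, and the `toJson` helper are hypothetical names introduced here only for illustration, and the sketch assumes that a structured record is to be rendered as a flat JSON object.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class AnnotationSketch {

    // Hypothetical annotation (not part of JAMDL): records the column name
    // that a field corresponds to in a structured source.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Column {
        String name();
    }

    // A structured record whose fields carry their own metadata.
    static class Customer {
        @Column(name = "customer_id")
        int id;

        @Column(name = "name")
        String fullName;

        // Not annotated, so the converter below ignores it.
        String scratch = "ignored";

        Customer(int id, String fullName) {
            this.id = id;
            this.fullName = fullName;
        }
    }

    // Reads the annotations reflectively and renders the annotated fields as a
    // flat JSON object (semi-structured form). Keys are sorted by column name
    // so the output is deterministic. String escaping is omitted for brevity.
    static String toJson(Object record) {
        Map<String, String> rendered = new TreeMap<>();
        for (Field field : record.getClass().getDeclaredFields()) {
            Column column = field.getAnnotation(Column.class);
            if (column == null) {
                continue; // no metadata, skip the field
            }
            field.setAccessible(true);
            Object value;
            try {
                value = field.get(record);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read " + field.getName(), e);
            }
            boolean bare = value == null || value instanceof Number || value instanceof Boolean;
            rendered.put(column.name(), bare ? String.valueOf(value) : "\"" + value + "\"");
        }
        StringJoiner json = new StringJoiner(",", "{", "}");
        rendered.forEach((key, value) -> json.add("\"" + key + "\":" + value));
        return json.toString();
    }

    public static void main(String[] args) {
        // prints {"customer_id":7,"name":"Ada"}
        System.out.println(toJson(new Customer(7, "Ada")));
    }
}
```

Because the mapping lives in the annotations rather than in the converter, the same class could in principle be rendered into other target formats by writing a different reader over the same metadata.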

Original language: English
Pages (from-to): 34903-34917
Number of pages: 15
Journal: IEEE Access
Volume: 12
Publication status: Published - 2024

Keywords

  • Data lake
  • JAMDL
  • ORMapping
  • data precipitation
  • data stewards
  • enterprise-level applications
  • impedance mismatch
  • Java annotations
  • object-oriented
  • software framework
