Abstract
Data lakes are typically large data repositories where enterprises store data in a variety of formats. From a storage perspective, data can be categorized as structured, semi-structured, or unstructured. On the one hand, because data forms and transformation procedures are complex, many enterprises simply pour valuable data into data lakes without organizing and managing them effectively. This can create data silos (or data islands) or even data swamps, with the result that some data become permanently invisible. Although such data are integrated into a data lake, they are merely stored physically in the same environment and cannot be correlated with other data to realize their value. On the other hand, processing data from a data lake into a desired format, such as converting structured data to semi-structured data, is a difficult and tedious task that requires considerable programming skill. In this article, a novel software framework called Java Annotation for Manipulating Data Lakes (JAMDL), which can manage heterogeneous data, is proposed. The approach uses Java annotations to express the properties of data as metadata (data about data) so that the data can be converted into different formats and managed efficiently in a data lake. Furthermore, the article suggests using artificial intelligence (AI) translation models to generate Data Manipulation Language (DML) operations for data manipulation, and AI recommendation models to improve the visibility of data when data precipitation occurs.
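The idea of expressing data properties as annotation metadata can be illustrated with a minimal sketch. It is not the JAMDL API, which the abstract does not describe: the `LakeField` annotation, the `Customer` class, and the `toJson` helper are hypothetical names chosen for illustration. The sketch assumes that a runtime-retained annotation carries the target field name and that reflection reads it to convert a structured object into a semi-structured (JSON) form.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class AnnotationSketch {

    // Hypothetical annotation (an assumption, not part of JAMDL):
    // records the name a field should carry in the converted output.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface LakeField {
        String name();
    }

    // A structured record whose metadata lives in its annotations.
    static class Customer {
        @LakeField(name = "customer_id")
        private final int id;

        @LakeField(name = "full_name")
        private final String fullName;

        // Not annotated, so the converter leaves it out.
        private final String internalNote = "not exported";

        Customer(int id, String fullName) {
            this.id = id;
            this.fullName = fullName;
        }
    }

    // Converts any object with @LakeField fields into a flat JSON object.
    // Keys are sorted by name so the output order is deterministic.
    static String toJson(Object record) {
        Map<String, Object> values = new TreeMap<>();
        for (Field field : record.getClass().getDeclaredFields()) {
            LakeField meta = field.getAnnotation(LakeField.class);
            if (meta == null) {
                continue; // no metadata, so the field is skipped
            }
            try {
                field.setAccessible(true);
                values.put(meta.name(), field.get(record));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read " + field.getName(), e);
            }
        }
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> entry : values.entrySet()) {
            json.add("\"" + entry.getKey() + "\":" + render(entry.getValue()));
        }
        return json.toString();
    }

    // Numbers, booleans, and null are written bare; everything else is a quoted string.
    private static String render(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        String escaped = value.toString().replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // prints {"customer_id":7,"full_name":"Ada"}
        System.out.println(toJson(new Customer(7, "Ada")));
    }
}
```

Keeping the metadata next to the field declaration means the conversion logic needs no per-class code. Supporting another output format would, under the same assumptions, only require a different rendering step over the same annotated fields.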
Original language | English |
---|---|
Pages (from-to) | 34903-34917 |
Number of pages | 15 |
Journal | IEEE Access |
Volume | 12 |
DOIs | |
Publication status | Published - 2024 |
Press/Media
- Faculty of Applied Sciences Researchers Update Current Study Findings on Data Storage (Manipulating Data Lakes Intelligently With Java Annotations)
WEI KE, SIO KEI IM & LAP MAN HOI
22/03/24
1 item of Media coverage
Press/Media: Press/Media