TY - GEN
T1 - CloST
T2 - 21st ACM International Conference on Information and Knowledge Management, CIKM 2012
AU - Tan, Haoyu
AU - Luo, Wuman
AU - Ni, Lionel M.
PY - 2012
Y1 - 2012
AB - During the past decade, various GPS-equipped devices have generated a tremendous amount of data with time and location information, which we refer to as big spatio-temporal data. In this paper, we present the design and implementation of CloST, a scalable big spatio-temporal data storage system to support data analytics using Hadoop. The main objective of CloST is to avoid scanning the whole dataset when a spatio-temporal range is given. To this end, we propose a novel data model that gives special treatment to three core attributes: an object ID, a location, and a time. Based on this data model, CloST hierarchically partitions data using all three core attributes, which enables efficient parallel processing of spatio-temporal range scans. Exploiting the characteristics of the data, we devise a compact storage structure that reduces the storage size by an order of magnitude. In addition, we propose scalable bulk loading algorithms capable of incrementally adding new data into the system. We conduct our experiments using a very large GPS log dataset, and the results show that CloST achieves fast data loading, desirable scalability in query processing, and a high data compression ratio.
KW - big data
KW - spatio-temporal data
KW - storage system
UR - http://www.scopus.com/inward/record.url?scp=84871098241&partnerID=8YFLogxK
U2 - 10.1145/2396761.2398589
DO - 10.1145/2396761.2398589
M3 - Conference contribution
AN - SCOPUS:84871098241
SN - 9781450311564
T3 - ACM International Conference Proceeding Series
SP - 2139
EP - 2143
BT - CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Y2 - 29 October 2012 through 2 November 2012
ER -