TY - GEN
T1 - Approximately detecting duplicates for probabilistic data streams over sliding windows
AU - Wang, Xiujun
AU - Shen, Hong
PY - 2010
Y1 - 2010
N2 - A probabilistic data stream S is defined as a sequence of uncertain tuples < ti, pi >, i = 1..∞, with the semantics that element ti occurs in the stream with probability pi ε (0, 1). Thus each distinct element t, which occurs in tuples of S, has an existential probability based on the tuples: < ti = t, pi >ε S. Existing duplicate detection methods for a traditional deterministic data stream can't maintain these existential probabilities for elements in S, which is important query information. In this paper, we present a novel data structure, Floating Counter Bloom Filter (FCBF), as an extension of CBF [1], which can maintain these existential probabilities effectively. Based on FCBF, we present an efficient algorithm to approximately detect duplicates for probabilistic data streams over sliding windows. Given a sliding window size W and floating counter number N, for any t which occurs in the past sliding window, our method outputs the accurate existential probability of t with probability 1-(1/2) ln(2)*N/W. Our experimental results on the synthetic data verify the effectiveness of our approach.
AB - A probabilistic data stream S is defined as a sequence of uncertain tuples < ti, pi >, i = 1..∞, with the semantics that element ti occurs in the stream with probability pi ε (0, 1). Thus each distinct element t, which occurs in tuples of S, has an existential probability based on the tuples: < ti = t, pi >ε S. Existing duplicate detection methods for a traditional deterministic data stream can't maintain these existential probabilities for elements in S, which is important query information. In this paper, we present a novel data structure, Floating Counter Bloom Filter (FCBF), as an extension of CBF [1], which can maintain these existential probabilities effectively. Based on FCBF, we present an efficient algorithm to approximately detect duplicates for probabilistic data streams over sliding windows. Given a sliding window size W and floating counter number N, for any t which occurs in the past sliding window, our method outputs the accurate existential probability of t with probability 1-(1/2) ln(2)*N/W. Our experimental results on the synthetic data verify the effectiveness of our approach.
KW - Counting bloom filter
KW - Duplicate detection
KW - False positive
KW - Floating counter bloom filter
KW - Probabilistic data stream
UR - http://www.scopus.com/inward/record.url?scp=79952561571&partnerID=8YFLogxK
U2 - 10.1109/PAAP.2010.16
DO - 10.1109/PAAP.2010.16
M3 - Conference contribution
AN - SCOPUS:79952561571
SN - 9780769543123
T3 - Proceedings - 3rd International Symposium on Parallel Architectures, Algorithms and Programming, PAAP 2010
SP - 263
EP - 268
BT - Proceedings - 3rd International Symposium on Parallel Architectures, Algorithms and Programming, PAAP 2010
T2 - 3rd International Symposium on Parallel Architectures, Algorithms and Programming, PAAP 2010
Y2 - 18 December 2010 through 20 December 2010
ER -