跳至主導覽 跳至搜尋 跳過主要內容

Approximately detecting duplicates for probabilistic data streams over sliding windows

  • Xiujun Wang
  • , Hong Shen

研究成果: Conference contribution同行評審

4 引文 斯高帕斯(Scopus)

摘要

A probabilistic data stream S is defined as a sequence of uncertain tuples < ti, pi >, i = 1..∞, with the semantics that element ti occurs in the stream with probability pi ε (0, 1). Thus each distinct element t, which occurs in tuples of S, has an existential probability based on the tuples: < ti = t, pi >ε S. Existing duplicate detection methods for a traditional deterministic data stream can't maintain these existential probabilities for elements in S, which is important query information. In this paper, we present a novel data structure, Floating Counter Bloom Filter (FCBF), as an extension of CBF [1], which can maintain these existential probabilities effectively. Based on FCBF, we present an efficient algorithm to approximately detect duplicates for probabilistic data streams over sliding windows. Given a sliding window size W and floating counter number N, for any t which occurs in the past sliding window, our method outputs the accurate existential probability of t with probability 1-(1/2) ln(2)*N/W. Our experimental results on the synthetic data verify the effectiveness of our approach.

原文English
主出版物標題Proceedings - 3rd International Symposium on Parallel Architectures, Algorithms and Programming, PAAP 2010
頁面263-268
頁數6
DOIs
出版狀態Published - 2010
對外發佈
事件3rd International Symposium on Parallel Architectures, Algorithms and Programming, PAAP 2010 - Dalian, China
持續時間: 18 12月 201020 12月 2010

出版系列

名字Proceedings - 3rd International Symposium on Parallel Architectures, Algorithms and Programming, PAAP 2010

Conference

Conference3rd International Symposium on Parallel Architectures, Algorithms and Programming, PAAP 2010
國家/地區China
城市Dalian
期間18/12/1020/12/10

指紋

深入研究「Approximately detecting duplicates for probabilistic data streams over sliding windows」主題。共同形成了獨特的指紋。

引用此