Approximately detecting duplicates for probabilistic data streams over sliding windows

Xiujun Wang, Hong Shen

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

4 Citations (Scopus)

Abstract

A probabilistic data stream S is defined as a sequence of uncertain tuples < ti, pi >, i = 1..∞, with the semantics that element ti occurs in the stream with probability pi ε (0, 1). Thus each distinct element t, which occurs in tuples of S, has an existential probability based on the tuples: < ti = t, pi >ε S. Existing duplicate detection methods for a traditional deterministic data stream can't maintain these existential probabilities for elements in S, which is important query information. In this paper, we present a novel data structure, Floating Counter Bloom Filter (FCBF), as an extension of CBF [1], which can maintain these existential probabilities effectively. Based on FCBF, we present an efficient algorithm to approximately detect duplicates for probabilistic data streams over sliding windows. Given a sliding window size W and floating counter number N, for any t which occurs in the past sliding window, our method outputs the accurate existential probability of t with probability 1-(1/2) ln(2)*N/W. Our experimental results on the synthetic data verify the effectiveness of our approach.

Original languageEnglish
Title of host publicationProceedings - 3rd International Symposium on Parallel Architectures, Algorithms and Programming, PAAP 2010
Pages263-268
Number of pages6
DOIs
Publication statusPublished - 2010
Externally publishedYes
Event3rd International Symposium on Parallel Architectures, Algorithms and Programming, PAAP 2010 - Dalian, China
Duration: 18 Dec 201020 Dec 2010

Publication series

NameProceedings - 3rd International Symposium on Parallel Architectures, Algorithms and Programming, PAAP 2010

Conference

Conference3rd International Symposium on Parallel Architectures, Algorithms and Programming, PAAP 2010
Country/TerritoryChina
CityDalian
Period18/12/1020/12/10

Keywords

  • Counting bloom filter
  • Duplicate detection
  • False positive
  • Floating counter bloom filter
  • Probabilistic data stream

Fingerprint

Dive into the research topics of 'Approximately detecting duplicates for probabilistic data streams over sliding windows'. Together they form a unique fingerprint.

Cite this