Privacy-preserving internet traffic publication

Longkun Guo, Hong Shen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

As machine learning (ML)-based traffic classification develops, Internet traffic data is published publicly to serve as test data. Although the IP addresses therein are anonymized, the data explicitly indicates which records belong to the same user. Using this information, an adversary can identify a user among the anonymized users. The paper first gives a k-anonymity method that reduces the probability of information leak to P/k, where P is the probability of information leak without k-anonymity. Assuming the number of flows belonging to an IP address follows a normal distribution, the information loss is shown to be μ²+σ²/kμ²+σ², where μ and σ are respectively the mean and the variance of the normal distribution. Random noise is then added to further reduce the probability of information leak to P/k², with an expected distortion rate of approximately 2d+log k-log|X|, where d is the number of dimensions and |X| is the number of vectors. Finally, real-world Internet traffic data is used to evaluate the utility of the anonymized traffic data. According to the experimental results, the k-anonymized, noised data can be clustered with an overall accuracy close to the state-of-the-art results for non-anonymized traffic data.
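
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the two ideas it names: attributing flows to groups of at least k anonymized IP labels, then perturbing the flow features with random noise. The function names `k_anonymize` and `add_noise`, the random grouping strategy, and the choice of Gaussian noise are all assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict


def k_anonymize(flows, k, seed=0):
    """Replace each IP label with a group label shared by at least k IP labels.

    flows: list of (ip_label, feature_vector) pairs.
    Returns a list of (group_label, feature_vector) pairs in the same order.
    Random grouping is an illustrative assumption, not the paper's method.
    """
    rng = random.Random(seed)
    ips = sorted({ip for ip, _ in flows})
    if k < 1 or len(ips) < k:
        raise ValueError("need k >= 1 and at least k distinct IP labels")
    rng.shuffle(ips)
    # Use floor(n / k) groups and deal IPs round-robin, so every group
    # receives at least k distinct IP labels.
    n_groups = len(ips) // k
    group_of = {ip: i % n_groups for i, ip in enumerate(ips)}
    return [(group_of[ip], vec) for ip, vec in flows]


def add_noise(flows, scale, seed=0):
    """Add zero-mean Gaussian noise (an assumed noise model) to every feature."""
    rng = random.Random(seed)
    return [(g, [x + rng.gauss(0.0, scale) for x in vec]) for g, vec in flows]


def group_sizes(original, anonymized):
    """Count the distinct IP labels that ended up in each group."""
    members = defaultdict(set)
    for (ip, _), (g, _) in zip(original, anonymized):
        members[g].add(ip)
    return {g: len(s) for g, s in members.items()}
```

With such a grouping, an adversary who links a group to a set of users still faces at least k candidates per group; the noise step trades some clustering accuracy for further uncertainty, which is the utility question the paper evaluates experimentally.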

Original language: English
Title of host publication: Proceedings - 15th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 10th IEEE International Conference on Big Data Science and Engineering and 14th IEEE International Symposium on Parallel and Distributed Processing with Applications, IEEE TrustCom/BigDataSE/ISPA 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 884-891
Number of pages: 8
ISBN (Electronic): 9781509032051
Publication status: Published - 2016
Externally published: Yes
Event: Joint 15th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 10th IEEE International Conference on Big Data Science and Engineering and 14th IEEE International Symposium on Parallel and Distributed Processing with Applications, IEEE TrustCom/BigDataSE/ISPA 2016 - Tianjin, China
Duration: 23 Aug 2016 to 26 Aug 2016

Publication series

Name: Proceedings - 15th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 10th IEEE International Conference on Big Data Science and Engineering and 14th IEEE International Symposium on Parallel and Distributed Processing with Applications, IEEE TrustCom/BigDataSE/ISPA 2016

Conference

Conference: Joint 15th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 10th IEEE International Conference on Big Data Science and Engineering and 14th IEEE International Symposium on Parallel and Distributed Processing with Applications, IEEE TrustCom/BigDataSE/ISPA 2016
Country/Territory: China
City: Tianjin
Period: 23/08/16 to 26/08/16

Keywords

  • Clustering
  • K-anonymity
  • Privacy preserving
  • Traffic classification
