## Abstract

As machine learning (ML)-based traffic classification develops, Internet traffic data is published publicly to serve as test data. Although the IP addresses in such data are anonymized, the data still states explicitly which flows belong to the same user. Using this information, an adversary can re-identify a user among the anonymized users. This paper first presents a k-anonymity method that reduces the probability of information leakage to $P/k$, where $P$ is the probability of information leakage without k-anonymity. Assuming that the number of flows belonging to an IP address follows a normal distribution, the information loss is shown to be $(\mu^2+\sigma^2)/(k\mu^2+\sigma^2)$, where $\mu$ and $\sigma^2$ are respectively the mean and the variance of the normal distribution. Random noise is then added to further reduce the probability of information leakage to $P/k^2$, with an expected distortion rate of approximately 2d+log k-log|X|, where d is the number of dimensions and |X| is the number of vectors. Finally, real-world Internet traffic data is used to evaluate the utility of the anonymized traffic data. According to the experimental results, the k-anonymized, noised data can be clustered with an overall accuracy close to state-of-the-art results for non-anonymized traffic data.
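The two steps named in the abstract (grouping users so that k of them share one label, then adding random noise to the flow features) can be illustrated with a minimal Python sketch. This is not the paper's algorithm, which the abstract does not specify: the random grouping, the Gaussian noise, and the names `k_anonymize`, `add_noise`, `leak_probability`, and `loss_expression` are all assumptions made for illustration. `loss_expression` evaluates the abstract's information-loss expression under the reading (μ² + σ²)/(kμ² + σ²), which is itself an assumption about the flattened formula.

```python
import random


def k_anonymize(flows, k, seed=0):
    """Replace each user label with a group label shared by up to k users.

    `flows` is a list of (user_label, feature_vector) pairs (hypothetical layout).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    users = sorted({user for user, _ in flows})
    rng.shuffle(users)
    # Shuffled positions 0..k-1 share group 0, k..2k-1 share group 1, and so on.
    # The last group is smaller than k when len(users) is not a multiple of k.
    group_of = {user: index // k for index, user in enumerate(users)}
    return [(group_of[user], list(features)) for user, features in flows]


def add_noise(flows, scale, seed=0):
    """Add zero-mean Gaussian noise to every feature (noise model is assumed)."""
    rng = random.Random(seed)
    return [
        (label, [x + rng.gauss(0.0, scale) for x in features])
        for label, features in flows
    ]


def leak_probability(p, k, noised=False):
    """Leak probability quoted in the abstract: P/k, or P/k^2 with noise."""
    return p / (k * k) if noised else p / k


def loss_expression(mu, variance, k):
    """Evaluate (mu^2 + variance) / (k * mu^2 + variance)."""
    return (mu ** 2 + variance) / (k * mu ** 2 + variance)


# Toy data: four users with one two-dimensional flow vector each.
flows = [("u1", [1.0, 2.0]), ("u2", [3.0, 4.0]),
         ("u3", [5.0, 6.0]), ("u4", [7.0, 8.0])]
grouped = k_anonymize(flows, k=2)
noised = add_noise(grouped, scale=0.1)
```

With k = 1 the expression evaluates to 1, and it decreases as k grows, so under this reading a larger k lowers both the leak probability and the value of the expression.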

Field | Value |
---|---|
Original language | English |
Title of host publication | Proceedings - 15th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 10th IEEE International Conference on Big Data Science and Engineering and 14th IEEE International Symposium on Parallel and Distributed Processing with Applications, IEEE TrustCom/BigDataSE/ISPA 2016 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 884-891 |
Number of pages | 8 |
ISBN (Electronic) | 9781509032051 |
Publication status | Published - 2016 |
Externally published | Yes |
Event | Joint 15th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 10th IEEE International Conference on Big Data Science and Engineering and 14th IEEE International Symposium on Parallel and Distributed Processing with Applications, IEEE TrustCom/BigDataSE/ISPA 2016 - Tianjin, China. Duration: 23 Aug 2016 → 26 Aug 2016 |

### Conference

Field | Value |
---|---|
Conference | Joint 15th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 10th IEEE International Conference on Big Data Science and Engineering and 14th IEEE International Symposium on Parallel and Distributed Processing with Applications, IEEE TrustCom/BigDataSE/ISPA 2016 |
Country/Territory | China |
City | Tianjin |
Period | 23/08/16 → 26/08/16 |

## Keywords

- Clustering
- K-anonymity
- Privacy preserving
- Traffic classification