TY - JOUR
T1 - Dissimilarity-constrained node attribute coverage diversification for novelty-enhanced top-k search in large attributed networks
AU - Meng, Zaiqiao
AU - Shen, Hong
N1 - Publisher Copyright:
© 2018 Elsevier B.V.
PY - 2018/6/15
Y1 - 2018/6/15
N2 - Query diversification as an effective way to reduce query ambiguity and enhance result novelty has received much attention in top-k search applications on large networks. A major drawback of the existing diversification models is that they do not consider redundancy elimination during the course of search, resulting in unassured novelty in the search result. In this paper, to improve the novelty of the search result, we propose a new method of diversified top-k similarity search by combining diversification of node attribute coverage with a dissimilarity constraint. Due to the non-monotonicity implied by the dissimilarity constraint, existing techniques based on monotonicity assumptions cannot be applied. Our model requires solving a new problem of Dissimilarity Constrained Non-monotone Submodular Maximization (DCNSM). Based on constructing a dissimilarity-based graph, we solve this problem by a greedy algorithm achieving an approximation ratio of 1/Δ where Δ is the maximum node degree of the dissimilarity-based graph, in time linear to the number of edges of the graph. We show that DCNSM cannot be approximated in ratio |V|1−ϵ, indicating that our solution achieves an optimal ratio. We conduct extensive experiments on both synthetic and real-world attributed network datasets. The results show that our diversification model significantly outperforms the baseline methods, and confirm that combining dissimilarity constraint in diversification can significantly improve the novelty of search result.
AB - Query diversification as an effective way to reduce query ambiguity and enhance result novelty has received much attention in top-k search applications on large networks. A major drawback of the existing diversification models is that they do not consider redundancy elimination during the course of search, resulting in unassured novelty in the search result. In this paper, to improve the novelty of the search result, we propose a new method of diversified top-k similarity search by combining diversification of node attribute coverage with a dissimilarity constraint. Due to the non-monotonicity implied by the dissimilarity constraint, existing techniques based on monotonicity assumptions cannot be applied. Our model requires solving a new problem of Dissimilarity Constrained Non-monotone Submodular Maximization (DCNSM). Based on constructing a dissimilarity-based graph, we solve this problem by a greedy algorithm achieving an approximation ratio of 1/Δ where Δ is the maximum node degree of the dissimilarity-based graph, in time linear to the number of edges of the graph. We show that DCNSM cannot be approximated in ratio |V|1−ϵ, indicating that our solution achieves an optimal ratio. We conduct extensive experiments on both synthetic and real-world attributed network datasets. The results show that our diversification model significantly outperforms the baseline methods, and confirm that combining dissimilarity constraint in diversification can significantly improve the novelty of search result.
KW - Information retrieval
KW - Query diversification
KW - Submodular maximization
KW - Top-k search
UR - http://www.scopus.com/inward/record.url?scp=85044924476&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2018.03.008
DO - 10.1016/j.knosys.2018.03.008
M3 - Article
AN - SCOPUS:85044924476
SN - 0950-7051
VL - 150
SP - 85
EP - 94
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
ER -