Search result diversification in short text streams

Shangsong Liang, Emine Yilmaz, Hong Shen, Maarten De Rijke, W. Bruce Croft

Research output: Contribution to journal, Article, peer-reviewed

21 Citations (Scopus)

Abstract

We consider the problem of search result diversification for streams of short texts. Diversifying search results in short text streams is more challenging than diversifying long documents, because the latent topics of short documents are difficult to capture. To capture topic changes and the probabilities of documents for a given query at a specific time in a short text stream, we propose a dynamic Dirichlet multinomial mixture topic model, called D2M3, together with a Gibbs sampling algorithm for inference. We also propose a streaming diversification algorithm, SDA, that integrates the information captured by D2M3 with our modified version of the PM-2 (Proportionality-based diversification Method, second version) algorithm. We conduct experiments on a Twitter dataset and find that SDA statistically significantly outperforms state-of-the-art non-streaming retrieval methods, plain streaming retrieval methods, and streaming diversification methods that use other dynamic topic models.
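To make the proportionality idea concrete, the sketch below implements the standard greedy PM-2 re-ranking of Dang and Croft, not the modified version used inside SDA, which the abstract does not specify. Each aspect (topic) i has a desired proportion v_i and a running seat count s_i; at each rank the aspect with the largest Sainte-Laguë quotient v_i / (2 s_i + 1) is served first, and the chosen document's seat is split across aspects by its coverage. The function name `pm2_diversify`, its parameters, and the toy probabilities are hypothetical; in a streaming setting one could plausibly feed topic weights and per-document topic probabilities from a model such as D2M3, but that wiring is an assumption here.

```python
from typing import Dict, List, Sequence


def pm2_diversify(
    candidates: Sequence[str],
    aspect_weights: Sequence[float],
    doc_aspect_probs: Dict[str, Sequence[float]],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedy PM-2 re-ranking (standard version; hypothetical helper).

    aspect_weights[i]      -- desired proportion v_i of aspect (topic) i
    doc_aspect_probs[d][i] -- relevance of document d to aspect i, P(d | t_i)
    lam                    -- trade-off between the selected aspect and the rest
    """
    n = len(aspect_weights)
    seats = [0.0] * n  # s_i: fraction of seats already won by aspect i
    remaining = list(candidates)
    ranking: List[str] = []

    while remaining and len(ranking) < k:
        # Sainte-Lague quotient: aspects that are under-served get priority.
        quotients = [aspect_weights[i] / (2.0 * seats[i] + 1.0) for i in range(n)]
        best = max(range(n), key=lambda i: quotients[i])

        def score(d: str) -> float:
            probs = doc_aspect_probs[d]
            main = lam * quotients[best] * probs[best]
            rest = (1.0 - lam) * sum(
                quotients[i] * probs[i] for i in range(n) if i != best
            )
            return main + rest

        chosen = max(remaining, key=score)
        ranking.append(chosen)
        remaining.remove(chosen)

        # Split this rank's seat across aspects in proportion to coverage.
        probs = doc_aspect_probs[chosen]
        total = sum(probs)
        if total > 0:
            for i in range(n):
                seats[i] += probs[i] / total

    return ranking


if __name__ == "__main__":
    docs = {"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.1, 0.9]}
    print(pm2_diversify(["a", "b", "c"], [0.6, 0.4], docs, k=3))
```

With these toy inputs the output is `['a', 'c', 'b']`: after `a` fills most of aspect 0's seat, the quotient for aspect 1 dominates, so `c` is promoted above the more relevant but redundant `b`.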

Original language: English
Article number: 8
Journal: ACM Transactions on Information Systems
Volume: 36
Issue number: 1
Publication status: Published, Apr 2017
Externally published: Yes

Keywords

  • Ad hoc retrieval
  • Data streams
  • Diversity
