Authors
Christos Tryfonopoulos, University of the Peloponnese, Greece
Abstract
In the information filtering (or publish/subscribe) paradigm, clients subscribe to a server with continuous queries that express their information needs, while information sources publish documents to servers. Whenever a document is published, the continuous queries satisfied by it are identified and notifications are sent to the appropriate subscribed clients. Although information filtering has been on the research agenda for about half a century, there is a striking paradox when it comes to benchmarking the performance of such systems: no benchmarking mechanism (in the form of a large-scale standardised test collection of continuous queries and the relevant document publications) has been specifically created for evaluating filtering tasks. This work aims to fill this gap by proposing a methodology for automatically creating massive continuous query datasets from available document collections. We intend to publicly release all related material (including the software accompanying the proposed methodology) to the research community after publication.
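For readers unfamiliar with the paradigm, the following is a minimal sketch of the subscribe/publish/notify loop described above. It is purely illustrative: the class and function names (Server, subscribe, publish) and the naive keyword-containment matching are assumptions for exposition, not part of the methodology proposed in this work.

```python
# Minimal sketch of the publish/subscribe (information filtering) loop.
# All names and the matching semantics are illustrative assumptions only.

class Server:
    def __init__(self):
        # client id -> set of continuous queries (each a frozenset of terms)
        self.subscriptions = {}

    def subscribe(self, client, query_terms):
        """A client registers a continuous query expressing an information need."""
        self.subscriptions.setdefault(client, set()).add(frozenset(query_terms))

    def publish(self, document_text):
        """On each publication, find the continuous queries the document
        satisfies and notify the corresponding subscribed clients."""
        doc_terms = set(document_text.lower().split())
        notifications = []
        for client, queries in self.subscriptions.items():
            for query in queries:
                # Naive matching: all query terms occur in the document.
                if query <= doc_terms:
                    notifications.append((client, sorted(query)))
        return notifications


server = Server()
server.subscribe("alice", {"information", "filtering"})
server.subscribe("bob", {"databases"})
print(server.publish("a survey of information filtering and dissemination"))
# -> [('alice', ['filtering', 'information'])]
```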
Keywords
Continuous queries; dataset construction; information filtering; publish/subscribe; information dissemination; profiles