-
ABSTRACT: We consider the following autocompletion search scenario: imagine a user of a search engine typing a query; then with every keystroke display those completions of the last query word that would lead to the best hits, and also display the best such hits. The following problem is at the core of this feature: for a fixed document collection, given a set D of documents, and an alphabetical range W of words, compute the set of all word-in-document pairs (w,d) from the collection such that w ∈ W and d ∈ D. We present a new data structure with the help of which such autocompletion queries can be processed, on the average, in time linear in the input plus output size, independent of the size of the underlying document collection. At the same time, our data structure uses no more space than an inverted index. Actual query processing times on a large test collection correlate almost perfectly with our theoretical bound.
Information Retrieval 07/2008; 11(4):269-286. · 0.91 Impact Factor
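To make the core problem in the abstract above concrete, here is a minimal Python sketch of a naive baseline on a plain inverted index. It is not the data structure proposed in the paper: it scans the full posting list of every word in the range, so its running time does depend on the collection size. The names `complete` and `prefix_range`, the dictionary layout of the index, and the use of `"\uffff"` as an upper sentinel for a prefix are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def prefix_range(prefix):
    """Turn a prefix into an alphabetical word range [lo, hi].

    Assumption: no indexed word contains the character U+FFFF, so
    prefix + U+FFFF sorts after every word that starts with the prefix.
    """
    return prefix, prefix + "\uffff"


def complete(index, vocabulary, doc_ids, lo, hi):
    """Return all (word, doc) pairs with lo <= word <= hi and doc in doc_ids.

    index      -- dict mapping each word to a list of document ids
    vocabulary -- sorted list of all words in the index
    doc_ids    -- the set D of candidate documents
    """
    # Binary search locates the slice of the vocabulary inside the range W.
    start = bisect_left(vocabulary, lo)
    end = bisect_right(vocabulary, hi)
    pairs = []
    for word in vocabulary[start:end]:
        # Naive step: scan the whole posting list, keeping only docs in D.
        for doc in index[word]:
            if doc in doc_ids:
                pairs.append((word, doc))
    return pairs


# Tiny hypothetical collection.
index = {"search": [1, 2], "seat": [2, 3], "semantic": [1, 3], "tree": [2]}
vocabulary = sorted(index)
result = complete(index, vocabulary, {1, 2}, *prefix_range("sea"))
```

Here `result` holds the word-in-document pairs for the prefix "sea" restricted to documents 1 and 2. The abstract's claim is that the same output can be produced in time linear in the size of D plus the number of returned pairs, which this baseline does not achieve.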
-
KI. 01/2008; 22:58-61.
-
Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, CIKM 2007, Lisbon, Portugal, November 6-10, 2007; 01/2007
-
ABSTRACT: A non-decreasing sequence of n integers is the degree sequence of a 1-tree (i.e., an ordinary tree) on n vertices if and only if there are at least two 1’s in the sequence, and the sum of the elements is 2(n–1). We generalize this result in the following ways. First, a natural generalization of this statement is a necessary condition for k-trees, and we show that it is not sufficient for any k > 1. Second, we identify non-trivial sufficient conditions for the degree sequences of 2-trees. We also show that these sufficient conditions are almost necessary using bounds on the partition function p(n) and probabilistic methods. Third, we generalize the characterization of degrees of 1-trees in an elegant and counter-intuitive way to yield integer sequences that characterize k-trees, for all k.
11/2006: pages 216-225;
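The characterization of 1-trees quoted at the start of the abstract above can be checked directly. The sketch below covers only that classical 1-tree case, not the k-tree results of the paper. The function name `is_tree_degree_sequence` is hypothetical, and the requirement that all entries be positive is an assumption added here, since a tree on two or more vertices has no vertex of degree 0.

```python
def is_tree_degree_sequence(degrees):
    """Check the 1-tree condition: at least two 1's and sum equal to 2(n-1).

    Assumption: entries must be positive integers (a tree on n >= 2
    vertices has no isolated vertex).
    """
    n = len(degrees)
    if any(d < 1 for d in degrees):
        return False
    # Every tree on at least two vertices has at least two leaves.
    if sum(1 for d in degrees if d == 1) < 2:
        return False
    # A tree on n vertices has n-1 edges, and each edge adds 2 to the degree sum.
    return sum(degrees) == 2 * (n - 1)


path_ok = is_tree_degree_sequence([1, 1, 2, 2, 2])   # path on 5 vertices
star_ok = is_tree_degree_sequence([1, 1, 1, 3])      # star with 3 leaves
cycle_ok = is_tree_degree_sequence([2, 2, 2])        # triangle, not a tree
```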
-
ABSTRACT: We consider the following autocompletion search scenario: imagine a user of a search engine typing a query; then with every keystroke display those completions of the last query word that would lead to the best hits, and also display the best such hits. The following problem is at the core of this feature: for a fixed document collection, given a set D of documents, and an alphabetical range W of words, compute the set of all word-in-document pairs (w,d) from the collection such that w ∈ W and d ∈ D. We present a new data structure with the help of which such autocompletion queries can be processed, on the average, in time linear in the input plus output size, independent of the size of the underlying document collection. At the same time, our data structure uses no more space than an inverted index. Actual query processing times on a large test collection correlate almost perfectly with our theoretical bound.
09/2006: pages 150-162;
-
SIGIR 2006: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, Washington, USA, August 6-11, 2006; 01/2006
-
String Processing and Information Retrieval, 13th International Conference, SPIRE 2006, Glasgow, UK, October 11-13, 2006, Proceedings; 01/2006
-
Computing and Combinatorics, 12th Annual International Conference, COCOON 2006, Taipei, Taiwan, August 15-18, 2006, Proceedings; 01/2006
-
ABSTRACT: We point out that for two sets of measurements, it can happen that the average of one set is larger than the average of the other set on one scale, but becomes smaller after a non-linear monotone transformation of the individual measurements. We show that the inclusion of error bars is no safeguard against this phenomenon. We give a theorem, however, that limits the amount of “reversal” that can occur; as a by-product we get two non-standard one-sided tail estimates for arbitrary random variables which may be of independent interest. Our findings suggest that in the not infrequent situation where more than one cost measure makes sense, there is no alternative other than to explicitly compare averages for each of them, much unlike what is common practice.
05/2005: pages 295-304;
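The reversal described in the abstract above is easy to reproduce with a small numeric example. The two measurement sets below are made up for illustration and are not taken from the paper; the natural logarithm stands in for an arbitrary non-linear monotone transformation.

```python
import math
from statistics import mean

# Hypothetical measurements, e.g. running times of two algorithms.
set_a = [1.0, 1.0, 10.0]
set_b = [3.0, 3.0, 3.0]

# On the original scale, A has the larger average.
mean_a = mean(set_a)
mean_b = mean(set_b)

# After a monotone transformation (natural log), the order flips:
# the single large value in A counts for much less on the log scale.
log_mean_a = mean(math.log(x) for x in set_a)
log_mean_b = mean(math.log(x) for x in set_b)

reversed_order = (mean_a > mean_b) and (log_mean_a < log_mean_b)
```

In this example `reversed_order` is true: A looks worse than B by plain average but better by average log value, which is why the abstract recommends comparing averages separately for each cost measure of interest.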
-
01/2005: pages 67-76;
-
Experimental and Efficient Algorithms, 4th International Workshop, WEA 2005, Santorini Island, Greece, May 10-13, 2005, Proceedings; 01/2005
-
4th International Workshop on Efficient and Experimental Algorithms (WEA'05), Springer, 67-76 (2005).
-
Workshop on Challenges in Web Information Retrieval and Integration (WIRI'05), IEEE, 243-248 (2005).
-
ABSTRACT: CompleteSearch is a highly interactive search engine which, instantly after every single keystroke, offers the user various kinds of feedback, like promising query completions or refinements by category. We combined CompleteSearch with our institute's helpdesk system and carried out a small user study with some of the staff operating the helpdesk. Participants were asked to process ten typical helpdesk requests, alternately using CompleteSearch and the off-the-shelf Google Desktop Search. All participants preferred CompleteSearch over Google Desktop, mainly because of its speed, the feeling of being in power, and the enhanced search facilities.
Gronau, Norbert: 4th Conference on Professional Knowledge Management (WM'07). - Bd. 2, GITO, 101-108 (2007).
-
ABSTRACT: We present an efficient realization of the following interactive search engine feature: as the user is typing the query, words that are related to the last query word and that would lead to good hits are suggested, as well as selected such hits. The realization has three parts: (i) building clusters of related terms, (ii) adding this information as artificial words to the index such that (iii) the described feature reduces to an instance of prefix search and completion. An efficient solution for the latter is provided by the CompleteSearch engine, with which we have integrated the proposed feature. For building the clusters of related terms we propose a variant of latent semantic indexing that, unlike standard approaches, is completely transparent to the user. By experiments on two large test-collections, we demonstrate that the feature is provided at only a slight increase in query processing time and index size.
Silva, Mário J.; Laender, Alberto A. F.; Baeza-Yates, Ricardo; McGuinness, Deborah L.; Olstad, Bjorn; Olsen, Øystein Haug; Falcão, André O.: CIKM'07 : Proceedings of the 2007 ACM Conference on Information and Knowledge Management, ACM, 857-860 (2007).
-
ABSTRACT: We consider the following full-text search autocompletion feature. Imagine a user of a search engine typing a query. Then with every letter being typed, we would like an instant display of completions of the last query word which would lead to good hits. At the same time, the best hits for any of these completions should be displayed. Known indexing data structures that apply to this problem either incur large processing times for a substantial class of queries, or they use a lot of space. We present a new indexing data structure that uses no more space than a state-of-the-art compressed inverted index, but with 10 times faster query processing times. Even on the large TREC Terabyte collection, which comprises over 25 million documents, we achieve, on a single machine and with the index on disk, average response times of one tenth of a second. We have built a full-fledged, interactive search engine that realizes the proposed autocompletion feature combined with support for proximity search, semi-structured (XML) text, subword and phrase completion, and semantic tags.
SIGIR 2006: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 364-371 (2006).
-
SIGIR'06 Workshop on Faceted Search, ACM, 31-35 (2006).
-
ABSTRACT: A non-decreasing sequence of n integers is the degree sequence of a 1-tree (i.e., an ordinary tree) on n vertices if and only if there are at least two 1’s in the sequence, and the sum of the elements is 2(n–1). We generalize this result in the following ways. First, a natural generalization of this statement is a necessary condition for k-trees, and we show that it is not sufficient for any k > 1. Second, we identify non-trivial sufficient conditions for the degree sequences of 2-trees. We also show that these sufficient conditions are almost necessary using bounds on the partition function p(n) and probabilistic methods. Third, we generalize the characterization of degrees of 1-trees in an elegant and counter-intuitive way to yield integer sequences that characterize k-trees, for all k.
Computing and Combinatorics, 12th Annual International Conference, COCOON 2006, Springer, 216-225 (2006).
-
ABSTRACT: We describe CompleteSearch, an interactive search engine that offers the user a variety of complex features, which at first glance have little in common, yet are all provided via one and the same highly optimized core mechanism. This mechanism answers queries for what we call context-sensitive prefix search and completion: given a set of documents and a word range, compute all words from that range which are contained in one of the given documents, as well as those of the given documents which contain a word from the given range. Among the supported features are: (i) automatic query completion, for example, find all completions of the prefix “seman” that occur in the context of the word “ontology”, as well as the best hits for any such completion; (ii) semi-structured (XML) retrieval, for example, find all email messages with “dbworld” in the subject line; (iii) semantic search, for example, find all politicians who had a private audience with the pope; (iv) DB-style joins and grouping, for example, find the most prolific authors with at least one paper in both “SIGMOD” and “SIGIR”; and (v) arbitrary combinations of these. The prefix search and completion mechanism of CompleteSearch is realized via a novel kind of index data structure, which enables subsecond query processing times for collections up to a terabyte of data, on a single PC. We report on a number of lessons learned in the process of building the system and on our experience with a number of publicly used deployments.
CIDR 2007 : 3rd Biennial Conference on Innovative Data Systems Research, University of Wisconsin / Computer Science Department, 88-95 (2007).
-
ABSTRACT: We consider the following autocompletion search scenario: imagine a user of a search engine typing a query; then with every keystroke display those completions of the last query word that would lead to the best hits, and also display the best such hits. The following problem is at the core of this feature: for a fixed document collection, given a set D of documents, and an alphabetical range W of words, compute the set of all word-in-document pairs (w,d) from the collection such that w ∈ W and d ∈ D. We present a new data structure with the help of which such autocompletion queries can be processed, on the average, in time linear in the input plus output size, independent of the size of the underlying document collection. At the same time, our data structure uses no more space than an inverted index. Actual query processing times on a large test collection correlate almost perfectly with our theoretical bound.
String Processing and Information Retrieval : 13th International Conference, SPIRE 2006, Springer, 150-162 (2006).