ABSTRACT: A Search Support Engine (SSE) is implemented based on the basic principles of Information Retrieval Support Systems (IRSS) and Information Seeking Support Systems (ISSS). An SSE aims at meeting the diverse needs of different users by providing various supporting functionalities and tools that let users perform tasks beyond the traditional search and browsing offered by current search engines. As an illustrative example, we developed a DBLP search support engine (DBLP-SSE), and we discuss some concrete supporting functionalities, namely search refinement support, domain analysis support, etc. Each functionality focuses on a unique perspective in supporting users in finding useful information and knowledge in the DBLP dataset. The search support engine can be considered a step towards Knowledge Retrieval (KR) and Web Intelligence (WI).
2009 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2009, Milan, Italy, 15-18 September 2009, Main Conference Proceedings; 01/2009
ABSTRACT: We present the algorithmic core of a full text data base that allows fast Boolean queries, phrase queries, and document reporting using less space than the input text. The system uses a carefully choreographed combination of classical data compression techniques and inverted index based search data structures. It outperforms suffix array based techniques for all the above operations for real world (natural language) texts. Searching in large text data bases has become a key application of computers. Traditionally, (a part of) the index data structure is held in main memory while the texts themselves are held on disks. However, this limits the speed of text access. Therefore, with ever increasing RAM capacities, there is now considerable interest in data bases that keep everything in main memory. For example, the TREX engine of SAP stores large quantities of small texts (like memos or product descriptions) and requires rapid access to all the data. Such a search engine consists of a cluster of multi-core machines, where each processing core is assigned a roughly equal share of the data base. Since RAM is hundreds of times more expensive than disks, good data compression is very important for in-memory data bases. In recent years, the algorithms community has developed sophisticated data structures based on suffix arrays that provide asymptotically efficient search using very little space, which can even be less than the text itself. This paper studies a different approach to compressed in-memory text search engines based on inverted indexes. We show that with careful data compression we can use very little space. The system consists of several interacting parts that support Boolean queries, phrase queries, and document reporting. It turns out that the parts interact in a nontrivial way. For example, the index data structure introduced in (1) turns out to allow better data compression as a side effect and can be used
Proceedings of the Workshop on Algorithm Engineering and Experiments, ALENEX 2008, San Francisco, California, USA, January 19, 2008; 01/2008
ABSTRACT: We consider the following autocompletion search scenario: imagine a user of a search engine typing a query; then with every keystroke display those completions of the last query word that would lead to the best hits, and also display the best such hits. The following problem is at the core of this feature: for a fixed document collection, given a set D of documents, and an alphabetical range W of words, compute the set of all word-in-document pairs (w, d) from the collection such that w ∈ W and d ∈ D. We present a new data structure with the help of which such autocompletion queries can be processed, on the average, in time linear in the input plus output size, independent of the size of the underlying document collection. At the same time, our data structure uses no more space than an inverted index. Actual query processing times on a large test collection correlate almost perfectly with our theoretical bound.
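To make the core problem in this abstract concrete, the following is a minimal Python sketch of a naive baseline built on a plain inverted index. It is not the data structure the authors present, and unlike theirs its running time depends on the total length of the posting lists for the words in W. The names `build_index`, `prefix_range`, and `autocomplete_pairs`, and the whitespace tokenization, are assumptions made for illustration only.

```python
from bisect import bisect_left, bisect_right


def build_index(docs):
    """Map each word to the sorted list of ids of documents containing it.

    `docs` is a dict {doc_id: text}; tokenization by whitespace is an
    illustrative assumption.
    """
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index.setdefault(word, []).append(doc_id)
    for postings in index.values():
        postings.sort()
    return index


def prefix_range(prefix):
    """Alphabetical range [lo, hi] covering words that start with `prefix`.

    Assumes no word contains the character U+FFFF.
    """
    return prefix, prefix + "\uffff"


def autocomplete_pairs(index, vocabulary, D, lo, hi):
    """Return all (w, d) with lo <= w <= hi and d in D (naive baseline).

    `vocabulary` is the sorted list of all indexed words and `D` is a set
    of document ids, e.g. the hits for the earlier query words.
    """
    start = bisect_left(vocabulary, lo)
    end = bisect_right(vocabulary, hi)
    pairs = []
    for w in vocabulary[start:end]:
        # Scans the full posting list of every word in the range.
        for d in index[w]:
            if d in D:
                pairs.append((w, d))
    return pairs
```

A possible use: build the index once, keep `vocabulary = sorted(index)`, and after each keystroke call `autocomplete_pairs(index, vocabulary, D, *prefix_range(typed_prefix))`, where `D` holds the documents matching the query words typed so far.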