Figure 1 - uploaded by Stéphane Gançarski
Test results. The rows, from top to bottom, show graphs for flat, linear and exponential weighting, respectively. The three leftmost columns show graphs for feature selection based on weighted contingencies; the three rightmost columns show graphs for feature selection based on weighted χ² statistics. The first and fourth columns use 50, the second and fifth 100, and the third and sixth 200 examples per time bucket. The x-axis displays the number of buckets taken into account; the y-axis gives the number of correctly classified examples. Results are displayed by the number of documents per bucket (2, 5, 10, 100). Cf. Section 4.1.
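The caption contrasts flat, linear and exponential weighting of time buckets during feature selection. As a minimal sketch, assuming each document contributes its bucket's weight to a term/class 2x2 contingency table and the χ² statistic is then computed on those weighted counts, the χ² variant could look like the code below. The weight formulas (a linear ramp, and halving per older bucket), the document representation, and all function names are hypothetical illustrations, not the method of the source publication.

```python
from typing import List, Set, Tuple

# Hypothetical representation: a document is (set_of_terms, class_label);
# buckets are ordered from oldest (index 0) to newest (index -1).
Doc = Tuple[Set[str], str]


def bucket_weights(n_buckets: int, scheme: str) -> List[float]:
    """Return one weight per bucket; the newest bucket always gets weight 1.0.

    The linear and exponential formulas are illustrative assumptions.
    """
    if scheme == "flat":
        return [1.0] * n_buckets
    if scheme == "linear":
        # Weights grow linearly from 1/n (oldest) to 1 (newest).
        return [(i + 1) / n_buckets for i in range(n_buckets)]
    if scheme == "exponential":
        # Each older bucket counts half as much as the next newer one.
        return [2.0 ** (i - (n_buckets - 1)) for i in range(n_buckets)]
    raise ValueError(f"unknown weighting scheme: {scheme}")


def weighted_contingency(
    buckets: List[List[Doc]], weights: List[float], term: str, label: str
) -> Tuple[float, float, float, float]:
    """Weighted 2x2 table (a, b, c, d) for term presence vs. class membership."""
    a = b = c = d = 0.0
    for w, bucket in zip(weights, buckets):
        for terms, y in bucket:
            has_term, in_class = term in terms, y == label
            if has_term and in_class:
                a += w  # term present, document in the class
            elif has_term:
                b += w  # term present, document in another class
            elif in_class:
                c += w  # term absent, document in the class
            else:
                d += w  # term absent, document in another class
    return a, b, c, d


def weighted_chi2(a: float, b: float, c: float, d: float) -> float:
    """Standard 2x2 chi-squared statistic evaluated on weighted counts."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(buckets: List[List[Doc]], scheme: str, label: str, k: int) -> List[str]:
    """Return the k terms with the highest weighted chi-squared score for `label`."""
    weights = bucket_weights(len(buckets), scheme)
    vocab = set().union(*(terms for bucket in buckets for terms, _ in bucket))
    scores = {t: weighted_chi2(*weighted_contingency(buckets, weights, t, label)) for t in vocab}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


buckets = [
    [({"old", "common"}, "pos"), ({"common"}, "neg")],  # older bucket
    [({"new", "common"}, "pos"), ({"common"}, "neg")],  # newer bucket
]
print(select_features(buckets, "flat", "pos", 2))         # ['new', 'old']
print(select_features(buckets, "exponential", "pos", 1))  # ['new']
```

With flat weighting, "old" and "new" tie (both score 4/3); with exponential weighting, the term from the newer bucket scores 1.5 against 0.6 for the older one. To mimic the x-axis, which varies the number of buckets taken into account, one could pass only the most recent m buckets, e.g. `buckets[-m:]`.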
Source publication
Temporal queries combine textual and temporal constraints and are used for searching temporal collections like web archives. Indexing these collections might result in huge index files which can reduce the performance of query processing. Static index pruning can be used to increase efficiency. It remains unclear whether these methods are adapted f...
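Static index pruning removes postings from the index ahead of query time, so the index is smaller and queries scan fewer entries. A minimal sketch of one generic variant, term-centric top-k pruning, follows. It is not the method of the source publication; the index layout, the per-posting scores, and the name `prune_top_k` are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical inverted index: term -> postings of (doc_id, score), where the
# score stands for any per-posting relevance contribution (e.g. tf-idf).
Index = Dict[str, List[Tuple[int, float]]]


def prune_top_k(index: Index, k: int) -> Index:
    """Term-centric static pruning: keep only the k highest-scoring postings per term."""
    pruned: Index = {}
    for term, postings in index.items():
        # Rank by descending score; break ties by doc id for determinism.
        best = sorted(postings, key=lambda p: (-p[1], p[0]))[:k]
        # Restore doc-id order so pruned lists can still be merged in one pass.
        pruned[term] = sorted(best)
    return pruned


index = {
    "archive": [(1, 0.2), (2, 0.9), (3, 0.5)],
    "web": [(4, 0.1)],
}
print(prune_top_k(index, 2))
# {'archive': [(2, 0.9), (3, 0.5)], 'web': [(4, 0.1)]}
```

The trade-off is that a query can no longer retrieve a document through a term whose posting was dropped; a larger k preserves more of the unpruned results at the cost of a larger index.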
Similar publications
Multiple approaches to grab comparable data from the Web have been developed to date. Nevertheless, producing a high-quality comparable corpus on a specific topic is not straightforward. We present a model for the automatic extraction of comparable texts in multiple languages and on specific topics from Wikipedia. In order to prove the val...
In this paper, we describe the construction of TeKnowbase, a knowledge-base of technical concepts in computer science. Our main information sources are technical websites such as Webopedia and Techtarget as well as Wikipedia and online textbooks. We divide the knowledge-base construction problem into two parts -- the acquisition of entities and the...
Mined Semantic Analysis (MSA) is a novel distributional semantics approach which employs data mining techniques. MSA embraces knowledge-driven analysis of natural languages. It uncovers implicit relations between concepts by mining for their associations in target encyclopedic corpora. MSA exploits not only target corpus content but also its knowle...
Finding experts for a given problem is recognized as a difficult task. Even when a taxonomy of subject expertise exists and is associated with a group of experts, it can be hard for users who have not internalized the taxonomy to exploit. Here we present a method for both attaching experts to a domain ontology, and hiding this fact from the end use...
Tree-structured data naturally appear in various fields, particularly in biology where plants and blood vessels may be described by trees, but also in computer science because XML documents form a tree structure. This paper is devoted to the estimation of the relative scale of ordered trees that share the same layout. The theoretical study is achie...