Yehoshua Sagiv's research while affiliated with Hebrew University of Jerusalem and other places

Publications (193)

Article
Full-text available
This article investigates the problem of geosocial similarity among users of online social networks, based on the locations of their activities (e.g., posting messages or photographs). Finding pairs of geosocially similar users or detecting that two sets of locations (of activities) belong to the same user has important applications in privacy prot...
Conference Paper
Data graphs are convenient for supporting keyword search that takes into account available semantic structure and not just textual relevance. However, the problem of constructing data graphs that facilitate both efficiency and effectiveness of the underlying system has hardly been addressed. A conceptual model for this task is proposed. Principles...
Article
A data graph is a convenient paradigm for supporting keyword search that takes into account available semantic structure and not just textual relevance. However, the problem of constructing data graphs that facilitate both efficiency and effectiveness of the underlying system has hardly been addressed. A conceptual model for this task is proposed....
Article
Full-text available
In keyword search over a data graph, an answer is a non-redundant subtree that contains all the keywords of the query. A naive approach to producing all the answers by increasing height is to generalize Dijkstra's algorithm to enumerating all acyclic paths by increasing weight. The idea of freezing is introduced so that (most) non-shortest paths ar...
Conference Paper
Full-text available
Earthquakes are sudden, cause huge damage to extensive areas, and may negatively affect the lives of millions of people. Thus, it is crucial to develop a system that can help people survive an earthquake and recover from its aftereffects. In this paper we present a vision of a smartphone app, called EAGA (earthquake alerter and guidance app), that...
Article
This demo presents exploratory keyword search over data graphs by means of semantic facets. The demo starts with a keyword search over data graphs. Answers are first ranked by an existing search engine that considers their textual relevance and semantic structure. The user can then explore the answers through facets of structural patterns (i.e., sc...
Article
Theoretical and practical issues pertaining to keyword search over data graphs are discussed. A formal model and algorithms for enumerating answers (by operating directly on the data graph) are described. Various aspects of a system are explained, including the object-connector-property data model, how it is used to construct a data graph from an X...
Article
The task of formulating queries is greatly facilitated when they can be generated automatically from some given data values, schema concepts or both (e.g., names of particular entities and XML tags). This automation is the basis of various database applications, such as keyword search and interactive query formulation. Usually, automatic query gene...
Patent
Routing method for computing routes over uncertain geo-spatial data whereby only upon visiting the geographic entities it can be determined whether the needed service or product is actually provided and is adequate. When dealing with uncertain data, the returned route may need to go via several entities of the same type. Another routing method cons...
Conference Paper
Full-text available
Many smartphones, nowadays, use GPS to detect the location of the user, and can use the Internet to interact with remote location-based services. These two capabilities support online navigation that incorporates search. In this demo we present WISER---a system for Web-based Interactive Search en Route. In the system, users perform route search by...
Conference Paper
In keyword search over data graphs, an answer is a non-redundant subtree that includes the given keywords. This paper focuses on improving the effectiveness of that type of search. A novel approach that combines language models with structural relevance is described. The proposed approach consists of three steps. First, language models are used to...
Article
Full-text available
In integration of road maps modeled as road vector data, the main task is matching pairs of objects that represent, in different maps, the same segment of a real-world road. In an ad hoc integration, the matching is done for a specific need and, thus, is performed in real time, where only a limited preprocessing is possible. Usually, ad hoc integra...
Article
Lawler-Murty’s procedure is a general tool for designing algorithms for enumeration problems (i.e., problems that involve the production of a large set of answers in ranked order), which naturally arise in database management. Lawler-Murty’s procedure is used in a variety of modern database applications; particularly in those related to keyword sea...
Conference Paper
Large knowledge bases, the Linked Data cloud, and Web 2.0 communities open up new opportunities for deep question answering to support the advanced information needs of knowledge workers like students, journalists, or business analysts. This calls for going beyond keyword search, towards more expressive ways of entity-relationship-oriented querying...
Conference Paper
Tools that automatically generate queries are useful when schemas are hard to understand due to size or complexity. Usually, these tools find minimal tree patterns that contain a given set (or bag) of labels. The labels could be, for example, XML tags or relation names. The only restriction is that, in a tree pattern, adjacent labels must be among...
Conference Paper
The problem of fully decentralized search over many collections is considered. The objective is to approximate the results of centralized search (namely, using a central index) while controlling the communication cost and involving only a small number of collections. The proposed solution is couched in a peer-to-peer (P2P) network, but can also be...
Article
Full-text available
A route search is an enhancement of an ordinary geographic search. Instead of merely returning a set of entities, the result is a route that goes via entities that are relevant to the search. The input to the problem consists of several search queries, and each query defines a type of geographical entities. When visited, some of the entities succee...
Conference Paper
Full-text available
A system for keyword search on data graphs is demonstrated on two challenging datasets: the large DBLP and Mondial (which is highly cyclic and has a complex schema). The system supports search, exploration and question answering. The demonstration shows how the system copes with the main challenges in keyword search on data graphs. In particular,...
Article
Full-text available
When integrating geo‐spatial data sets, a join algorithm is used for finding sets of corresponding objects (i.e., objects that represent the same real‐world entity). This article investigates location‐based join algorithms for integration of several data sets. First, algorithms for integration of two data sets are presented and their performances,...
Conference Paper
A novel method for creating collection summaries is developed, and a fully decentralized peer-selection algorithm is described. This algorithm finds the most promising peers for answering a given query. Specifically, peers publish per-term synopses of their documents. The synopses of a peer for a given term are divided into score intervals and for...
Article
Query evaluation over probabilistic XML is explored. The queries are twig patterns with projection, and the data is represented in terms of three models of probabilistic XML (that extend existing ones in the literature). The first model makes an assumption of independence among the probabilistic junctions, whereas the second model can encode probab...
Article
Full-text available
Various known models of probabilistic XML can be represented as instantiations of the abstract notion of p-documents. In addition to ordinary nodes, p-documents have distributional nodes that specify the possible worlds and their probabilistic distribution. Particular families of p-documents are determined by the types of distributional nodes that can b...
Conference Paper
Full-text available
In a route search over geospatial data, a user provides terms for specifying types of geographical entities that she wishes to visit. The goal is to find a route that (1) starts at a given location, (2) ends at a given location, and (3) travels via geospatial entities that are relevant to the provided search terms. Earlier work studied the problem...
Conference Paper
ExQueX is an interactive system for exploring and querying XML documents. The exploration is done by searching, ranking and filtering, and it enables users to discover relationships that exist in a given document. The results of the exploration can be used either directly as tree queries or as building blocks in the process of formulating more comp...
Conference Paper
We consider the problem of full-text search involving multi-term queries in a network of self-organizing, autonomous peers. Existing approaches do not scale well with respect to the number of peers, because they either require access to a large number of peers or incur a high communication cost in order to achieve good query results. In this paper,...
Conference Paper
Full-text available
A route leads from a start location to a final destination and passes through geospatial entities that are picked according to search terms provided by the user. Each entity is pertinent to exactly one term and its degree of relevance is given in terms of a probability. In the route-search problem, the goal is to minimize the traveling distance wh...
Article
Constraints are important not just for maintaining data integrity, but also because they capture natural probabilistic dependencies among data items. A probabilistic XML database (PXDB) is the probability sub-space comprising the instances of a p-document that satisfy a set of constraints. In contrast to existing models that can express probabilist...
Conference Paper
Full-text available
The problem of rewriting a query using a materialized view is studied for a well known fragment of XPath that includes the following three constructs: wildcards, descendant edges and branches. In earlier work, determining the existence of a rewriting was shown to be coNP-hard, but no tight complexity bound was given. While it was argued that Σ^p_3...
Conference Paper
Tree automata (specifically, bottom-up and unranked) form a powerful tool for querying and maintaining validity of XML documents. XML with uncertain data can be modeled as a probability space of labeled trees, and that space is often represented by a tree with distributional nodes. This paper investigates the problem of evaluating a tree automaton...
Article
A survey on modeling and querying probabilistic XML data with focus on the tradeoff between the ability to express real-world probabilistic data and the efficiency of query evaluation is reported. The families PrXML^{exp} and PrXML^{cie} exhibit a clear tradeoff between the efficiency of query evaluation and the ability to model correlation between pro...
Article
This paper investigates a graph enumeration problem, called the maximal P-subgraphs problem, where P is a hereditary or connected-hereditary graph property. Formally, given a graph G, the maximal P-subgraphs problem is to generate all maximal induced subgraphs of G that satisfy P. This problem differs from the well-known node-deletion problem, studi...
Article
Various approaches for keyword search in different settings (e.g., relational databases and XML) actually deal with the problem of enumerating K-fragments. For a given set of keywords K, a K-fragment is a subtree T of the given data graph, such that T contains all the keywords of K and no proper subtree of T has this property. There are three types...
Conference Paper
Full-text available
In keyword search over data graphs, an answer is a nonredundant subtree that includes the given keywords. An algorithm for enumerating answers is presented within an architecture that has two main components: an engine that generates a set of candidate answers and a ranker that evaluates their score. To be effective, the engine must have three fund...
Conference Paper
Constraints are important, not only for maintaining data integrity, but also because they capture natural probabilistic dependencies among data items. A probabilistic XML database (PXDB) is the probability subspace comprising the instances of a p-document that satisfy a set of constraints. In contrast to existing models that can express probabilist...
Conference Paper
Full-text available
In a geographical route search, given search terms, the goal is to find an effective route that (1) starts at a given location, (2) ends at a given location, and (3) travels via geographical entities that are relevant to the given terms. A route is effective if it does not exceed a given distance limit whereas the ranking scores of the visited entiti...
Conference Paper
Various known models of probabilistic XML can be represented as instantiations of abstract p-documents. Such documents have, in addition to ordinary nodes, distributional nodes that specify the probabilistic process of generating a random document. Within this abstraction, families of p-documents, which are natural extensions and combinations of pre...
Conference Paper
Full-text available
Redundancy and minimization of queries are investigated in a well known fragment of XPath that includes child and descendant edges, branches, wildcards, and multiple output nodes. Contrary to a published result, a proposed technique does not guarantee minimality or even non-redundancy, and it is unknown whether a non-redundant query is also minim...
Conference Paper
Full-text available
An uncertain geo-spatial dataset is a collection of geo-spatial objects that do not accurately represent real-world entities. Each object has a confidence value indicating how likely it is for the object to be correct. Uncertain data can be the result of operations such as imprecise integration, incorrect update or inexact querying. A k-route, over...
Article
The full disjunction is a variation of the join operator that maximally combines tuples from connected relations, while preserving all information in the relations. The full disjunction can be seen as a natural extension of the binary outerjoin operator to an arbitrary number of relations and is a useful operator for information integration. This p...
Article
Full-text available
Equivalence of aggregate queries is investigated for the class of conjunctive queries with comparisons and the aggregate operators count, count-distinct, min, max, and sum. Essentially, this class contains unnested SQL queries with the above aggregate operators, with a where clause consisting of a conjunction of comparisons, and without a having...
Conference Paper
Full-text available
In many cases, users may want to consider incomplete answers to their queries. Often, however, there is an overwhelming number of such answers, even if subsumed answers are ignored and only maximal ones are considered. Therefore, it is important to rank answers according to their degree of incompleteness and, moreover, this ranking should be combin...
Conference Paper
Full-text available
Conceptually, the common approach to manipulating probabilistic data is to evaluate relational queries and then calculate the probability of each tuple in the result. This approach ignores the possibility that the probabilities of complete answers are too low and, hence, partial answers (with sufficiently high probabilities) become important. There...
Conference Paper
Full-text available
Evaluation of twig queries over probabilistic XML is investigated. Projection is allowed and, in particular, a query may be Boolean. It is shown that for a well-known model of probabilistic XML, the evaluation of twigs with projection is tractable under data complexity (whereas in other probabilistic data models, projection is intractable). Und...
Conference Paper
Full-text available
A substantial amount of data about geographical entities is available on the World-Wide Web, in the form of digital maps. This paper investigates the integration of such data. A three-step integration process is presented. First, geographical objects are retrieved from Maps on the Web. Secondly, pairs of objects that represent the same real-worl...
Conference Paper
Full-text available
Integration of two road maps is finding a matching between pairs of objects that represent, in the maps, the same real-world road. Several algorithms were proposed in the past for road-map integration; however, these algorithms are not efficient and some of them even require human feedback. Thus, they are not suitable for many important applicat...
Conference Paper
Full-text available
Evaluation of SQL queries with the ORDER BY clause is considered. The naive approach of first computing the result and then sorting the tuples is not suitable for Web applications, since the result could be very large while users expect to quickly get the top-k tuples. Tractability, in this case, amounts to enumerating answers in sorted order...
Article
Full-text available
The problem of rewriting aggregate queries using views is studied for conjunctive queries with arbitrary aggregation functions and built-in predicates. Two types of queries over views are introduced for rewriting aggregate queries: pure candidates and aggregate candidates. Pure candidates can be used to rewrite arbitrary aggregate queries. Aggreg...
Conference Paper
Full-text available
Various approaches for keyword proximity search have been implemented in relational databases, XML and the Web. Yet, in all of them, an answer is a Q-fragment, namely, a subtree T of the given data graph G, such that T contains all the keywords of the query Q and has no proper subtree with this property. The rank of an answer is inversely proport...
Article
New algorithms for computing Steiner trees for a fixed number of terminals are presented. For the undirected Steiner-tree problem, an improvement of the algorithm of Dreyfus and Wagner is presented. The improved algorithm avoids the computation of all shortest paths between pairs of nodes and, hence, reduces the running time. Thus, the running ti...
Conference Paper
Full-text available
Full disjunctions are an associative extension of the outerjoin operator to an arbitrary number of relations. Their main advantage is the ability to maximally combine data from different relations while preserving all the original information. An algorithm for efficiently computing full disjunctions is presented. This algorithm is superior to...
Conference Paper
Full-text available
Various approaches for keyword search in different settings (e.g., relational databases, XML and the Web) actually deal with the problem of enumerating K-fragments. For a given set of keywords K, a K-fragment is a subtree T of the given data graph, such that T contains all the keywords of K and no proper subtree of T has this property. There are th...
Conference Paper
Full-text available
A framework for modeling query semantics as graph properties is presented. In this framework, a single definition of a query automatically gives rise to several semantics for evaluating that query under varying degrees of incomplete information. For example, defining natural joins automatically gives rise to full disjunctions. Two of the proposed s...
Conference Paper
Full-text available
This paper presents a formal framework for investigating keyword proximity search. Within this framework, three variants of keyword proximity search are defined. For each variant, there are algorithms for enumerating all the results in an arbitrary order, in the exact order and in an approximate order. The algorithms for enumerating in the exact or...
Conference Paper
Full-text available
When integrating geo-spatial datasets, a join algorithm is used for finding sets of corresponding objects (i.e., objects that represent the same real-world entity). Algorithms for joining two datasets were studied in the past. This paper investigates integration of three datasets and proposes methods that can be easily generalized to any number of...
Conference Paper
The full disjunction is a variation of the join operator that maximally combines tuples from connected relations, while preserving all information in the relations. The full disjunction can be seen as a natural extension of the binary outerjoin operator to an arbitrary number of relations and is a useful operator for information integration. This p...
Conference Paper
Full-text available
A framework for describing semantic relationships among nodes in XML documents is presented. In contrast to earlier work, the XML documents may have ID references (i.e., they correspond to graphs and not just trees). A specific interconnection semantics in this framework can be defined explicitly or derived automatically. The main advantage of inte...
Article
Full-text available
The problem of computing all maximal induced subgraphs of a graph G that have a graph property P, also called the maximal P-subgraphs problem, is considered. This problem is studied for hereditary, connected-hereditary and rooted-hereditary graph properties. The maximal P-subgraphs problem is reduced to restricted versions of this problem by provid...
Article
Full-text available
Given two geographic databases, a fusion algorithm should produce all pairs of corresponding objects. This chapter describes the four fusion algorithms that only use locations of objects, and measures their performance in terms of recall and precision. These algorithms are designed to work even when locations are imprecise and each database represe...
Article
Full-text available
XSEarch, a semantic search engine for XML, is presented. XSEarch has a simple query language, suitable for a naive user. It returns semantically related document fragments that satisfy the user's query. Query answers are ranked using extended information-retrieval techniques and are generated in an order similar to the ranking. Advanced indexing te...
Conference Paper
Full-text available
This paper describes a method for proving termination of queries to logic programs based on abstract interpretation. The method uses query-mapping pairs to abstract the relation between calls in the LD-tree associated with the program and query. Any well founded partial order for terms can be used to prove the termination. The ideas of the query-ma...
Chapter
Full-text available
This chapter describes XSEarch, a semantic search engine for XML. XSEarch has a simple query language, suitable for a naive user. It returns semantically related document fragments that satisfy the user's query. Query answers are ranked using extended information-retrieval techniques and are generated in an order similar to the ranking. Advanced in...
Conference Paper
Full-text available
Under either the OR-semantics or the weak semantics, the answer to a query over semistructured data consists of maximal rather than complete matchings, i.e., some query variables may be assigned null values. In the relational model, a similar effect is achieved by computing the full disjunction (rather than the natural join or equijoin) of the give...
Article
Full-text available
Semistructured data occur in situations where information lacks a homogeneous structure and is incomplete. Yet, up to now the incompleteness of information has not been reflected by special features of query languages. Our goal is to investigate the principles of queries that allow for incomplete answers.
Article
Full-text available
We have developed and implemented QUEST---a system for QUErying Semantically Tagged documents on the World-Wide Web. The advent of new markup languages like XML makes it likely that documents on the Web are not simple HTML documents, but contain in addition objects that represent the semantic structure of the document.
Article
Full-text available
Flexible queries facilitate, in a novel way, easy and concise querying of databases that have varying structures. Two different semantics, flexible and semiflexible, are introduced and investigated. The complexity of evaluating queries under the two semantics is analyzed. Query evaluation is polynomial in the size of the query, the database and the resul...
Conference Paper
Full-text available
This paper discusses several mechanisms for creating relations out of XML documents. A relation generator consists of two parts: (1) a tuple of path expressions and (2) an index indicating which path expressions may not be assigned the null value. Evaluating a relation generator involves finding tuples of nodes that satisfy the path expressions and a...
Article
Full-text available
The principles of queries that allow incomplete answers are presented. Partial answers make it necessary to refine the model of query evaluation. The first modification relates to the satisfaction of conditions: under some circumstances, conditions involving unbound variables are considered satisfied. Second, in order to prevent a proliferation...
Article
Full-text available
Query equivalence is investigated for disjunctive aggregate queries with negated subgoals, constants and comparisons. A full characterization of equivalence is given for the aggregation functions count, max, sum, prod, toptwo and parity. A related problem is that of determining, for a given natural number N, whether two given queries are equivalent...
Conference Paper
Full-text available
The problem of deciding containment among aggregate queries is investigated. Containment is reduced to equivalence for queries with expandable aggregation functions. Many common aggregation functions, such as max, cntd (count distinct), count, sum, avg (average), median and stdev (standard deviation) are shown to be expandable. It is shown that eve...
Article
Semistructured data occur in situations where information lacks a homogeneous structure and is incomplete. Yet, up to now the incompleteness of information has not been reflected by special features of query languages. Our goal is to investigate the principles of queries that allow for incomplete answers. We do not present, however, a concrete quer...
Conference Paper
Full-text available
We present a system for searching, collecting, integrating and managing Web-resident data. The system consists of tools, each providing a specific functionality aimed at solving one aspect of the complex task of using and managing Web data. Each tool can be used in a stand-alone mode, in combination with the other tools, or even in conjunction with...
Article
EquiX is a search language for XML that combines the power of querying with the simplicity of searching. Requirements for such languages are discussed, and it is shown that EquiX meets the necessary criteria. Both a graph-based abstract syntax and a formal concrete syntax are presented for EquiX queries. In addition, the semantics is defined and an...
Article
This work explores (semi-)automated EC on the WWW. The E-Contracts framework enables EC WWW sites and EC automated tools to present standardized information. This information (1) allows each party to decide whether it wishes to engage in an EC activity with the other party, (2) enables automated negotiation between the parties, and (3) enables the...
Article
Full-text available
SQL4X, a powerful language for simultaneously querying both relational and XML databases, is presented. SQL4X queries can create both relations and XML documents as results. Thus, SQL4X can be thought of as an integration language. In order to allow easy integration of XML documents with varied structures, SQL4X uses flexible semantics. SQL4X is also...
Conference Paper
Full-text available
This paper discusses evaluation of select-project (SP) queries over an XML document. An SP query consists of two parts: (1) a conjunction of conditions on values of labels (called the selection) and (2) a series of labels whose values should be output (called the projection). Query evaluation involves finding tuples of nodes that have the labels m...
Conference Paper
Full-text available