
Yuqing Melanie Wu · Pomona College · Department of Computer Science
Ph.D.
About
74
Publications
6,989
Reads
2,252
Citations
Introduction
Additional affiliations
July 2017 - present
July 2015 - present
January 2000 - May 2004
Education
January 2000 - May 2004
Publications (74)
Fragments of Tarski's relation algebra form the basis of many versatile graph and tree query languages including the regular path queries, XPath, and SPARQL. Surprisingly, however, a systematic study of the relative expressive power of relation algebra fragments on trees has not yet been undertaken. In this work, we perform such a systematic study....
Relational query languages rely heavily on costly join operations to combine tuples from multiple tables into a single resulting tuple. In many cases, the cost of query evaluation can be reduced by manually optimizing (parts of) queries to use cheaper semi-joins instead of joins. Unfortunately, existing database products can only apply such optimiz...
Many graph query languages rely on composition to navigate graphs and select nodes of interest, even though evaluating compositions of relations can be costly. Often, this need for composition can be reduced by rewriting toward queries using semi-joins instead, resulting in a significant reduction of the query evaluation cost. We study techniques t...
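The idea in the abstract above, replacing a composition by a semi-join when only the starting nodes matter, can be illustrated with a minimal sketch. It assumes binary relations stored as Python sets of pairs; the function names and the sample data are hypothetical and are not taken from the paper.

```python
# Binary relations as sets of (source, target) pairs; the data is made up.

def compose(r, s):
    """Composition: pairs (a, c) with (a, b) in r and (b, c) in s."""
    by_source = {}
    for b, c in s:
        by_source.setdefault(b, set()).add(c)
    return {(a, c) for a, b in r for c in by_source.get(b, ())}


def semijoin(r, s):
    """Semi-join: pairs (a, b) of r whose target b starts some pair of s."""
    sources = {b for b, _ in s}
    return {(a, b) for a, b in r if b in sources}


def sources_via_composition(r, s):
    """Starting nodes of the composition (builds every composed pair)."""
    return {a for a, _ in compose(r, s)}


def sources_via_semijoin(r, s):
    """Same starting nodes, but without building the composed pairs."""
    return {a for a, _ in semijoin(r, s)}


knows = {(1, 2), (2, 3), (4, 5)}
works_at = {(2, "acme"), (3, "acme"), (3, "initech")}
```

Both functions return the same set of starting nodes, but the semi-join version never produces more pairs than `knows` already holds, whereas the composition can grow with the fan-out of `works_at`.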
Motivated by the continuing interest in the tree data model, we study the expressive power of downward navigational query languages on trees and chains. Basic navigational queries are built from the identity relation and edge relations using composition and union. We study the effects on relative expressiveness when we add transitive closure, proje...
Many data sources can be represented easily by collections of sets of objects. For several practical queries on such collections of sets of objects, the answer does not depend on the precise composition of these sets, but only on the number of sets to which each object belongs. This is the case k = 1 for the more general situation where the query an...
We study the problem of enumeration of all k-sized subsets of temporal events that mutually overlap at some point in a query time window. This problem arises in many application domains, e.g., in social networks, life sciences, smart cities, telecommunications, and others. We propose a start time index (STI) approach that overcomes the efficiency b...
Symmetric queries are introduced as queries on a sequence of sets of objects the result of which does not depend on the order of the sets. An appropriate data model is proposed, and two query languages are introduced, QuineCALC and SyCALC. They are correlated with the symmetric Boolean respectively relational functions. The former correlation yield...
Motivated by the continuing interest in the tree data model, we study the expressive power of downward navigational query languages on trees and chains. Basic navigational queries are built from the identity relation and edge relations using composition and union. We study the effects on relative expressiveness when we add transitive closure, proje...
Fragments of Tarski’s relation algebra form the basis of many versatile graph and tree query languages including the regular path queries, XPath, and SPARQL. Surprisingly, however, a systematic study of the relative expressive power of relation algebra fragments on trees has not yet been undertaken. Our approach is to start from a basic fragment wh...
For several practical queries on bags of sets of objects, the answer does not depend on the precise composition of these sets, but only on the number of sets to which each object belongs. This is the case k = 1 for the more general situation where the query answer only depends on the number of sets to which each group of at most k objects belongs....
Many graph query languages rely on the composition operator to navigate graphs and select nodes of interest, even though evaluating compositions of relations can be costly. Often, this need for composition can be reduced by rewriting towards queries that use semi-joins instead. In this way, the cost of evaluating queries can be significantly reduc...
Assessing the trustworthiness of social media posts is increasingly important as the number of online users and activities grows. Currently deployed assessment systems measure post trustworthiness as credibility. However, they measure the credibility of all posts indiscriminately. The credibility concept was intended for news-type posts. Labeling...
Motivated by the continuing interest in the tree data model, we study the expressive power of downward fragments of navigational query languages on trees. The basic navigational query language we consider expresses queries by building binary relations from the edge relations and the identity relation, using composition and union. We study the effec...
Akhtar et al. introduced equality-generating constraints and functional constraints as a first step towards dependency-like integrity constraints for RDF data [3]. Here, we focus on functional constraints. Since the usefulness of functional constraints is not limited to the RDF data model, we study the functional constraints in the more general set...
Given a document D in the form of an unordered node-labeled tree, we study the expressiveness on D of various basic fragments of XPath, the core navigational language on XML documents. Working from the perspective of these languages as fragments of Tarski's relation algebra, we give characterizations, in terms of the structure of D, for when a bina...
Motivated by both established and new applications, we study navigational query languages for graphs (binary relations). The simplest language has only the two operators union and composition, together with the identity relation. We make more powerful languages by adding any of the following operators: intersection; set difference; projection; copr...
The fragility and interconnectivity of the planet argue compellingly for a greater understanding of how different communities make sense of their world. One such critical demand is comparing China and the rest of the world (e.g., Americans), where communities' ideological and cultural backgrounds can be significantly different. Whi...
Akhtar et al. introduced equality-generating constraints and functional constraints as an initial step towards dependency-like integrity constraints for RDF data [1]. Here, we focus on functional constraints. The usefulness of functional constraints is not limited to the RDF data model. Therefore, we study the functional constraints in the more gen...
Keyword search, the major means for Internet search engines, has recently been explored in structured and semi-structured data. What is yet to be explored thoroughly is how optional and negative keywords can be expressed, what the results should be and how such search queries can be evaluated efficiently. In this paper, we formally define a new typ...
In this paper, we present, to our knowledge, the first known I/O efficient solutions for computing the k-bisimulation partition of a massive directed graph, and performing maintenance of such a partition upon updates to the underlying graph. Ubiquitous in the theory and application of graph data, bisimulation is a robust notion of node equivalence...
Many data-intensive applications have to query a database that involves sequences of sets of objects. It is not uncommon that the order of the sets in such a sequence does not affect the result of the query. Such queries are called symmetric. In this paper, the authors wish to initiate research on symmetric queries. Thereto, a data model is propose...
Computing the bisimulation partition of a graph is a fundamental problem which plays a key role in a wide range of basic applications. Intuitively, two nodes in a graph are bisimilar if they share basic structural properties such as labeling and neighborhood topology. In data management, reducing a graph under bisimulation equivalence is a crucial...
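The intuition in the abstract above (nodes are equivalent when they agree on label and neighborhood structure) can be sketched as an iterative refinement. This is a small in-memory illustration, not the I/O-efficient algorithm the papers describe; it assumes node labels only, a successor-based notion of k-bisimulation, and hypothetical function names.

```python
def _canonical(signature):
    """Map each distinct signature to a small integer block id."""
    ids = {}
    return {n: ids.setdefault(sig, len(ids)) for n, sig in signature.items()}


def k_bisimulation(labels, edges, k):
    """Return {node: block id} after k refinement rounds.

    Round 0 groups nodes by label. Each later round groups nodes by their
    previous block together with the set of previous blocks of their
    successors.
    """
    succ = {n: [] for n in labels}
    for u, v in edges:
        succ[u].append(v)

    block = _canonical(dict(labels))
    for _ in range(k):
        signature = {
            n: (block[n], frozenset(block[m] for m in succ[n]))
            for n in labels
        }
        block = _canonical(signature)
    return block


# Made-up example graph: b and e both point to d, c has no successors.
labels = {"a": "A", "b": "B", "c": "B", "d": "C", "e": "B"}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("e", "d")]
```

With `k = 0` the nodes `b`, `c`, and `e` share a block because they share a label; with `k = 1` node `c` is split off because, unlike `b` and `e`, it has no successor labeled `C`.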
Bisimulation is a basic graph reduction operation, which plays a key role in a wide range of graph analytical applications. While there are many algorithms dedicated to computing bisimulation results, to our knowledge, little work has been done to analyze the results themselves. Since data properties such as skew can greatly influence the performan...
The privacy of hundreds of millions of people today could be compromised due to people databases which claim to store many personal details about individuals, often without their knowledge. While the paid versions of these databases may be prohibitively expensive for data mining on a mass scale, in this paper, we show that even the limited informat...
The last decade has witnessed an explosion of the availability of and interest in graph structured data. The desire to search and reason over these increasingly massive data collections pushes the boundaries of search languages, from pure keyword search to structure-aware searches in the graph. These phenomena have inspired a rich body of research...
Several established and novel applications motivate us to study the expressive power of navigational query languages on graphs, which represent binary relations. Our basic language has only the operators union and composition, together with the identity relation. Richer languages can be obtained by adding other features such as other set operators,...
Whether it be data from ubiquitous devices such as sensors or data generated from telescopes or other laboratory instruments, technology apparent in many scientific disciplines is generating data at rates never witnessed before. Computational scientists are among the many who perform inductive experiments and analyses on these data with the goal of...
In many domains, such as bioinformatics, cheminformatics, health informatics and social networks, data can be represented naturally as labeled graphs. To address the increasing needs in discovering interesting associations between entities in such data graphs, especially under complicated keyword-based and structural constraints, we introduce Conka...
In many domains, such as social networks and cheminformatics, data can be represented naturally in a graph model, with nodes being data entries and edges the relationships between them. We study the application requirements in these domains and find that discovering Constrained Acyclic Paths (CAP) is in high demand. In this paper, we define the CA...
RDF has gained great interest in both academia and industry as an important language to describe graph data. Several approaches have been proposed for storing and querying RDF data efficiently; each works best under certain circumstances, e.g. certain types of data and/or queries. However, there was a lack of a thorough understanding of exactly what...
We study the expressiveness of a positive fragment of path queries, denoted Path+, on documents that can be represented as node-labeled trees. The expressiveness of Path+ is studied from two angles. First, we establish that Path+ is equivalent in expressive power to two particular subfragments, as well as to the class of tree queries, a subclass of...
Motivated by both established and new applications, we study navigational query languages for graphs (binary relations). The simplest language has only the two operators union and composition, together with the identity relation. We make more powerful languages by adding any of the following operators: intersection; set difference; projection; copr...
The Semantic Web, which represents a web of knowledge, offers new opportunities to search for knowledge and information. To harvest such search power requires robust and scalable data repositories that can store RDF data and support efficient evaluation of SPARQL queries. Most of the existing RDF storage techniques rely on the relational model and relati...
One of the main shortcomings of Semantic Web technologies is that there are few user-friendly ways for displaying, browsing and querying semantic data. In fact, the lack of effective interfaces for end users significantly hinders further adoption of the Semantic Web. In this paper, we propose the Semantic Web Portal (SWP) as a light-weight platform...
Collaborative tagging of resources on the Web has become a commonplace occurrence. Web sites allowing resources to be tagged provide a tremendous amount of user-generated taxonomic information. However, information seekers are hindered by the lack of organization within these tags as well as the multitude of domains encompassed within these sites....
We study the expressiveness of a positive fragment of path queries, denoted Path+, on node-labeled tree documents. The expressiveness of Path+ is studied from two angles. First, we establish that Path+ is equivalent in expressive power to a particular sub-fragment as well as to the class of tree queries, a sub-class of the first-order conjunct...
We propose XQGen, a stand-alone, algebra-based XPath generator to aid engineers in testing and improving the design of XML query engines. XQGen takes an XML schema sketch and user configurations, such as number of queries, query types, duplication factors, and branching factors as input, and generates a set of queries that conform to the schema a...
Recent studies have proposed structural summary techniques for path-query evaluation on semi-structured data sources. One major line of this research has been the introduction of the DataGuide, 1-index, 2-index, and A(k) indices, and subsequent investigations and generalizations. Another recent study has considered structural characterizations of f...
Well-designed indices can dramatically improve query performance. Including query workload information can produce indices that yield better overall throughput while balancing the space and performance trade-off at the core of index design. In the context of XML, structural indices have proven to be particularly effective in supporting XPath querie...
Structural indices play a significant role in improving the efficiency of XML query evaluation. Being able to compare various structural indexing techniques is critical for a DBMS to select which indices to support, for the query optimizer to choose an index to use in query evaluation, and for DBAs to configure a database application. We present AS...
The importance of performing efficient XML query processing increases along with its usage and pervasiveness. Studying the properties of important fragments of XML query languages and designing accurate structural summaries (including indexes and statistical summaries) are all critical ingredients in solving this problem. However, up to this po...
In this paper, we describe the TIMBER XML database system implemented at University of Michigan. TIMBER was one of the first native XML database systems, designed from the ground up to store and query semi-structured data. A distinctive principle of TIMBER is its algebraic underpinning. Central contributions of the TIMBER project include: (1) tree...
As the number of applications that rely on XML data increases, so does the need for performing efficient XML query evaluation. A critical part of the solution involves providing new techniques for designing XML indexes and lookup algorithms. In this paper, we leverage the results of our research on coupling the partitions induced by fragments...
The NCI Developmental Therapeutics Program Human Tumor cell line data set is a publicly available database that contains cellular assay screening data for over 40 000 compounds tested in 60 human tumor cell lines. The database also contains microarray assay gene expression data for the cell lines, and so it provides an excellent information resourc...
Supporting efficient access to XML data using XPath [3] continues to be an important research problem [6, 12]. XPath queries are used to specify node-labeled trees which match portions of the hierarchical XML data. In XPath qu...
Access control for semi-structured data is nontrivial, as witnessed by the number of access control approaches in literature. Recently, a case has been made for expressing access constraints at finer levels of granularity on data nodes and extending constraints to structural relationships. In this paper, we introduce a rewrite-based approach for ac...
Recent studies have proposed structural summary techniques for path-query evaluation on semi-structured data sources. One major line of this research has been the introduction of the DataGuide, 1-index, 2-index, and A(k) indices, and subsequent investigations and generalizations. Another recent study has considered structural characterizations o...
We introduce a new methodology for coupling language-induced partitions and index-induced partitions on XML documents, aimed at efficient evaluation of XPath queries. In particular, we identify XPath fragments which are ideally coupled with the newly introduced P(k)-partition which has its definition grounded in the well-know...
Recent studies have proposed structural summary techniques for path query evaluation on semi-structured data sources. One major line of this research has been the introduction of the DataGuide, 1-index, 2-index, and A(k) indices, and subsequent investigations and generalizations. Another recent study has considered structural characterizations...
We present ACXESS (Access Control for XML with Enhanced Security Specifications), a system for specifying and enforcing enhanced security constraints on XML via virtual "security views" and query rewrites. ACXESS is the first system that bears the capability to specify and enforce complicated security policies on both subtrees and structural relati...
We propose IPAC (Interactive aPproach to Access Control for semi-structured data), a framework for XML access constraint specification and security view selection. IPAC clearly demarcates access constraint specification, access control strategy and security mechanism (implementation). It features a declarative access constraint specification languag...
XML is gaining predominance as the standard for data representation and exchange. Access control for XML data is nontrivial, as witnessed by the number of access control models presented in literature. Existing models provide the ability to extend access control to data as well as structure and enforce the specified access control via view materi...
Being able to express and enforce role-based access control on XML data is a critical component of XML data management. However, given the semi-structured nature of XML, this is non-trivial, as access control can be applied on the values of nodes as well as on the structural relationship between nodes. In this context, we adopt and extend a graph e...
In this paper, we examine the interplay of logical and physical design, and experimentally demonstrate that: (1) solving the logical mapping and the physical design problem independently leads to a suboptimal solution; (2) taking into account the physical design space impacts the space of logical mapping. Specifically, well-known outlining and inli...
XML is widely praised for its flexibility in allowing repeated and missing sub-elements. However, this flexibility makes it challenging to develop a bulk algebra, which typically manipulates sets of objects with identical structure. A set of XML elements, say of type book, may have members that vary greatly in structure, e.g. in the number of autho...
XML permits repeated and missing sub-elements, and missing attributes. We discuss the consequent implications on grouping, both with respect to specification and with respect to implementation. The techniques described here have been implemented in the TIMBER native XML database system being developed at the University of Michigan.
This paper describes the overall design and architecture of the Timber XML database system currently being implemented at the University of Michigan. The system is based upon a bulk algebra for manipulating trees, and natively stores XML. New access methods have been developed to evaluate queries in the XML context, and new cost estimation and quer...
XML has become ubiquitous, and XML data has to be managed in databases. The current industry standard is to map XML data into relational tables and store this information in a relational database. Such mappings create both expressive power problems and performance problems. In the TIMBER project we are exploring the issues involved in storing XML i...
Structural join operations are central to evaluating queries against XML data, and are typically responsible for consuming a lion's share of the query processing time. Thus, structural join order selection is at the heart of query optimization in an XML database, just as (value-based) join order selection is central to relational query optimization...
Estimating the sizes of query results, and intermediate results, is crucial to many aspects of query processing. In particular, it is necessary for effective query optimization. Even at the user level, predictions of the total result size can be valuable in “next-step” decisions, such as query refinement. This paper proposes a technique to obtain q...
XML permits repeated and missing sub-elements, and missing attributes. We discuss the consequent implications on grouping, both with respect to specification and with respect to implementation. The techniques described here have been implemented in the TIMBER native XML database system being developed at the University of Michigan.
We present a physical algebra for the manipulation of XML in a database. We show how to map logical algebra operators to this physical algebra. We also present several physical algebra identities that are useful for query optimization.
Estimating the sizes of query results, and intermediate results, is crucial to many aspects of query processing. In particular, it is necessary for effective query optimization. Even at the user level, predictions of the total result size can be valuable in “next-step” decisions, such as query refinement. This paper proposes a technique to obtain q...
XML queries typically specify patterns of selection predicates on multiple elements that have some specified tree structured relationships. The primitive tree structured relationships are parent-child and ancestor-descendant, and finding all occurrences of these relationships in an XML database is a core operation for XML query processing. We devel...
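As a rough illustration of the ancestor-descendant matching mentioned in the abstract above, the sketch below numbers the nodes of a small tree with (start, end) positions and then finds ancestor-descendant pairs by interval containment. The interval numbering, the naive nested-loop join, and all names are assumptions made for illustration; the truncated abstract does not say which encoding or algorithm the paper uses.

```python
from itertools import count


def number_tree(tree):
    """Assign (tag, start, end) to every node of a nested (tag, children) tree."""
    counter = count()
    out = []

    def visit(node):
        tag, children = node
        start = next(counter)
        for child in children:
            visit(child)
        out.append((tag, start, next(counter)))

    visit(tree)
    return out


def ancestor_descendant(nodes, anc_tag, desc_tag):
    """Pairs (ancestor start, descendant start) found by interval containment.

    A node a is an ancestor of d exactly when a's interval strictly
    contains d's interval. This is a naive nested-loop join.
    """
    ancestors = [n for n in nodes if n[0] == anc_tag]
    descendants = [n for n in nodes if n[0] == desc_tag]
    return sorted(
        (a[1], d[1])
        for a in ancestors
        for d in descendants
        if a[1] < d[1] and d[2] < a[2]
    )


# Made-up document: one book with a title and two authors, each with a name.
doc = ("book", [("title", []),
                ("author", [("name", [])]),
                ("author", [("name", [])])])
nodes = number_tree(doc)
```

Calling `ancestor_descendant(nodes, "author", "name")` pairs each author with its own name only, because neither author interval contains the other author's name.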
As the WWW becomes more popular and powerful, how to search information on the web in a database-like way has become an important research topic. COMMIX, which is developed in the DB group at Peking University (China), is a system towards building a very large database using data from the Web for information extraction, integration and query answering. CO...
We study the expressiveness of Positive XPath with parent/child navigation, denoted XPath+, from two angles. First, we establish that XPath+ is equivalent in expressive power to some of its sub-fragments as well as to the class of tree queries, a sub-class of the first-order conjunctive queries defined over label, parent, and child predicates....
Well-designed indexes can dramatically improve query performance. In the context of XML, structural indexes have proven to be particularly effective in supporting efficient XPath queries, the core of all XML queries, by capturing the structural correlation between data components in an XML document. The duality of space and performance is an inev...