Yuqing Melanie Wu

Pomona College · Department of Computer Science

Ph.D.

74
Publications
6,982
Reads
2,249
Citations
Additional affiliations
July 2017 - present
Pomona College
Position
  • Chair
July 2015 - present
Pomona College
Position
  • Professor (Associate)
January 2000 - May 2004
University of Michigan
Position
  • PhD Student
Education
January 2000 - May 2004
University of Michigan
Field of study
  • Computer Science

Publications (74)
Article
Fragments of Tarski's relation algebra form the basis of many versatile graph and tree query languages including the regular path queries, XPath, and SPARQL. Surprisingly, however, a systematic study of the relative expressive power of relation algebra fragments on trees has not yet been undertaken. In this work, we perform such a systematic study....
Chapter
Relational query languages rely heavily on costly join operations to combine tuples from multiple tables into a single resulting tuple. In many cases, the cost of query evaluation can be reduced by manually optimizing (parts of) queries to use cheaper semi-joins instead of joins. Unfortunately, existing database products can only apply such optimiz...
Article
Many graph query languages rely on composition to navigate graphs and select nodes of interest, even though evaluating compositions of relations can be costly. Often, this need for composition can be reduced by rewriting toward queries using semi-joins instead, resulting in a significant reduction of the query evaluation cost. We study techniques t...
Article
Motivated by the continuing interest in the tree data model, we study the expressive power of downward navigational query languages on trees and chains. Basic navigational queries are built from the identity relation and edge relations using composition and union. We study the effects on relative expressiveness when we add transitive closure, proje...
Article
Full-text available
Many data sources can be represented easily by collections of sets of objects. For several practical queries on such collections of sets of objects, the answer does not depend on the precise composition of these sets, but only on the number of sets to which each object belongs. This is the case k = 1 for the more general situation where the query an...
Conference Paper
We study the problem of enumeration of all k-sized subsets of temporal events that mutually overlap at some point in a query time window. This problem arises in many application domains, e.g., in social networks, life sciences, smart cities, telecommunications, and others. We propose a start time index (STI) approach that overcomes the efficiency b...
Article
Symmetric queries are introduced as queries on a sequence of sets of objects the result of which does not depend on the order of the sets. An appropriate data model is proposed, and two query languages are introduced, QuineCALC and SyCALC. They are correlated with the symmetric Boolean respectively relational functions. The former correlation yield...
Article
Full-text available
Motivated by the continuing interest in the tree data model, we study the expressive power of downward navigational query languages on trees and chains. Basic navigational queries are built from the identity relation and edge relations using composition and union. We study the effects on relative expressiveness when we add transitive closure, proje...
Chapter
Fragments of Tarski’s relation algebra form the basis of many versatile graph and tree query languages including the regular path queries, XPath, and SPARQL. Surprisingly, however, a systematic study of the relative expressive power of relation algebra fragments on trees has not yet been undertaken. Our approach is to start from a basic fragment wh...
Chapter
For several practical queries on bags of sets of objects, the answer does not depend on the precise composition of these sets, but only on the number of sets to which each object belongs. This is the case k = 1 for the more general situation where the query answer only depends on the number of sets to which each group of at most k objects belongs....
Conference Paper
Many graph query languages rely on the composition operator to navigate graphs and select nodes of interest, even though evaluating compositions of relations can be costly. Often, this need for composition can be reduced by rewriting towards queries that use semi-joins instead. In this way, the cost of evaluating queries can be significantly reduc...
Article
Assessing trustworthiness of social media posts is increasingly important, as the number of online users and activities grows. Currently deployed assessment systems measure post trustworthiness as credibility. However, they measure the credibility of all posts, indiscriminately. The credibility concept was intended for news types of posts. Labeling...
Conference Paper
Motivated by the continuing interest in the tree data model, we study the expressive power of downward fragments of navigational query languages on trees. The basic navigational query language we consider expresses queries by building binary relations from the edge relations and the identity relation, using composition and union. We study the effec...
Article
Akhtar et al. introduced equality-generating constraints and functional constraints as a first step towards dependency-like integrity constraints for RDF data [3]. Here, we focus on functional constraints. Since the usefulness of functional constraints is not limited to the RDF data model, we study the functional constraints in the more general set...
Article
Given a document D in the form of an unordered node-labeled tree, we study the expressiveness on D of various basic fragments of XPath, the core navigational language on XML documents. Working from the perspective of these languages as fragments of Tarski's relation algebra, we give characterizations, in terms of the structure of D, for when a bina...
Article
Full-text available
Motivated by both established and new applications, we study navigational query languages for graphs (binary relations). The simplest language has only the two operators union and composition, together with the identity relation. We make more powerful languages by adding any of the following operators: intersection; set difference; projection; copr...
Article
Full-text available
The fragility and interconnectivity of the planet argue compellingly for a greater understanding of how different communities make sense of their world. One such critical demand is comparing the Chinese and the rest of the world (e.g., Americans), where communities' ideological and cultural backgrounds can be significantly different. Whi...
Conference Paper
Akhtar et al. introduced equality-generating constraints and functional constraints as an initial step towards dependency-like integrity constraints for RDF data [1]. Here, we focus on functional constraints. The usefulness of functional constraints is not limited to the RDF data model. Therefore, we study the functional constraints in the more gen...
Conference Paper
Keyword search, the major means for Internet search engines, has recently been explored in structured and semi-structured data. What is yet to be explored thoroughly is how optional and negative keywords can be expressed, what the results should be and how such search queries can be evaluated efficiently. In this paper, we formally define a new typ...
Article
Full-text available
In this paper, we present, to our knowledge, the first known I/O efficient solutions for computing the k-bisimulation partition of a massive directed graph, and performing maintenance of such a partition upon updates to the underlying graph. Ubiquitous in the theory and application of graph data, bisimulation is a robust notion of node equivalence...
Article
Full-text available
Many data-intensive applications have to query a database that involves sequences of sets of objects. It is not uncommon that the order of the sets in such a sequence does not affect the result of the query. Such queries are called symmetric. In this paper, the authors wish to initiate research on symmetric queries. Thereto, a data model is propose...
Conference Paper
Computing the bisimulation partition of a graph is a fundamental problem which plays a key role in a wide range of basic applications. Intuitively, two nodes in a graph are bisimilar if they share basic structural properties such as labeling and neighborhood topology. In data management, reducing a graph under bisimulation equivalence is a crucial...
Conference Paper
Bisimulation is a basic graph reduction operation, which plays a key role in a wide range of graph analytical applications. While there are many algorithms dedicated to computing bisimulation results, to our knowledge, little work has been done to analyze the results themselves. Since data properties such as skew can greatly influence the performan...
Article
The privacy of hundreds of millions of people today could be compromised due to people databases which claim to store many personal details about individuals, often without their knowledge. While the paid versions of these databases may be prohibitively expensive for data mining on a mass scale, in this paper, we show that even the limited informat...
Conference Paper
The last decade has witnessed an explosion of the availability of and interest in graph structured data. The desire to search and reason over these increasingly massive data collections pushes the boundaries of search languages, from pure keyword search to structure-aware searches in the graph. These phenomena have inspired a rich body of research...
Article
Several established and novel applications motivate us to study the expressive power of navigational query languages on graphs, which represent binary relations. Our basic language has only the operators union and composition, together with the identity relation. Richer languages can be obtained by adding other features such as other set operators,...
Conference Paper
Whether it be data from ubiquitous devices such as sensors or data generated from telescopes or other laboratory instruments, technology apparent in many scientific disciplines is generating data at rates never witnessed before. Computational scientists are among the many who perform inductive experiments and analyses on these data with the goal of...
Conference Paper
Full-text available
In many domains, such as bioinformatics, cheminformatics, health informatics and social networks, data can be represented naturally as labeled graphs. To address the increasing needs in discovering interesting associations between entities in such data graphs, especially under complicated keyword-based and structural constraints, we introduce Conka...
Conference Paper
Full-text available
In many domains, such as social networks and chem-informatics, data can be represented naturally in a graph model, with nodes being data entries and edges the relationships between them. We study the application requirements in these domains and find that discovering Constrained Acyclic Paths (CAP) is in high demand. In this paper, we define the CA...
Conference Paper
Full-text available
RDF has gained great interest in both academia and industry as an important language to describe graph data. Several approaches have been proposed for storing and querying RDF data efficiently; each works best under certain circumstances, e.g. certain types of data and/or queries. However, there has been a lack of a thorough understanding of exactly what...
Article
Full-text available
We study the expressiveness of a positive fragment of path queries, denoted Path+, on documents that can be represented as node-labeled trees. The expressiveness of Path+ is studied from two angles. First, we establish that Path+ is equivalent in expressive power to two particular subfragments, as well as to the class of tree queries, a subclass of...
Conference Paper
Motivated by both established and new applications, we study navigational query languages for graphs (binary relations). The simplest language has only the two operators union and composition, together with the identity relation. We make more powerful languages by adding any of the following operators: intersection; set difference; projection; copr...
Conference Paper
The Semantic Web, which represents a web of knowledge, offers new opportunities to search for knowledge and information. Harvesting such search power requires robust and scalable data repositories that can store RDF data and support efficient evaluation of SPARQL queries. Most of the existing RDF storage techniques rely on the relational model and relati...
Conference Paper
Full-text available
One of the main shortcomings of Semantic Web technologies is that there are few user-friendly ways for displaying, browsing and querying semantic data. In fact, the lack of effective interfaces for end users significantly hinders further adoption of the Semantic Web. In this paper, we propose the Semantic Web Portal (SWP) as a light-weight platform...
Conference Paper
Full-text available
Collaborative tagging of resources on the Web has become a commonplace occurrence. Web sites allowing resources to be tagged provide a tremendous amount of user-generated taxonomic information. However, information seekers are hindered by the lack of organization within these tags as well as the multitude of domains encompassed within these sites....
Conference Paper
Full-text available
We study the expressiveness of a positive fragment of path queries, denoted Path+, on node-labeled tree documents. The expressiveness of Path+ is studied from two angles. First, we establish that Path+ is equivalent in expressive power to a particular sub-fragment as well as to the class of tree queries, a sub-class of the first-order conjunct...
Conference Paper
Full-text available
We propose XQGen, a stand-alone, algebra-based XPath generator to aid engineers in testing and improving the design of XML query engines. XQGen takes an XML schema sketch and user configurations, such as number of queries, query types, duplication factors, and branching factors as input, and generates a set of queries that conform to the schema a...
Article
Full-text available
Recent studies have proposed structural summary techniques for path-query evaluation on semi-structured data sources. One major line of this research has been the introduction of the DataGuide, 1-index, 2-index, and A(k) indices, and subsequent investigations and generalizations. Another recent study has considered structural characterizations of f...
Conference Paper
Full-text available
Well-designed indices can dramatically improve query performance. Including query workload information can produce indices that yield better overall throughput while balancing the space and performance trade-off at the core of index design. In the context of XML, structural indices have proven to be particularly effective in supporting XPath querie...
Conference Paper
Full-text available
Structural indices play a significant role in improving the efficiency of XML query evaluation. Being able to compare various structural indexing techniques is critical for a DBMS to select which indices to support, for the query optimizer to choose an index to use in query evaluation, and for DBAs to configure a database application. We present AS...
Article
The importance of performing efficient XML query processing increases along with its usage and pervasiveness. Studying the properties of important fragments of XML query languages and designing accurate structural summaries (including indexes and statistical summaries) are all critical ingredients in solving this problem. However, up to this po...
Article
ChemInform is a weekly Abstracting Service, delivering concise information at a glance that was extracted from about 200 leading journals. To access a ChemInform Abstract of an article which was published elsewhere, please select a “Full Text” option. The original article is trackable via the “References” option.
Article
Full-text available
In this paper, we describe the TIMBER XML database system implemented at University of Michigan. TIMBER was one of the first native XML database systems, designed from the ground up to store and query semi-structured data. A distinctive principle of TIMBER is its algebraic underpinning. Central contributions of the TIMBER project include: (1) tree...
Conference Paper
Full-text available
As the number of applications that rely on XML data increases, so does the need for performing efficient XML query evaluation. A critical part of the solution involves providing new techniques for designing XML indexes and lookup algorithms. In this paper, we leverage the results of our research on coupling the partitions induced by fragments...
Article
The NCI Developmental Therapeutics Program Human Tumor cell line data set is a publicly available database that contains cellular assay screening data for over 40 000 compounds tested in 60 human tumor cell lines. The database also contains microarray assay gene expression data for the cell lines, and so it provides an excellent information resourc...
Conference Paper
Full-text available
Supporting efficient access to XML data using XPath [3] continues to be an important research problem [6, 12]. XPath queries are used to specify node-labeled trees which match portions of the hierarchical XML data. In XPath qu...
Conference Paper
Full-text available
Access control for semi-structured data is nontrivial, as witnessed by the number of access control approaches in literature. Recently, a case has been made for expressing access constraints at finer levels of granularity on data nodes and extending constraints to structural relationships. In this paper, we introduce a rewrite-based approach for ac...
Conference Paper
Full-text available
Recent studies have proposed structural summary techniques for path-query evaluation on semi-structured data sources. One major line of this research has been the introduction of the DataGuide, 1-index, 2-index, and A(k) indices, and subsequent investigations and generalizations. Another recent study has considered structural characterizations o...
Conference Paper
We introduce a new methodology for coupling language-induced partitions and index-induced partitions on XML documents that is aimed for the benefit of efficient evaluation of XPath queries. In particular, we identify XPath fragments which are ideally coupled with the newly introduced P(k)-partition which has its definition grounded in the well-know...
Article
Full-text available
Recent studies have proposed structural summary techniques for path query evaluation on semi-structured data sources. One major line of this research has been the introduction of the DataGuide, 1-index, 2-index, and A(k) indices, and subsequent investigations and generalizations. Another recent study has considered structural characterizations...
Conference Paper
Full-text available
We present ACXESS (Access Control for XML with Enhanced Security Specifications), a system for specifying and enforcing enhanced security constraints on XML via virtual "security views" and query rewrites. ACXESS is the first system that bears the capability to specify and enforce complicated security policies on both subtrees and structural relati...
Conference Paper
Full-text available
We propose IPAC (Interactive aPproach to Access Control for semi-structured data), a framework for XML access constraint specification and security view selection. IPAC clearly demarcates access constraint specification, access control strategy and security mechanism (implementation). It features a declarative access constraint specification languag...
Article
Full-text available
XML is gaining predominance as the standard for data representation and exchange. Access control for XML data is nontrivial, as witnessed by the number of access control models presented in the literature. Existing models provide the ability to extend access control to data as well as structure and enforce the specified access control via view materi...
Conference Paper
Full-text available
Being able to express and enforce role-based access control on XML data is a critical component of XML data management. However, given the semi-structured nature of XML, this is non-trivial, as access control can be applied on the values of nodes as well as on the structural relationship between nodes. In this context, we adopt and extend a graph e...
Conference Paper
Full-text available
In this paper, we examine the interplay of logical and physical design, and experimentally demonstrate that: (1) solving the logical mapping and the physical design problem independently leads to a suboptimal solution; (2) taking into account the physical design space impacts the space of logical mapping. Specifically, well-known outlining and inli...
Conference Paper
Full-text available
XML is widely praised for its flexibility in allowing repeated and missing sub-elements. However, this flexibility makes it challenging to develop a bulk algebra, which typically manipulates sets of objects with identical structure. A set of XML elements, say of type book, may have members that vary greatly in structure, e.g. in the number of autho...
Article
Full-text available
XML permits repeated and missing sub-elements, and missing attributes. We discuss the consequent implications on grouping, both with respect to specification and with respect to implementation. The techniques described here have been implemented in the TIMBER native XML database system being developed at the University of Michigan.
Article
Full-text available
This paper describes the overall design and architecture of the Timber XML database system currently being implemented at the University of Michigan. The system is based upon a bulk algebra for manipulating trees, and natively stores XML. New access methods have been developed to evaluate queries in the XML context, and new cost estimation and quer...
Article
Full-text available
XML has become ubiquitous, and XML data has to be managed in databases. The current industry standard is to map XML data into relational tables and store this information in a relational database. Such mappings create both expressive power problems and performance problems. In the TIMBER project we are exploring the issues involved in storing XML i...
Conference Paper
Full-text available
Structural join operations are central to evaluating queries against XML data, and are typically responsible for consuming a lion's share of the query processing time. Thus, structural join order selection is at the heart of query optimization in an XML database, just as (value-based) join order selection is central to relational query optimization...
Article
Estimating the sizes of query results, and intermediate results, is crucial to many aspects of query processing. In particular, it is necessary for effective query optimization. Even at the user level, predictions of the total result size can be valuable in “next-step” decisions, such as query refinement. This paper proposes a technique to obtain q...
Conference Paper
Full-text available
XML permits repeated and missing sub-elements, and missing attributes. We discuss the consequent implications on grouping, both with respect to specification and with respect to implementation. The techniques described here have been implemented in the TIMBER native XML database system being developed at the University of Michigan.
Article
Full-text available
We present a physical algebra for the manipulation of XML in a database. We show how to map logical algebra operators to this physical algebra. We also present several physical algebra identities that are useful for query optimization.
Conference Paper
Full-text available
Estimating the sizes of query results, and intermediate results, is crucial to many aspects of query processing. In particular, it is necessary for effective query optimization. Even at the user level, predictions of the total result size can be valuable in “next-step” decisions, such as query refinement. This paper proposes a technique to obtain q...
Conference Paper
Full-text available
XML queries typically specify patterns of selection predicates on multiple elements that have some specified tree structured relationships. The primitive tree structured relationships are parent-child and ancestor-descendant, and finding all occurrences of these relationships in an XML database is a core operation for XML query processing. We devel...
Conference Paper
As the WWW becomes more popular and powerful, how to search information on the web in a database-like way has become an important research topic. COMMIX, developed in the DB group at Peking University (China), is a system for building very large databases using data from the Web for information extraction, integration and query answering. CO...
Article
Full-text available
We study the expressiveness of Positive XPath with parent/child navigation, denoted XPath+, from two angles. First, we establish that XPath+ is equivalent in expressive power to some of its sub-fragments as well as to the class of tree queries, a sub-class of the first-order conjunctive queries defined over label, parent, and child predicates....
Article
Full-text available
Well-designed indexes can dramatically improve query performance. In the context of XML, structural indexes have proven to be particularly effective in supporting efficient XPath queries, the core of all XML queries, by capturing the structural correlation between data components in an XML document. The duality of space and performance is an inev...
