Erwin Leonardi’s research while affiliated with Nanyang Technological University and other places


Publications (21)


Stars on steroids: Fast evaluation of multi-source star twig queries in path materialization-based XML databases
Article · November 2013 · 24 Reads · Data & Knowledge Engineering

Erwin Leonardi

Despite a large body of work on XML twig query processing in relational environments, the systematic study of XML join evaluation has received little attention in the literature. In this paper, we propose a novel and non-traditional technique for fast evaluation of multi-source star twig queries in a path materialization-based RDBMS. A multi-source star twig joins different XML documents on values in their nodes, and the XQuery graph takes a star-shaped structure. Such queries are prevalent in several domains such as life sciences. Rather than following the conventional approach of generating one huge, complex SQL query from a twig query, we translate a star query into a list of SQL sub-queries that materialize only minimal information about the underlying XML subtrees as intermediate results. We have implemented this scheme on top of a path materialization-based XML storage system called SUCXENT++. Our experiments confirm that the proposed approach, built on top of an off-the-shelf commercial RDBMS, has excellent real-world performance.


Fig. 1. Examples of star twig queries  
Stars on Steroids: Fast Evaluation of Multi-source Star Twig Queries in RDBMS
Conference Paper · April 2012 · 65 Reads · 1 Citation · Lecture Notes in Computer Science

Despite a large body of work on XML twig query processing in relational environments, the systematic study of XML join evaluation has received little attention in the literature. In this paper, we propose a novel and non-traditional technique for fast evaluation of multi-source star twig queries in a path materialization-based RDBMS. A multi-source star twig joins different XML documents on values in their nodes, and the XQuery graph takes a star-shaped structure. Such queries are prevalent in several domains such as life sciences. Rather than following the conventional approach of generating one huge, complex SQL query from a twig query, we translate a star query into a list of SQL sub-queries that materialize only minimal information about the underlying XML subtrees as intermediate results. Our experiments confirm that the proposed approach, built on top of an off-the-shelf commercial RDBMS, has excellent real-world performance.


Fig. 1. Overview of federated access control.  
Fig. 5. The SQL query in the SCAN approach.
Fig. 9. Experimental results: The SCAN approach (in seconds).  
Fig. 10. Experimental results: The DIFF approach (in seconds).  
Efficient Database-Driven Evaluation of Security Clearance for Federated Access Control of Dynamic XML Documents

April 2010 · 84 Reads · 3 Citations · Lecture Notes in Computer Science

Achieving data security over cooperating web services is becoming a reality, but existing XML access control architectures do not consider this federated service computing. In this paper, we consider a federated access control model in which the Data Provider and the Policy Enforcers are separated into different organizations. The Data Provider is responsible for evaluating the criticality of requested XML documents based on the co-occurrence of security objects, and for issuing security clearances. The Policy Enforcers enforce access control rules reflecting their organization-specific policies. A user's query is sent to the Data Provider, and she needs to obtain permission from the Policy Enforcer in her organization to read the results of her query. The Data Provider evaluates the query and also evaluates the criticality of the query, where the evaluation of sensitiveness is carried out by using clearance rules. In this setting, we present a novel approach, called the DIFF approach, for the Data Provider to evaluate security clearance. Our technique is built on top of a relational framework and utilizes pre-evaluated clearances by taking the differences (or deltas) between query results.


Figure 1: Examples of XML data.
Towards non-directional Xpath evaluation in a RDBMS

November 2009 · 62 Reads · 2 Citations

XML query languages use directional path expressions to locate data in an XML data collection. They are tightly coupled to the structure of a data collection, and can fail when evaluated on the same data in a different structure. This paper extends path expressions with a new non-directional axis called the rank-distance axis. Given a context node and two positive integers α and β, the rank-distance axis returns those nodes that are ranked between α and β in terms of closeness to the context node in any direction. This paper shows how to evaluate the rank-distance axis in a tree-unaware XML database. A tree-unaware implementation does not invade the database kernel to support XML queries; instead, it uses an existing RDBMS such as Microsoft SQL Server as a back-end and provides a front-end layer to translate XML queries to SQL. This paper presents an overview of an algorithm that translates queries with a rank-distance axis to SQL.
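
As a rough illustration of the rank-distance idea in the abstract above, the following Python sketch ranks the nodes of a small tree by hop distance from a context node, ignoring edge direction, and returns those whose rank falls between two bounds. It does not reproduce the paper's SQL translation. The parent-map encoding, the function name `rank_distance`, the inclusive 1-based bounds, and tie-breaking by node id are all assumptions made for this example.

```python
from collections import deque


def rank_distance(parent, context, lo, hi):
    """Return nodes ranked lo..hi (1-based, inclusive) by closeness to `context`.

    `parent` maps each node id to its parent id (the root maps to None).
    Node ids are assumed to be integers in document order (hypothetical encoding).
    """
    # Build an undirected adjacency list so that traversal ignores direction.
    neighbours = {node: [] for node in parent}
    for node, par in parent.items():
        if par is not None:
            neighbours[node].append(par)
            neighbours[par].append(node)

    # Breadth-first search gives the hop distance from the context node.
    dist = {context: 0}
    queue = deque([context])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)

    # Rank every other node by (distance, node id); ties are broken by id.
    ranked = sorted((n for n in dist if n != context), key=lambda n: (dist[n], n))
    return ranked[lo - 1:hi]


# A small tree: 1 is the root; 2 and 3 are its children; 4 and 5 sit under 2; 6 under 3.
tree = {1: None, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}
print(rank_distance(tree, 4, 1, 3))  # the three nodes closest to node 4
```

Because the search runs over an undirected view of the tree, the same call works whether the nearby nodes are ancestors, siblings, or descendants of the context node, which is the property the abstract attributes to a non-directional axis.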


Fig. 1. Visual interface of XBLEND.  
XBLEND: Visual XML Query Formulation Meets Query Processing

March 2009 · 73 Reads · 4 Citations

Due to the complexity of XML query languages, visual query interfaces that reduce the burden of query formulation are fundamental to spreading XML to a wider community. We present an RDBMS-based XML query evaluation system, called XBLEND, that takes a novel and non-traditional approach to improving query performance by blending visual query formulation and query processing. It exploits the latency offered by GUI-based visual query formulation to prefetch portions of the query results. The basic idea is that we prefetch constituent path expressions, store the synopsis of intermediary results, and reuse them when a connective is added or "Run" is pressed. In our demonstration, we show that our system exhibits promising performance in evaluating XML queries and show its usefulness in the life sciences domain.


Efficient evaluation of high-selective XML twig patterns with parent child edges in tree-unaware RDBMS

November 2007 · 81 Reads · 8 Citations

A recent study showed that native twig join algorithms and tree-aware relational frameworks significantly outperform tree-unaware approaches in evaluating structural relationships in XML twig queries. In this paper, we present an efficient strategy to evaluate high-selective twig queries containing only parent-child relationships in a tree-unaware relational environment. Our scheme is built on top of our SUCXENT++ system. We show that by exploiting the encoding scheme of SUCXENT++, we can devise an efficient strategy for evaluating such twig queries. Extensive performance studies on various data sets and queries show that our approach performs better than a representative tree-unaware approach (GLOBAL-ORDER) and a state-of-the-art native twig join algorithm (TJFAST) on all benchmark queries, with the highest observed gain factors being 243 and 95, respectively. Additionally, our approach significantly reduces the performance gap between tree-aware and tree-unaware approaches and even outperforms a tree-aware approach (MONETDB/XQUERY) for certain high-selective twig queries. We also report our insights into the plan choices a relational optimizer made during twig query evaluation by visually characterizing its behavior over the relational selectivity space.


Efficient Evaluation of Nearest Common Ancestor in XML Twig Queries Using Tree-Unaware RDBMS

September 2007 · 4 Reads · Lecture Notes in Computer Science

Finding all occurrences of a twig pattern in a database is a core operation in XML query processing. A recent study showed that tree-aware relational frameworks significantly outperform tree-unaware approaches in evaluating structural relationships in XML twig queries. In this paper, we present an efficient strategy to evaluate a specific class of structural relationship called NCA-twiglet in a tree-unaware relational environment. Informally, an NCA-twiglet is a subtree in a twig pattern where all nodes have the same nearest common ancestor (the root of the NCA-twiglet). We focus on NCA-twiglets having parent-child relationships. Our scheme is built on top of our SUCXENT++ system. We show that by exploiting the encoding scheme of SUCXENT++ we can reduce useless structural comparisons in order to evaluate NCA-twiglets. Through a comprehensive experiment, we show that our approach is not only more scalable but also performs better than a representative tree-unaware approach on all benchmark queries, with the highest observed gain factor being 352.


Figure 1: System Architecture of XANADUE.
Figure 2: XANADUE GUI. 
Figure 3: Performance visualization. 
XANADUE: a system for detecting changes to XML data in tree-unaware relational databases

June 2007 · 62 Reads · 12 Citations

Recently, a number of main-memory algorithms for detecting changes to XML data have been proposed. These approaches are not suitable for detecting changes to large XML documents, as they require a lot of memory to keep the two versions of the XML documents in memory. We have developed a novel XML change detection system, called XANADUE, that uses traditional relational database engines for detecting changes to large XML data. In this approach, we store the XML documents in the relational database and issue SQL queries (whenever appropriate) to detect the changes. This demonstration will showcase the functionality of our system and the effectiveness of XML change detection in a relational environment.
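
The general idea of pushing change detection into SQL can be sketched as follows. This is only a minimal illustration, not the XANADUE algorithm or its schema: it assumes two versions of a document have already been shredded into a hypothetical `leaf(version, path, value)` table, and uses SQLite's `EXCEPT` to find leaf entries present in one version but not in the other. The table, its columns, the sample data, and the helper `only_in` are assumptions introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per leaf node of each document version.
conn.execute("CREATE TABLE leaf (version INTEGER, path TEXT, value TEXT)")

old = [("/catalog/book/title", "A"), ("/catalog/book/price", "10")]
new = [("/catalog/book/title", "A"), ("/catalog/book/price", "12")]
conn.executemany("INSERT INTO leaf VALUES (1, ?, ?)", old)
conn.executemany("INSERT INTO leaf VALUES (2, ?, ?)", new)


def only_in(conn, present, absent):
    """Return (path, value) pairs found in version `present` but not in `absent`."""
    sql = """
        SELECT path, value FROM leaf WHERE version = ?
        EXCEPT
        SELECT path, value FROM leaf WHERE version = ?
    """
    return conn.execute(sql, (present, absent)).fetchall()


deleted = only_in(conn, 1, 2)   # leaves that disappeared in the new version
inserted = only_in(conn, 2, 1)  # leaves that appeared in the new version
print(deleted, inserted)
```

The set difference is computed by the database engine, so neither version has to be held in application memory, which is the motivation the abstract gives for a relational approach.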


Figure 5: Relationship between DeweyOrderSum and RValue.  
Figure 10: SQL example: /catalog/book/chapter/following::book  
Figure 18: Portion of EDGE Query Plan DC100 Q5  
Figure 20: Portion of EDGE Query Plan DC10 Q10
Figure 21: Portion of EDGE Query Plan DC100 Q10
Efficient Support for Ordered XPath Processing in Tree-Unaware Commercial Relational Databases

April 2007 · 125 Reads · 12 Citations

Klarinda G. Widjanarko · [...] · Erwin Leonardi

In this paper, we present a novel approach to ordered XPATH evaluation in a tree-unaware RDBMS. The novelty of our approach lies in the following. (a) We propose a novel XML storage scheme which comprises only leaf nodes, their corresponding data values, order encodings, and their root-to-leaf paths. (b) We propose an algorithm for mapping ordered XPATH queries into SQL queries over the storage scheme. (c) We propose an optimization technique that forces all mapped SQL queries to be evaluated in a "left-to-right" join order. By employing these techniques, we show, through a comprehensive experiment, that our approach not only scales well but also performs better than some representative tree-unaware approaches on more than 65% of our benchmark queries, with the highest observed gain factor being 1939. In addition, our approach significantly reduces the performance gap between tree-aware and tree-unaware approaches and even outperforms a state-of-the-art tree-aware approach for certain benchmark queries.
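
To make the leaf-oriented storage idea in point (a) more concrete, here is a minimal Python sketch that shreds a small XML document into rows holding only leaf nodes, their values, a position, and their root-to-leaf paths, and then answers a simple path query in SQL. It is not the paper's storage scheme: the table `leaf`, its columns, the function `shred_leaves`, and the use of a plain document-order counter in place of the paper's order encodings are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred_leaves(xml_text):
    """Return (leaf_order, path, value) for every leaf element, in document order."""
    root = ET.fromstring(xml_text)
    rows = []

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        children = list(elem)
        if not children:
            # A plain counter stands in for a real order encoding (assumption).
            rows.append((len(rows) + 1, path, (elem.text or "").strip()))
        for child in children:
            walk(child, path)

    walk(root, "")
    return rows


doc = (
    "<catalog>"
    "<book><title>A</title><chapter>c1</chapter></book>"
    "<book><title>B</title></book>"
    "</catalog>"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leaf (leaf_order INTEGER PRIMARY KEY, path TEXT, value TEXT)"
)
conn.executemany("INSERT INTO leaf VALUES (?, ?, ?)", shred_leaves(doc))

# The path expression /catalog/book/title becomes a selection on the stored path.
titles = conn.execute(
    "SELECT value FROM leaf WHERE path = ? ORDER BY leaf_order",
    ("/catalog/book/title",),
).fetchall()
print(titles)
```

Storing the full root-to-leaf path with each leaf lets a simple path step be answered without joins over internal nodes; ordered axes would additionally need comparisons on the order column, which this sketch does not attempt.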


Fig. 5. The CalculateHashValue Algorithm.  
Fig. 13. Result quality.  
DTD-Diff: A change detection algorithm for DTDs.

January 2007 · 104 Reads · 19 Citations · Data & Knowledge Engineering

The DTD of a set of XML documents may change for many reasons, such as changes in real-world events, changes to the user's requirements, and mistakes in the initial design. In this paper, we present a novel algorithm called DTD-Diff to detect changes to the DTDs that define the structure of a set of XML documents. Such a change detection tool can be useful in several ways, such as maintenance of XML documents, incremental maintenance of relational schemas for storing XML data, and XML schema integration. We compare DTD-Diff with existing XML change detection approaches and show that converting a DTD to XML Schema (XSD) (which is in XML document format) and detecting the changes using existing XML change detection algorithms is not a feasible option. Our experimental results show that DTD-Diff is 5–325 times faster than X-Diff when it detects the changes to the XSD files. Compared to XyDiff, DTD-Diff is up to 38 times faster. We also study the result quality of detected deltas.


Citations (12)


... (If no update script between old and new DTDs is given, the algorithms in [8, 5] can generate update scripts between DTDs). Note that more than one superscripted label in s(D) # may correspond to e f,w , as shown below. ...

Reference:

An algorithm for correcting XSLT rules according to DTD updates
DTD-Diff: A change detection algorithm for DTDs

Data & Knowledge Engineering

... In our context, delta format is based on the change detection model proposed in our previous work using delta operations in the format of relational tables [3]. This technique is already examined in the context of XML document change detection using different approaches with XML trees that are ordered [18,21] and unordered [19,20,36]. In what follows, we discuss each type of delta and show the advantages of adopting the relational-based delta in the context of XML Schema. ...

Xandy: A Scalable Change Detection Technique for Ordered XML Documents Using Relational Databases
  • Citing Article
  • November 2006

Data & Knowledge Engineering

... In each scenario, we first trained QuickSel on the first 10 observed queries (i.e., sequence numbers: 1-10) and measured the accuracy on the next 10 observed queries (i.e., sequence numbers: 11-20). Then, we trained QuickSel on the first 20 observed queries (i.e., sequence numbers: 1-20) and measured the accuracy using the next 10 observed queries (i.e., their sequence numbers: 21-30). ...

Efficient evaluation of high-selective XML twig patterns with parent child edges in tree-unaware RDBMS

... Similarity functions related to JSON in database systems are limited to basic values, e.g., strings and sets, and cannot be used to compute the distance between JSON documents [18,33,35,39,42]. For other data formats, similarity queries and the related distance measures are well studied, e.g., for XML [17,22,32,37] data, which, like JSON, is a hierarchical data format. Common approaches for XML are based on the well-known tree edit distance [41], which is the minimal difference between two documents respecting both their hierarchical structure and data values. ...

Detecting changes on unordered XML documents using relational databases: a schema-conscious approach

... In [13], E. Leonardi and Bhowmick studied that web page change detection using ordered and unordered tree model method is not appropriate for the reason that many memories needed for keeping two different types of XML records in the memory. In this research work author use relational database for identifying content variations of ordered large XML document. ...

Detecting Content Changes on Ordered XML Documents Using Relational Databases

Lecture Notes in Computer Science

... In our context, delta format is based on the change detection model proposed in our previous work using delta operations in the format of relational tables [3]. This technique is already examined in the context of XML document change detection using different approaches with XML trees that are ordered [18,21] and unordered [19,20,36]. In what follows, we discuss each type of delta and show the advantages of adopting the relational-based delta in the context of XML Schema. ...

Oxone: A Scalable Solution for Detecting Superior Quality Deltas on Ordered Large XML Documents

Lecture Notes in Computer Science

... Relational databases have been a dominating workhorse in the data world and already employed in various application fields. Performing change detection in XML documents using their relational representation has been proposed by many studies, such as DiffXML (Chen et al., 2004) and XANADUE (Leonardi and Bhowmick, 2007). However, as explained by (Keller, 1997), (Golobisky and Vecchietti, 2005) and (Agoub et al., 2016), representing XML and especially CityGML objects using a relational data model has some known limitations. ...

XANADUE: a system for detecting changes to XML data in tree-unaware relational databases

... While SQL is by far the dominant query language for relational databases, since it been introduced within IBM System R research project in 1974 [10]. Several research efforts, dating back to 1977, have addressed allowing a casual user to interact easily with the DBMS by using visual query language (e.g., [29,18,20]) or XML database such as [28]. ...

XBLEND: Visual XML Query Formulation Meets Query Processing

... On the one hand, there has been a host of work, c.f., [13,19], on enabling relational databases to be tree-aware by modifying the database kernel to support xml. On the other hand, some completely jettison the invasive approach and resort to a tree-unaware approach, c.f., [3,12,24,25,28,32], where the database kernel is not modified to support XPath queries. Typically, in this approach, an xml document is shredded into relational table(s) based on the following two schemes: the schema-oblivious and the schema-conscious techniques. ...

Efficient Support for Ordered XPath Processing in Tree-Unaware Commercial Relational Databases