Scott Durno’s research while affiliated with Acadia University and other places
The popularity of role-based access control (RBAC) policies within industry has generated considerable interest in the research community. Since XML has become a de facto standard for data representation, most RBAC policies are expressed in XML. Although XML documents can be very large, no succinct implementations for these policies exist. This paper describes a novel implementation (not previously proposed) for schema-less and streamed XML documents to provide authorized users with the results of queries on compressed documents. The designer of the policy does not need to be aware of any implementation details. Results of this research will be essential for industry, which could take advantage of efficient implementations of RBAC policies.
XML (Extensible Markup Language) is a meta-language, developed by the World Wide Web Consortium (W3C) in 1996, which represents semi-structured data using markup. While XML facilitates the interchange of and access to data, its verbose nature tends to increase the size of a data file considerably. This increase in size limits the applications of XML, in particular because of the time cost of storing large data files and because of storage space constraints on mobile devices. Besides storing (possibly compressed) XML data, one also wants to query it to obtain specific information, such as the information pertaining to all patients who visited the emergency room of a specific hospital in the last year.
The reasons for querying a compressed XML file are:
Querying a compressed XML file is generally faster than completely decompressing the compressed file and then querying it.
Portable devices may not have disk space available for a complete decompression of the XML file.
There are many known XML-aware compressors, i.e., compressors that can take advantage of XML syntax. Some of these XML compressors are grammar-free; in other words, the information available to the compressor is limited to the XML document itself. Other XML compressors are grammar-based, i.e., the compressor is aware of the grammar against which the input document is valid. Grammar-based compressors may produce better results, in terms of both compression rate and time, than grammar-free compressors because they can take advantage of information available in the grammar. In many applications, however, the grammar is not known, so this approach is not always practical. In the case of the widely used Wratislava corpus [Skibinski et al., 2007], out of seven XML documents, only two provide an XML Schema (enwikibooks and enwikinews) and two reference a DTD (shakespeare and dblp), while the others use no schema. Finally, even if an XML Schema is provided, it may define elements that never actually appear in the XML document to be compressed.
In this paper, we describe a queryable, grammar-free XML compressor called XSAQCT (pronounced "exact"). Our technique borrows from other XML compressors in that it separates the document structure from the text values and attribute values (collectively called data values), which make up the content of the document. What is new in our technique is that we first encode the document to succinctly store information about the input document. Next, we apply appropriate back-end data compressors to the container that stores the document structure and to the containers that store the data values (the type of the data, derived from the containers, may be used to guide the choice of back-end compressor for each container). It is well known that, on average, the structure of an XML document represents between 10 and 20 percent of the size of the entire document, and the remaining 80 to 90 percent represents text and attribute values. Since the main focus of our work is on queryable compression, our encoding of the document structure supports lazy decompression, i.e., while querying the compressed document, we decompress "as little as possible". Well-known XML compressors differ in their container granularity: some use a single container, while others tend to create many separate containers for related values. The former approach is based on the premise that standard data compressors achieve better results on large data sets, but it requires complete decompression in order to perform a query. The latter approach, on the other hand, may suffer from poor compression ratios, but it requires the decompression of only a few (possibly just one) containers. In our approach, we attempt to strike a balance between these two extremes: the containers should be large enough to be compressed effectively, but the container structure should not require a full decompression to answer a query.
In addition, while our design supports lazy decompression, it is also designed to support future extensions that perform operations directly on compressed data, without any decompression. In what follows, we provide a more detailed description of XSAQCT.
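The container idea described above can be illustrated with a minimal Python sketch. It is not the XSAQCT encoding: the function names `compress_document` and `query_text`, the use of `zlib` as the back-end compressor, and the choice of one container per element path are all assumptions made for illustration, and attributes and text that follows a child element are ignored for brevity.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict

SEP = "\x00"  # NUL cannot occur in XML 1.0 character data, so it is a safe separator


def compress_document(xml_text):
    """Split a document into one structure stream and per-path text containers,
    then compress each part independently (hypothetical layout, not XSAQCT's)."""
    root = ET.fromstring(xml_text)
    structure = []                  # open/close events describing the tree shape
    containers = defaultdict(list)  # element path -> text values found at that path

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        structure.append(f"<{elem.tag}")
        text = (elem.text or "").strip()
        if text:
            containers[path].append(text)
        for child in elem:
            walk(child, path)
        structure.append(">")

    walk(root, "")
    packed_structure = zlib.compress("\n".join(structure).encode("utf-8"))
    packed_containers = {
        path: zlib.compress(SEP.join(values).encode("utf-8"))
        for path, values in containers.items()
    }
    return packed_structure, packed_containers


def query_text(packed_containers, path):
    """Lazy decompression: inflate only the single container the query touches."""
    blob = packed_containers.get(path)
    if blob is None:
        return []
    return zlib.decompress(blob).decode("utf-8").split(SEP)


doc = (
    "<lib>"
    "<book><title>A</title><year>2001</year></book>"
    "<book><title>B</title><year>2005</year></book>"
    "</lib>"
)
packed_structure, packed_containers = compress_document(doc)
print(query_text(packed_containers, "/lib/book/title"))
```

In this sketch a query for all titles inflates only the `/lib/book/title` container, while the year values stay compressed. Grouping by path is one possible middle ground between a single large container and one container per value.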
Recently, there has been a growing interest in queryable XML compressors, which can be used to query compressed data with minimal decompression, or even without any decompression. At the same time, very few such projects have been made available for testing and comparison. In this paper, we report our current work on two novel queryable XML compressors: a schema-based compressor, SXSAQCT, and a schema-free compressor, XSAQCT. While the work on both compressors is in its early stage, our experiments (reported here) show that our approach may compete successfully with other known queryable compressors.
@InProceedings{mldner_et_al:DSP:2008:1673,
  author = {Tomasz M{\"u}ldner and Christopher Fry and Jan Krysztof Miziolek and Scott Durno},
  title = {SXSAQCT and XSAQCT: XML Queryable Compressors},
  booktitle = {Structure-Based Compression of Complex Massive Data},
  year = {2008},
  editor = {Stefan B{\"o}ttcher and Markus Lohrey and Sebastian Maneth and Wojcieh Rytter},
  number = {08261},
  series = {Dagstuhl Seminar Proceedings},
  ISSN = {1862-4405},
  publisher = {Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany},
  address = {Dagstuhl, Germany},
  URL = {http://drops.dagstuhl.de/opus/volltexte/2008/1673},
  annote = {Keywords: XML compression, queryable}
}
... This paper investigates a succinct, client-based implementation of RBAC policies for schema-less XML documents. The policy for an XML document D can specify occurrences of individual nodes of D, entire subtrees of D, or specific text elements in D. The compression process is based on an XML compressor called XSAQCT (see [19] and [20]; for details, see Section II). However, the designer of policies does not need to be aware of the inner workings of this compressor or of the encryption tools. ...
... Given an XML document as shown in Figure 1, its associated annotated tree is shown in Figure 3. Because of space limitations, we refer the reader to (Müldner et al., 2008) for the algorithms to create an annotated tree and to (Müldner et al., 2009) for the description of how cycles (consecutive children X, Y, X) are dealt with. For an XML document to be uniquely represented by a single annotated tree, it has to satisfy the full mixed content property, i.e., all tags in an XML document have to be separated by character data (see (Müldner et al., 2012a)). ...
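As an illustration only, the full mixed content property can be checked under one literal reading of the definition above: every pair of consecutive tags must have some character data between them. The helper name `has_full_mixed_content` is hypothetical, and treating whitespace-only text as valid character data is an assumption; the cited papers may define the property more precisely.

```python
import xml.etree.ElementTree as ET


def has_full_mixed_content(xml_text):
    """Return True if character data separates every pair of consecutive tags.

    Assumption: any non-empty text, including whitespace, counts as a separator.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # elem.text is the data between this start tag and the next tag.
        if not elem.text:
            return False
        # elem.tail is the data between this end tag and the next tag.
        # The root has no following tag, so its tail is not checked.
        if elem is not root and not elem.tail:
            return False
    return True


print(has_full_mixed_content("<a>x<b>y</b>z</a>"))  # every tag pair is separated
print(has_full_mixed_content("<a><b>y</b>z</a>"))   # <a> is followed directly by <b>
```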