Figure - uploaded by Tomasz Müldner
Source publication
XML (Extensible Markup Language) is a meta-language (developed by the World Wide Web Consortium, W3C, in 1996) that represents semi-structured data using markup. While the use of XML facilitates the interchange and access of data, its verbose nature tends to considerably increase the size of a data file. This increase in size limits applications...
Similar publications
As the number of mobile devices grows explosively, mobile query processing has become an important application of mobile devices. One of the most frequently used mobile queries is the range query, which retrieves surrounding objects of interest. Because mobile devices are moving, these range queries are literally moving range queries. The ma...
Citations
... This paper investigates a succinct, client-based implementation of RBAC policies for schema-less XML documents. The policy for the XML document D can specify occurrences of individual nodes of D, entire subtrees of D or specific text elements in D. The compression process is based on an XML compressor called XSAQCT; see [19] and [20] (for details, see Section II). However, the designer of policies does not need to be aware of the inner workings of this compressor or of the encryption tools. ...
... This section defines paths and similar paths, and only briefly describes the XML compressor XSAQCT used in the implementation; for more details, see [19]. ...
The popularity of role-based access control (RBAC) policies within industry has generated considerable interest in the research community. Since XML has become a de facto standard for data representation, most RBAC policies are expressed in XML. Although XML documents can be very large, no succinct implementations for these policies exist. This paper describes a novel implementation (not previously proposed) for schema-less and streamed XML documents to provide authorized users with the results of queries on compressed documents. The designer of the policy does not need to be aware of any implementation details. Results of this research will be essential for industry, which could take advantage of efficient implementations of RBAC policies.
... This paper examines the inherent relationship between many types of XML-conscious compressors and column-stores, and shows that XML compressors and column-stores are trying to solve very similar architectural issues with respect to storage and retrieval in the columnar environment. A specific example of an XML-conscious compression system, called XSAQCT, is presented; see [Müldner et al., 2009], and see [Müldner et al., 2014] for theoretical background. XSAQCT requires the same functionality as a column-store, i.e., it uses path-based compression that resembles column-based compression in column-stores (while ignoring things such as SQL Joins). ...
... XSAQCT is a permuting XML compressor, with its compressed representation (i.e., P) in the form of an Annotated Tree, see [Müldner et al., 2009]. An annotated tree of an XML document is a tree in which every node is represented by similar paths (i.e., paths that are identical) merged into a single path, and labeled by its tag name. ...
With the renewed industrial and academic interest in Column-Oriented Database Management Systems, a lot of interest has been shown in the area of software optimizations designed to improve the efficiency of queries in the Column-Oriented domain. Meanwhile, XML database management systems are often considered in the context of mapping their hierarchical structure to the relational domain to allow efficient querying of the XML structure. However, by examining the relationship between specific XML Compression techniques and Column-Oriented Database Systems, we can reduce the vast overheads in organizing XML in relational entities. This paper examines said relationships and presents a lightweight XML Database System derived from XML Compression, based on Column-Oriented Database architectures.
... On the other hand, some of these compressors focus on compressing and retrieving information from the compressed version with little or no decompression. The latter approaches need to keep the structure part of the document for retrieval purposes, and to do so, they need to generate the XST as either indexed or normal trees [12], [13], [14], [15], [16], [17], [18], [19], [20], [21]. ...
Nowadays, several approaches that deal with XML documents require generating the Structure Tree (XST) for these documents. These approaches may target compression, integration, ontology representation, or finding the similarity of XML documents. This paper investigates the proper ways to generate the XST by proposing two algorithms, each of which depends on a different parsing technique (SAX and DOM), and explains the main differences between them. Testing the memory and the time required to generate the XST shows that using SAX is faster and can save up to half the memory required using DOM. Keywords: XML-Tree, XML Structure Tree, Simple API for XML (SAX), Document Object Model (DOM), Compressing XML, XML Integration.
XML has become the standard way of representing and transforming data over the World Wide Web. Moreover, these documents are becoming the way to represent the objects used in mobile-learning technology. The problem with XML documents is that they have a very high ratio of redundancy, which makes them demand large storage capacity and high network bandwidth for transmission. These documents need to be compressed and then used with little or no decompression. This paper presents the complete testing process for the XML Compressing and Querying System (XCVQ), which can compress XML documents and retrieve the required information according to all kinds of queries.
An XML document D often has a regular structure, i.e., it is composed of many similarly named and structured subtrees. Therefore, the entropy of a tree's structuredness should be relatively low, and thus such trees should be highly compressible by transforming them to an intermediate form. In general, this idea is used in permutation-based XML-conscious compressors. An example of such a compressor is XSAQCT, where the compressible form is called an annotated tree. While XSAQCT proved to be useful for various applications, it was never shown that it is a lossless compressor. This paper provides the formal background for the definition of an annotated tree, and a formal proof that the compression is lossless. It also shows properties of annotated trees that are useful for various applications, and discusses a measure of compressibility using this approach, followed by the experimental results showing compressibility of annotated trees.
Reflections on size, scale, scalability, and value.
The advantages of the eXtensible Markup Language, XML, come at a cost, especially for huge datasets or when used on small mobile devices. Several known XML-conscious compressors used in real time environments compress data during data streaming. This paper presents a study of new real time algorithms that exploit local structural redundancies of pre-order traversals of an XML tree. These algorithms focus on reducing the overhead of streaming data while maintaining load balancing between the sender and receiver. Our algorithms have similar or better performance than existing algorithms, while emphasizing low memory and processing overheads.
Extensible Markup Language (XML) was designed to carry data and provides a platform for defining custom tags. XML documents are immense in nature. As a result, there has been an ever-growing need for an efficient storage structure and high-performance techniques to query efficiently. Though several storage structures are available, the QUICX (Query and Update Support for Indexed and Compressed XML) compact storage structure proved to be efficient in terms of data storage. The major reason for performance loss in query processing is identified to be the storage structure used as well as the lack of efficient query processing techniques. The approach (IQCX) focuses on indexing and querying the compressed data stored in QUICX, thereby increasing the compression ratio. The proposed indexing technique exploits the high degree of redundancy exhibited by XML documents. Thus, indexing enhances query performance compared to querying without indexing.
The advantages of using XML come at a cost, especially when used on networks and small mobile devices. This paper presents the design and implementation of four online XML compression algorithms, which exploit local structural redundancies of pre-order traversals of an XML tree, and focus on reducing the overhead of sending packets and maintaining load balancing between the sender and receiver. For testing, we designed a suite consisting of 11 XML files with various characteristics. Ten encoding techniques were compared: GZIP, EXI, Treechop, XSAQCT and its improvement, and our algorithms. Experiments indicate that our new algorithms have similar or better performance than other online algorithms, and perform worse than EXI only for files larger than 1 GB.