Conference Paper

Parallelization of Permuting XML Compressors

Abstract

The verbose nature of XML causes overhead in storage and network transfer, which parallel computing may help overcome. This paper presents four permuting parallel XML compressors based on an existing XML compressor, XSAQCT. Tests were performed on multi-core machines using a test suite of XML documents with various characteristics, and the results were analyzed to determine the upper bounds given by Amdahl’s law, the actual speedup, and the compression ratios.
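As a rough illustration of the analysis mentioned in the abstract, the sketch below computes the Amdahl's-law upper bound on speedup and compares it with a measured speedup. The function names, the parallel fraction of 0.8, and the timings are hypothetical values chosen for illustration; they are not taken from the paper.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of the run time can be parallelized."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    # Amdahl's law: S(n) = 1 / ((1 - p) + p / n)
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def measured_speedup(serial_seconds: float, parallel_seconds: float) -> float:
    """Actual speedup: sequential run time divided by parallel run time."""
    return serial_seconds / parallel_seconds


if __name__ == "__main__":
    p = 0.8  # hypothetical parallelizable fraction, not a figure from the paper
    for n in (1, 2, 4, 8):
        print(n, round(amdahl_speedup(p, n), 2))
```

A measured speedup well below the bound for the same number of cores usually points to overhead (thread start-up, synchronization, I/O) that the simple model does not capture.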

... Additionally, in the column-store environment, the contents of a single entity are often stored in many locations, which then requires additional logic to combine the attributes for joining and grouping (this is exactly how many permuting XML compressors work). Also, in [Fry, 2011] and [Corbin et al., 2013], XSAQCT was compared to many different XML database engines using the Berkeley DB key-value database, and the results were promising. Figure 2 depicts the architectural layout of a modified XSAQCT. ...
Conference Paper
With the renewed industrial and academic interest in Column-Oriented Database Management Systems, considerable attention has been paid to software optimizations designed to improve the efficiency of queries in the Column-Oriented domain. Meanwhile, XML database management systems are often considered in the context of mapping their hierarchical structure to the relational domain to allow efficient querying of the XML structure. However, by examining the relationship between specific XML Compression techniques and Column-Oriented Database Systems, we can reduce the vast overheads of organizing XML in relational entities. This paper examines these relationships and presents a lightweight XML Database System derived from XML Compression, based on Column-Oriented Database architectures.
Conference Paper
Because of a growing interest in using XML for massive complex data, there has been considerable research on designing XML compressors. This paper presents our research aimed at building parallel XML compressors using Java and OpenMP (with C++). Our findings show that OpenMP is the preferred choice, achieving better results than Java on a multi-core platform.
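The general shape of such a parallel compressor can be sketched as follows: independent chunks of data are compressed concurrently and the results are collected in their original order. This is only an illustrative sketch in Python using the standard `zlib` and `concurrent.futures` modules; it is not the Java or OpenMP implementation described in the paper, and the function names `compress_parallel` and `decompress_all` are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_parallel(chunks: list[bytes], workers: int = 4) -> list[bytes]:
    """Compress each chunk independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(zlib.compress, chunks))


def decompress_all(blobs: list[bytes]) -> list[bytes]:
    """Sequentially restore every chunk (used here to check the round trip)."""
    return [zlib.decompress(blob) for blob in blobs]


if __name__ == "__main__":
    chunks = [b"<a>1</a>" * 1000, b"<b>2</b>" * 1000]
    packed = compress_parallel(chunks, workers=2)
    print([len(blob) for blob in packed])
```

Whether threads give a real speedup depends on the runtime and the workload, so any timing comparison would need to be measured rather than assumed.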
Article
The mission of the Universal Protein Resource (UniProt) (http://www.uniprot.org) is to support biological research by providing a freely accessible, stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase. It integrates, interprets and standardizes data from numerous resources to achieve the most comprehensive catalogue of protein sequences and functional annotation. UniProt comprises four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute (EBI), the SIB Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt is updated and distributed every 4 weeks and can be accessed online for searches or downloads.
Article
Recently, there has been growing interest in queryable XML compressors, which can be used to query compressed data with minimal decompression, or even without any decompression. At the same time, very few such projects have been made available for testing and comparison. In this paper, we report our current work on two novel queryable XML compressors: a schema-based compressor, SXSAQCT, and a schema-free compressor, XSAQCT. While the work on both compressors is at an early stage, our experiments (reported here) show that our approach may compete successfully with other known queryable compressors.

@InProceedings{mldner_et_al:DSP:2008:1673,
  author = {Tomasz M{\"u}ldner and Christopher Fry and Jan Krysztof Miziolek and Scott Durno},
  title = {SXSAQCT and XSAQCT: XML Queryable Compressors},
  booktitle = {Structure-Based Compression of Complex Massive Data},
  year = {2008},
  editor = {Stefan B{\"o}ttcher and Markus Lohrey and Sebastian Maneth and Wojcieh Rytter},
  number = {08261},
  series = {Dagstuhl Seminar Proceedings},
  ISSN = {1862-4405},
  publisher = {Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany},
  address = {Dagstuhl, Germany},
  URL = {http://drops.dagstuhl.de/opus/volltexte/2008/1673},
  annote = {Keywords: XML compression, queryable}
}
Article
We describe a tool for compressing XML data, with applications in data exchange and archiving, which usually achieves about twice the compression ratio of gzip at roughly the same speed. The compressor, called XMill, incorporates and combines existing compressors in order to apply them to heterogeneous XML data: it uses zlib, the library function for gzip, a collection of datatype specific compressors for simple data types, and, possibly, user defined compressors for application specific data types.

1 Introduction

We have implemented a compressor/decompressor for XML data, to be used in data exchange and archiving, that achieves about twice the compression rate of general-purpose compressors (gzip), at about the same speed. The tool can be downloaded from www.research.att.com/sw/tools/xmill/. XML is now being adopted by many organizations and industry groups, like the healthcare, banking, chemical, and telecommunications industries. The attraction in XML is that it is a self-describi...
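The idea of applying a general-purpose compressor to grouped XML data can be sketched as below: text values are collected into one container per element path, and each container is compressed separately with zlib. This is a simplified illustration under stated assumptions, not XMill's actual algorithm or file format; it ignores attributes, mixed content, and the document structure needed for decompression, and the names `split_into_containers` and `compress_containers` are hypothetical.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def split_into_containers(xml_text: str) -> dict[str, list[str]]:
    """Group text values by the path of the element that holds them."""
    containers: dict[str, list[str]] = defaultdict(list)

    def walk(node: ET.Element, path: str) -> None:
        here = f"{path}/{node.tag}"
        if node.text and node.text.strip():
            containers[here].append(node.text.strip())
        for child in node:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return dict(containers)


def compress_containers(containers: dict[str, list[str]]) -> dict[str, bytes]:
    """Compress each container on its own, so similar values sit together."""
    return {
        path: zlib.compress("\n".join(values).encode("utf-8"))
        for path, values in containers.items()
    }


if __name__ == "__main__":
    doc = (
        "<lib><book><title>A</title><year>2001</year></book>"
        "<book><title>B</title><year>2002</year></book></lib>"
    )
    for path, blob in compress_containers(split_into_containers(doc)).items():
        print(path, len(blob))
```

Grouping values of the same kind before compression is what lets a datatype-specific or general-purpose back end exploit their similarity; the containers are also natural units of independent work.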