IBP: An Index-Based XML Parser Model.
ABSTRACT: With XML widely used in distributed systems, the existing parser models, DOM and SAX, are inefficient and resource-intensive for applications with large XML documents. This paper presents an index-based parser model (IBP) that offers both validating and non-validating modes and supports nearly all XML features. IBP is fast, robust, and has low resource requirements, making it well suited to parsing large volumes of data. We present the application of IBP in a real-time distributed monitoring prototype system; the results show that IBP is effective.
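The paper's IBP internals are not reproduced here, but the index-based idea can be sketched as follows: one scan over the document records the offsets of each element's content, and later queries slice the original text without re-parsing it. Everything below (function names, the path-keyed index) is a hypothetical illustration, not the authors' implementation.

```python
import re

def build_index(xml: str) -> dict:
    """Hypothetical sketch of index-based parsing: one pass records the
    character offsets of each element's content, keyed by its /a/b path.
    Ignores attributes, namespaces, and self-closing tags."""
    index, stack = {}, []
    for m in re.finditer(r"<(/?)([A-Za-z_][\w.-]*)[^>]*>", xml):
        closing, name = m.group(1), m.group(2)
        if not closing:
            stack.append((name, m.end()))          # content starts after '>'
        else:
            _, start = stack.pop()
            path = "/" + "/".join([n for n, _ in stack] + [name])
            index.setdefault(path, []).append((start, m.start()))
    return index

def extract(xml: str, index: dict, path: str) -> list:
    """Answer a query by slicing the original document: no re-parsing."""
    return [xml[s:e] for s, e in index.get(path, [])]
```

For `"<r><a>1</a><a>2</a></r>"`, `extract(xml, build_index(xml), "/r/a")` returns `["1", "2"]`; repeated queries reuse the index, which is where the low resource cost comes from.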
Conference Proceeding: XML parsing: a threat to database performance.
ABSTRACT: XML parsing is generally known to have poor performance characteristics relative to transactional database processing. Yet its potentially fatal impact on overall database performance is being underestimated. We report real-world database applications where XML parsing performance is a key obstacle to a successful XML deployment. A considerable share of XML database applications are prone to fail at an early and simple roadblock: XML parsing. We analyze XML parsing performance and quantify the extra overhead of DTD and schema validation. Comparison with relational database performance shows that the desired response times and transaction rates over XML data cannot be achieved without major improvements in XML parsing technology. Thus, we identify the research topics that are most promising for XML parser performance in database systems. Proceedings of the 2003 ACM CIKM International Conference on Information and Knowledge Management, New Orleans, Louisiana, USA, November 2-8, 2003; 01/2003
XMill: an efficient compressor for XML data
ABSTRACT: We describe a tool for compressing XML data, with applications in data exchange and archiving, which usually achieves about twice the compression ratio of gzip at roughly the same speed. The compressor, called XMill, incorporates and combines existing compressors in order to apply them to heterogeneous XML data: it uses zlib, the library function for gzip; a collection of datatype-specific compressors for simple data types; and, possibly, user-defined compressors for application-specific data types. The tool can be downloaded from www.research.att.com/sw/tools/xmill/. XML is now being adopted by many organizations and industry groups, such as the healthcare, banking, chemical, and telecommunications industries. The attraction of XML is that it is a self-describi... 11/2000
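The container idea the abstract describes can be sketched in a few lines: group character data by element path into separate streams so that similar values sit together, then hand each stream to zlib. This is a rough illustration of the principle, assuming ElementTree for parsing; it is not XMill's actual container format.

```python
import zlib
import xml.etree.ElementTree as ET

def compress_by_container(xml: str) -> dict:
    """Sketch of XMill's container principle (not its real format):
    text values are grouped by element path, so each zlib stream sees
    homogeneous data, which typically compresses better than one
    stream over the raw interleaved document."""
    containers = {}
    def walk(elem, path):
        p = path + "/" + elem.tag
        if elem.text and elem.text.strip():
            containers.setdefault(p, []).append(elem.text.strip())
        for child in elem:
            walk(child, p)
    walk(ET.fromstring(xml), "")
    return {p: zlib.compress("\n".join(vals).encode())
            for p, vals in containers.items()}
```

Separating, say, all `<price>` values from all `<name>` values is what lets the datatype-specific compressors mentioned above be applied per container.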
Conference Proceeding: Compressing XML with multiplexed hierarchical PPM models
ABSTRACT: We established a working Extensible Markup Language (XML) compression benchmark based on text compression, and found that bzip2 compresses XML best, albeit more slowly than gzip. Our experiments verified that XMill speeds up and improves compression using gzip and bounded-context PPM by up to 15%, but found that it worsens the compression for bzip2 and PPM. We describe alternative approaches to XML compression that illustrate other tradeoffs between speed and effectiveness. We describe experiments using several text compressors and XMill to compress a variety of XML documents. Using these as a benchmark, we describe our two main results: an online binary encoding for XML called Encoded SAX (ESAX) that compresses better and faster than existing methods; and an online, adaptive, XML-conscious encoding based on prediction by partial match (PPM) called multiplexed hierarchical modeling (MHM) that compresses up to 35% better than any existing method but is fairly slow. Data Compression Conference, 2001. Proceedings. DCC 2001; 02/2001
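As a rough illustration of the ESAX idea (a binary encoding of the SAX event stream), the sketch below spells out each tag name only once and replaces later occurrences with a small integer code. The one-byte event codes and framing here are invented for the example and are not the paper's actual ESAX format.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical one-byte event codes; not the paper's actual ESAX framing.
START, END, TEXT = 1, 2, 3

def esax_style_encode(xml: str) -> bytes:
    """Sketch: serialize SAX-like events, spelling out each tag name only
    on first use and replacing later occurrences with its integer code.
    A decoder can spot definitions because codes are assigned sequentially."""
    names, out = {}, bytearray()
    for event, elem in ET.iterparse(io.StringIO(xml), events=("start", "end")):
        if event == "start":
            if elem.tag not in names:
                names[elem.tag] = len(names)
                out += bytes([START, names[elem.tag], len(elem.tag)])
                out += elem.tag.encode()           # define the name once
            else:
                out += bytes([START, names[elem.tag]])
        else:  # "end": elem.text is complete by this point
            if elem.text and elem.text.strip():
                text = elem.text.strip().encode()
                out += bytes([TEXT, len(text)]) + text
            out += bytes([END, names[elem.tag]])
    return bytes(out)
```

Because markup-heavy documents repeat the same few tag names many times, this kind of event stream is both smaller than the original text and a better input for a back-end compressor.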