A high performance integrated web data warehousing

Cluster Computing (Impact Factor: 1.51). 05/2007; 10(1):95-109. DOI: 10.1007/s10586-007-0008-9
Source: DBLP


Over the years, a significant number of integration techniques have been proposed to support web-integrated data in data warehouses. However, existing work focuses largely on design concepts. In this paper, we focus on the performance of an integrated web data warehouse: a web database application that uses a well-defined, uniform structure to handle web information sources, including semi-structured data such as XML and documents such as HTML. Through a case study, we implement a prototype that applies web manipulation to both incoming sources and result outputs; the system can therefore not only be operated through the web, but can also integrate web data sources with structured data sources. Our main contribution is a performance evaluation of the integrated web data warehouse application, which comprises two tasks. Task one verifies the correctness of the integrated data: the result sets retrieved from the integrated web data warehouse using complex and OLAP queries are checked against the result sets retrieved from the existing independent data source systems. Task two measures the performance of OLAP and complex queries by investigating the source operation functions each query uses to retrieve the data; this information is obtained with Oracle's TKPROF utility.
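The correctness check in task one amounts to comparing two result sets independent of row order. A minimal sketch of such a check is shown below; this is an illustration of the idea, not the authors' implementation, and the row values are hypothetical:

```python
from collections import Counter

def result_sets_match(integrated_rows, source_rows):
    """Task-one style verification: the result set retrieved from the
    integrated web data warehouse must contain exactly the same rows as
    the result set assembled from the independent source systems.
    Comparing multisets (Counter) ignores row order but counts duplicates."""
    return Counter(map(tuple, integrated_rows)) == Counter(map(tuple, source_rows))

# Hypothetical result sets for the same OLAP query, one retrieved from
# the integrated warehouse and one from the independent sources.
integrated = [("2006-Q1", "Books", 1200.0), ("2006-Q1", "Music", 800.0)]
independent = [("2006-Q1", "Music", 800.0), ("2006-Q1", "Books", 1200.0)]

print(result_sets_match(integrated, independent))  # True: same rows, different order
```

Using a multiset rather than a set matters for OLAP results, where legitimately duplicated rows would otherwise be collapsed and a mismatch missed.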

  • ABSTRACT: A standards-based, open-source middleware system was designed and implemented to facilitate the analysis of large and disparate datasets. This system makes it possible to access several different types of database servers simultaneously, browse remote data, combine datasets, and join tables from remote databases independent of vendor. The system uses an algorithm known as Dynamic Merge Cache to handle data caching, query generation, transformations, and joining with minimal operational interference to source databases. The system is able to combine any subset of configured databases and convert the information into XML. The resulting XML is made available to analysis tools through a web service. After the system connects to a remote database, a metadata catalog is created from the source database. The user is able to configure which tables and fields to export from the remote dataset. The user is also able to filter, transform, and combine data. The system was tested with a large fish contaminant database and a second database populated with simulated scientific data.
    No preview · Conference Paper · Jan 2010
  • ABSTRACT: On-Line Analytical Processing (OLAP) systems based on data warehouses are the main systems for managerial decision making and must have quick response times. Several algorithms, known as view selection algorithms, have been presented to select the proper set of data to materialize and to elicit suitable structures for handling the queries submitted to OLAP systems. Because users' requirements may change at run time, views must be selected and materialized dynamically. In this work, the authors propose and operate a dynamic view management system that selects and materializes views with a new, improved architecture, predicting incoming queries through association rule mining and three probabilistic reasoning approaches: conditional probability, Bayes' rule, and naïve Bayes. The proposed system is compared with the DynaMat and Hybrid systems on two standard measures. Experimental results show that the proposed dynamic view selection system improves both measures and outperforms DynaMat and Hybrid for each type of query and each sequence of incoming queries.
    No preview · Article · Apr 2011 · International Journal of Data Warehousing and Mining
  • ABSTRACT: This study proposes a model to convert the Public Institution Financial Reporting System (PIFRS) into an XBRL-based online system in accordance with the adopted Korean International Financial Reporting Standards (K-IFRS). Before the adoption of PIFRS, financial reporting systems suffered from many input errors and from time-consuming manual collection and combination of Excel-based data. The XBRL-based online PIFRS shortens the preparation of closing statements, enables prompt responses to the establishment, change, integration, and discontinuance of public institutions, improves the trustworthiness of public financial information, and reduces the time spent collecting related financial data. In addition, it can reflect changes in account categories arising from revisions of IFRS, and respond to frequent account changes such as the emission trading system, through alignment of the taxonomy alone rather than structural modification of the whole system. It thus helps system users consistently meet demands for the public announcement of various types of financial information and substantially shortens the periods for financial reporting, submission, and approval.
    No preview · Article · Sep 2013 · Cluster Computing
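The query-prediction idea described in the view-materialization abstract above (anticipating the next incoming query from past query sequences via conditional probability) can be sketched as follows. This is an illustrative reconstruction under simple assumptions, not the cited authors' system, and the query labels are hypothetical:

```python
from collections import Counter, defaultdict

def predict_next_query(history):
    """Estimate P(next | current) from consecutive pairs in a query log,
    then predict the most likely successor of the most recent query.
    A dynamic view manager could materialize views for the predicted
    query ahead of its arrival."""
    pairs = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        pairs[prev][nxt] += 1
    current = history[-1]
    if not pairs[current]:
        return None  # never seen a successor of this query
    # argmax of conditional frequency = maximum-likelihood prediction
    return pairs[current].most_common(1)[0][0]

log = ["Q1", "Q2", "Q1", "Q2", "Q1", "Q3", "Q1", "Q2"]
print(predict_next_query(log))  # "Q1": it follows "Q2" in every observed pair
```

A production system would also weight predictions by view maintenance cost and available storage, as the DynaMat comparison in the abstract implies.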