Article

A high performance integrated web data warehousing

Cluster Computing (Impact Factor: 0.78). 01/2007; 10(1):95-109. DOI: 10.1007/s10586-007-0008-9
Source: DBLP

ABSTRACT Over the years, a significant number of data warehouse integration techniques have been proposed to support integrated web
data. However, existing work focuses extensively on design concepts. In this paper, we focus on the performance of a web database
application, namely an integrated web data warehouse that uses a well-defined, uniform structure to handle web information sources,
including semi-structured data such as XML and documents such as HTML, within a web data warehouse system. Using a case study,
we implement a prototype that applies a web manipulation concept to both incoming sources and result outputs. Thus, the system
can not only be operated through the web but can also integrate web data sources with structured data sources. Our main
contribution is a performance evaluation of an integrated web data warehouse application, which comprises two tasks. Task one
verifies the correctness of the integrated data: the result set retrieved from the integrated web data warehouse system using
complex and OLAP queries is checked against the result set retrieved from the existing independent data source systems. Task two
measures the performance of OLAP or complex queries by investigating the source operation functions these queries use to retrieve
the data. The source operation functions used by each query are obtained with the TKPROF utility.
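
The two evaluation tasks can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: result rows are assumed to be plain tuples, result sets are compared as multisets (row order ignored, duplicate counts respected), and the TKPROF parser assumes an Oracle 10g-style "Row Source Operation" layout such as `14  TABLE ACCESS FULL EMP (cr=3 pr=0 pw=0 time=67 us)`. The names `verify_result_sets` and `parse_row_source_operations` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Sequence


def verify_result_sets(warehouse_rows: Iterable[Sequence], source_rows: Iterable[Sequence]) -> dict:
    """Task one (sketch): compare two query result sets as multisets of rows."""
    warehouse = Counter(tuple(row) for row in warehouse_rows)
    sources = Counter(tuple(row) for row in source_rows)
    missing = sources - warehouse  # rows the independent sources return but the warehouse lacks
    extra = warehouse - sources    # rows the warehouse returns that the sources do not
    return {
        "match": not missing and not extra,
        "missing": list(missing.elements()),
        "extra": list(extra.elements()),
    }


# Assumed (Oracle 10g-style) TKPROF row source line:
#   "<rows>  <OPERATION and object name> (cr=.. pr=.. pw=.. time=.. us)"
ROW_SOURCE_LINE = re.compile(r"^\s*(\d+)\s+(\S[^(]*?)\s*(?:\((.*)\))?\s*$")
STAT = re.compile(r"(\w+)=(\d+)")


def parse_row_source_operations(lines: Iterable[str]) -> list:
    """Task two (sketch): extract row counts, operations, and per-operation statistics."""
    operations = []
    for line in lines:
        match = ROW_SOURCE_LINE.match(line)
        if match is None:
            continue  # header, separator, or any other non-operation line
        rows, operation, stats = match.groups()
        operations.append({
            "rows": int(rows),
            "operation": operation,
            "stats": {name: int(value) for name, value in STAT.findall(stats or "")},
        })
    return operations
```

Comparing as multisets rather than plain sets matters for warehouse verification: an integration error that duplicates or drops rows would go unnoticed by a set comparison. TKPROF's layout may differ between Oracle releases (later versions can print several row-count columns), so the regular expression would need adjusting for other output.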

  • ABSTRACT: On-Line Analytical Processing (OLAP) systems based on data warehouses are the main systems for managerial decision making and must have a quick response time. Several algorithms, called view selection algorithms, have been presented to select the proper set of data to materialize and to derive suitable structures for handling the queries submitted to OLAP systems. As users' requirements may change at run time, materialization must be managed dynamically. In this work, the authors propose and operate a dynamic view management system with a new and improved architecture that selects and materializes views by predicting incoming queries through association rule mining and three probabilistic reasoning approaches: conditional probability, Bayes' rule, and the Naïve Bayes rule. The proposed system is compared with the DynaMat and Hybrid systems using two standard measures. Experimental results show that the proposed dynamic view selection system improves both measures, outperforming DynaMat and Hybrid for each type of query and each sequence of incoming queries.
    IJDWM. 01/2011; 7:67-96.
  • ABSTRACT: A standards-based, open-source middleware system was designed and implemented to facilitate the analysis of large, disparate datasets. The system can access several types of database servers simultaneously, browse remote data, combine datasets, and join tables from remote databases independently of vendor. It uses an algorithm called Dynamic Merge Cache to handle data caching, query generation, transformations, and joins with minimal operational interference to the source databases. The system can combine any subset of the configured databases and convert the information into XML, which is made available to analysis tools through a web service. After the system connects to a remote database, it creates a metadata catalog from the source database. The user can configure which tables and fields to export from the remote dataset, and can also filter, transform, and combine data. The system was tested with a large fish contaminant database and a second database populated with simulated scientific data.
    Proceedings of the Eighth ACIS International Conference on Software Engineering Research, Management and Applications, SERA 2010, Montreal, Canada, May 24-26, 2010; 01/2010
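
The first of the two abstracts listed above (IJDWM, 2011) mentions predicting incoming queries with conditional probability so that views can be materialized ahead of time. A minimal sketch of that single idea, assuming a log of query classes (the labels `Q1`, `Q2`, ... and the function names are hypothetical), estimates P(next = b | current = a) from consecutive pairs in the log. It is not the cited system, which also uses association rule mining, Bayes' rule, and Naïve Bayes.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def build_transition_counts(query_log: List[str]) -> Dict[str, Counter]:
    """Count how often query class b immediately follows query class a in the log."""
    pairs: Dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(query_log, query_log[1:]):
        pairs[prev][nxt] += 1
    return pairs


def predict_next(pairs: Dict[str, Counter], current: str) -> Optional[Tuple[str, float]]:
    """Return the most likely successor of `current` and its estimated conditional probability."""
    followers = pairs.get(current)  # .get does not create an entry in the defaultdict
    if not followers:
        return None  # no observed successor, so no prediction
    total = sum(followers.values())
    query, count = followers.most_common(1)[0]
    return query, count / total
```

A view manager could pre-materialize the views needed by the predicted query when its estimated probability exceeds some chosen threshold; that threshold would be a design parameter, not a value taken from the abstract.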