Conference Paper

Efficient Data Distribution for DWS.

DOI: 10.1007/978-3-540-85836-2_8
Conference: Data Warehousing and Knowledge Discovery, 10th International Conference, DaWaK 2008, Turin, Italy, September 2-5, 2008, Proceedings
Source: DBLP

ABSTRACT The DWS (Data Warehouse Striping) technique is a data partitioning approach especially designed for distributed data warehousing
environments. In DWS the fact tables are distributed across an arbitrary number of low-cost computers and the queries are executed
in parallel by all the computers, guaranteeing a nearly optimal speed up and scale up. Data loading in data warehouses is typically
a heavy process that becomes even more complex in distributed environments. Data partitioning brings the need for
new loading algorithms that reconcile a balanced distribution of data among nodes with an efficient data allocation (vital
to achieve low and uniform response times and, consequently, high performance during the execution of queries). This paper
evaluates several alternative algorithms and proposes a generic approach for the evaluation of data distribution algorithms
in the context of DWS. The experimental results show that the effective loading of the nodes in a DWS system must consider
complementary effects: minimizing the number of distinct keys of any large dimension in the fact tables in each node, and
splitting correlated rows among the nodes.
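
The two quantities the abstract mentions, row balance across nodes and the number of distinct dimension keys per node, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two distribution strategies, the function names, and the toy fact rows are hypothetical and are not the algorithms or data evaluated in the paper.

```python
def round_robin(rows, n_nodes):
    """Assign row i to node i mod n_nodes (hypothetical strategy)."""
    nodes = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        nodes[i % n_nodes].append(row)
    return nodes


def by_dimension_key(rows, n_nodes, key_index):
    """Assign each row by its dimension key modulo n_nodes (hypothetical strategy).

    All rows sharing a key land on the same node.
    """
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[row[key_index] % n_nodes].append(row)
    return nodes


def summarize(nodes, key_index):
    """Return (rows per node, distinct dimension keys per node)."""
    sizes = [len(node) for node in nodes]
    distinct = [len({row[key_index] for row in node}) for node in nodes]
    return sizes, distinct


# Toy fact rows (customer_key, amount): 6 keys with 4 rows each, 24 rows in total.
rows = [(k, 10 * k + j) for k in range(6) for j in range(4)]

print(summarize(round_robin(rows, 3), 0))          # ([8, 8, 8], [6, 6, 6])
print(summarize(by_dimension_key(rows, 3, 0), 0))  # ([8, 8, 8], [2, 2, 2])
```

On this toy input both strategies balance the row counts, but the key-based one leaves far fewer distinct keys on each node. It also keeps all rows of a key together, which works against the second effect the abstract names (splitting correlated rows), so the sketch only shows why the two effects have to be weighed against each other.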

  • ABSTRACT: The Data Warehouse Striping (DWS) technique is a data partitioning approach especially designed for distributed data warehousing environments. In DWS the fact tables are distributed across an arbitrary number of low-cost computers and each query is executed in parallel by all the computers, guaranteeing a nearly optimal speed up and scale up. Data loading in distributed data warehouses is typically a heavy process and brings the need for loading algorithms that reconcile a balanced distribution of data among nodes with an efficient data allocation. These are fundamental aspects to achieve low and uniform response times and, consequently, high performance during the execution of queries. This paper proposes a generic approach for the evaluation of data distribution algorithms and assesses several alternative algorithms in the context of DWS. The experimental results show that the effective loading of the nodes must consider complementary effects: minimizing the number of distinct keys of any large dimension in the fact tables in each node, and splitting correlated rows among the nodes.
    International Journal of Database Management Systems. 10/2012; 4(5):119-135.
  • ABSTRACT: Distributed data warehousing is mainly based on how data is used in the dynamic distribution of data across a set of servers. In the existing approach, with such a large volume of data, finding the relevant information is itself a very difficult task. The query redirection process handles this task in distributed data warehousing, but it increases network load and execution time. Securing the data circulating on the network is also a problem. Data management further requires collaboration and interaction between the machines in order to answer user queries. To overcome these problems, our idea is to refine the existing approach by focusing on the query redirection process using multi-agent systems. This also facilitates the collaboration, interaction, and independence of the different machines and improves the parallel execution of user queries. In our approach, we focus mainly on the query redirection process in distributed data warehousing using multi-agent systems.
    01/2010
  • ABSTRACT: Multidimensional data generated by members on websites has seen massive growth in recent years. OLAP is a well-suited solution for mining and analyzing this data. Providing insights derived from this analysis has become crucial for these websites to give members greater value. For example, LinkedIn, the largest professional social network, provides its professional members rich analytics features like "Who's Viewed My Profile?" and "Who's Viewed This Job?" The data behind these features form cubes that must be efficiently served at scale, and can be neatly sharded to do so. To serve our growing 160 million member base, we built a scalable and fast OLAP serving system called Avatara to solve this many, small cubes problem. At LinkedIn, Avatara has been powering several analytics features on the site for the past two years.
    Proceedings of the VLDB Endowment. 08/2012; 5(12).
