Conference Paper

Efficient Data Distribution for DWS.

DOI: 10.1007/978-3-540-85836-2_8
Conference: Data Warehousing and Knowledge Discovery, 10th International Conference, DaWaK 2008, Turin, Italy, September 2-5, 2008, Proceedings
Source: DBLP

ABSTRACT The DWS (Data Warehouse Striping) technique is a data partitioning approach designed specifically for distributed data warehousing
environments. In DWS the fact tables are distributed across an arbitrary number of low-cost computers and the queries are executed
in parallel by all the computers, guaranteeing nearly optimal speedup and scale-up. Data loading in data warehouses is typically
a heavy process that becomes even more complex in distributed environments. Data partitioning brings the need for
new loading algorithms that reconcile a balanced distribution of data among nodes with an efficient data allocation (vital
to achieve low and uniform response times and, consequently, high performance during the execution of queries). This paper
evaluates several alternative algorithms and proposes a generic approach for the evaluation of data distribution algorithms
in the context of DWS. The experimental results show that the effective loading of the nodes in a DWS system must consider
complementary effects: minimizing the number of distinct keys of any large dimension in the fact tables in each node, and
splitting correlated rows among the nodes.
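
The two quantities the abstract mentions, row balance across nodes and the number of distinct dimension keys per node, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy rows, the `customer_id` and `amount` column names, and the two placement policies (`round_robin` and `by_key`) are hypothetical and are not claimed to be the algorithms evaluated in the paper.

```python
# Toy fact table: each row references one dimension key and carries one measure.
# Column names and values are hypothetical.
ROWS = [
    {"customer_id": cid, "amount": 10}
    for cid in (1, 1, 2, 2, 3, 3, 4, 4)
]


def round_robin(rows, n_nodes):
    """Place row i on node i mod n_nodes, which balances row counts."""
    nodes = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        nodes[i % n_nodes].append(row)
    return nodes


def by_key(rows, n_nodes, key):
    """Place each row on node (key value mod n_nodes), so a key stays on one node."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[row[key] % n_nodes].append(row)
    return nodes


def distinct_keys_per_node(nodes, key):
    """Count the distinct values of one dimension key stored on each node."""
    return [len({row[key] for row in node}) for node in nodes]


def parallel_sum(nodes, measure):
    """Each node computes a partial sum; the partial sums are then merged."""
    partials = [sum(row[measure] for row in node) for node in nodes]
    return sum(partials)


if __name__ == "__main__":
    for name, nodes in (
        ("round_robin", round_robin(ROWS, 2)),
        ("by_key", by_key(ROWS, 2, "customer_id")),
    ):
        print(
            name,
            [len(n) for n in nodes],
            distinct_keys_per_node(nodes, "customer_id"),
            parallel_sum(nodes, "amount"),
        )
```

On this toy input both policies give each node four rows and the same merged total, but round-robin leaves all four customer keys on every node while key-based placement leaves two per node. Key-based placement also keeps rows with the same key together, which works against the abstract's second goal of splitting correlated rows, so the sketch only shows how the two metrics can be measured, not which policy is preferable.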

  • Source
    ABSTRACT: The approach proposed in this paper deals with the dynamic distribution of a data warehouse (DWH) across a set of servers. This distribution differs from the “classical” one, which depends on how data is used: it distributes data when a machine reaches its storage capacity limit. The proposed approach ensures scalability and exploits the storage and processing resources available in the organization using the DWH. Our approach is based on a multi-agent model combined with the scalable distribution proposed by Scalable Distributed Data Structures. Our multi-agent model is made up of stationary agent classes (Client, Dispatcher, Domain and Server) and a mobile agent class (Messenger). These agents collaborate to automatically perform the storage, splitting, redirection and access operations on the distributed DWH. In this paper, we focus on the global dynamics of the data access operation and present the corresponding experimental results.
    Distributed and Parallel Databases 01/2009; 25:29-45. · 0.81 Impact Factor
  • Source
    ABSTRACT: The Data Warehouse Striping (DWS) technique is a data partitioning approach designed specifically for distributed data warehousing environments. In DWS the fact tables are distributed across an arbitrary number of low-cost computers and each query is executed in parallel by all the computers, guaranteeing nearly optimal speedup and scale-up. Data loading in distributed data warehouses is typically a heavy process and brings the need for loading algorithms that reconcile a balanced distribution of data among nodes with an efficient data allocation. These are fundamental aspects to achieve low and uniform response times and, consequently, high performance during the execution of queries. This paper proposes a generic approach for the evaluation of data distribution algorithms and assesses several alternative algorithms in the context of DWS. The experimental results show that the effective loading of the nodes must consider complementary effects: minimizing the number of distinct keys of any large dimension in the fact tables in each node, and splitting correlated rows among the nodes.
    International Journal of Database Management Systems. 10/2012; 4(5):119-135.
  • Source
    ABSTRACT: Multidimensional data generated by members on websites has seen massive growth in recent years. OLAP is a well-suited solution for mining and analyzing this data. Providing insights derived from this analysis has become crucial for these websites to give members greater value. For example, LinkedIn, the largest professional social network, provides its professional members rich analytics features like "Who's Viewed My Profile?" and "Who's Viewed This Job?" The data behind these features form cubes that must be efficiently served at scale, and can be neatly sharded to do so. To serve our growing 160 million member base, we built a scalable and fast OLAP serving system called Avatara to solve this many, small cubes problem. At LinkedIn, Avatara has been powering several analytics features on the site for the past two years.
    Proceedings of the VLDB Endowment. 08/2012; 5(12).
